vene / marseille

Mining Argument Structures with Expressive Inference (Linear and LSTM Engines)
BSD 3-Clause "New" or "Revised" License

Info about data output #6

Closed timjong93 closed 6 years ago

timjong93 commented 6 years ago

Is there any information available about the format of the output from running a model on custom data? I'm not completely sure how to interpret it.

Thanks in advance.

vene commented 6 years ago

Not in any nicely documented format, sorry. But I can try to explain a bit:

When you call clf.predict, as for instance on this line, you get a list of the same length as the number of UserDocs passed as input.

Each element of that list is a tuple of two items: (prop_posteriors, link_posteriors). The proposition posteriors encode the classified type of each proposition. The link posteriors show, for every possible link in the document, whether that support relation is predicted as true or false.

So for instance, for document i, you could iterate jointly over:

    # doc.link_to_prop lists the candidate links as (src, trg) proposition
    # indices; link_posteriors[i] has one prediction per candidate, same order.
    for (src, trg), prediction in zip(doc.link_to_prop, link_posteriors[i]):
        ...

And to go from proposition type posteriors to actual labels, you could use the label encoder that is part of the model: self.prop_encoder_.inverse_transform(prop_posteriors)
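Putting the two steps together, a minimal sketch might look like the following. All values here are made up for illustration: the label set, the posterior contents, and the link list are stand-ins for what clf.predict and the fitted prop_encoder_ actually produce, and the encoder's inverse_transform is mimicked with a plain list lookup.

```python
# Toy stand-ins for one document's output from clf.predict -- the real
# shapes and values come from the fitted model; these are illustrative.

# Proposition-type predictions, one class index per proposition.
prop_posteriors = [0, 2, 2]

# Stand-in for self.prop_encoder_.inverse_transform: maps class indices
# back to string labels (the real encoder is fit on the training labels).
label_classes = ["Claim", "MajorClaim", "Premise"]
prop_labels = [label_classes[k] for k in prop_posteriors]

# Candidate links as (src, trg) proposition indices (doc.link_to_prop
# stand-in), plus one boolean prediction per candidate, in the same order.
link_to_prop = [(1, 0), (2, 0), (2, 1)]
link_posteriors = [True, False, True]

# Keep only the links predicted as true.
predicted_links = [
    (src, trg)
    for (src, trg), on in zip(link_to_prop, link_posteriors)
    if on
]

print(prop_labels)      # -> ['Claim', 'Premise', 'Premise']
print(predicted_links)  # -> [(1, 0), (2, 1)]
```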

Hope this helps!

anikethjr commented 6 years ago

The output of experiments.predict_pretrained is a list of DocLabels (one for each document). Could you please explain the meaning of the links? I understand that every sentence/proposition is represented by a node and the links connect these nodes, but the ordering of these links is not apparent. Could you please clarify their ordering, directions, etc.?

Thank you :)

vene commented 6 years ago

Look at the doc.link_to_prop structure; it should answer your question.

In particular, if doc.link_to_prop[i] = [a, b], it means that the i-th link goes from proposition a to proposition b.
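For example, with a hypothetical link_to_prop (the index pairs below are invented purely to illustrate the convention):

```python
# Hypothetical link_to_prop for a document with 3 propositions (indices 0..2).
link_to_prop = [[1, 0], [2, 0], [2, 1]]

# Link i goes FROM link_to_prop[i][0] TO link_to_prop[i][1].
directions = [f"{src} -> {trg}" for src, trg in link_to_prop]
print(directions)  # -> ['1 -> 0', '2 -> 0', '2 -> 1']
```

So link 0 here is a support relation from proposition 1 to proposition 0, and the position of each entry in link_to_prop matches the position of its prediction in link_posteriors.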