kentonl / e2e-coref

End-to-end Neural Coreference Resolution
Apache License 2.0

Converting coreference chains from CoNLL format to the jsonlines cluster format. #32

Closed ashim95 closed 5 years ago

ashim95 commented 5 years ago

First of all, thank you for sharing your code.

I am unable to understand how the coreference chains in the CoNLL files are converted to the cluster format (the numbers in the clusters seem arbitrary).

Thanks,

kentonl commented 5 years ago

The script that converts the CoNLL files to the jsonlines format is here: https://github.com/kentonl/e2e-coref/blob/master/minimize.py. Could you be a bit more specific about what is confusing?

If you're asking about the cluster numbers in the CoNLL format, then yes, they are arbitrary. If you're asking about the numbers in the jsonlines format, they are the start and end indices of each span within a cluster.
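For later readers, here is a minimal sketch of that idea. It is not the code in minimize.py, only an illustration under a few assumptions: the tokens of a document are numbered from 0 across all sentences, the coreference column uses the usual `(id`, `id)`, `(id)` and `|` notation, and the end index of a span is inclusive. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def coref_column_to_clusters(coref_labels):
    """Turn one coreference label per token into clusters of [start, end] spans.

    Illustrative only: indices are token positions in the whole document,
    and the end index is inclusive.
    """
    clusters = defaultdict(list)     # cluster id -> list of [start, end]
    open_starts = defaultdict(list)  # cluster id -> stack of open start indices

    for index, label in enumerate(coref_labels):
        if label == "-":
            continue  # token is not at the boundary of any mention
        for part in label.split("|"):
            if part.startswith("(") and part.endswith(")"):
                # "(3)": a one-token mention of cluster 3
                clusters[int(part[1:-1])].append([index, index])
            elif part.startswith("("):
                # "(3": a mention of cluster 3 starts here
                open_starts[int(part[1:])].append(index)
            elif part.endswith(")"):
                # "3)": the most recently opened mention of cluster 3 ends here
                cluster_id = int(part[:-1])
                start = open_starts[cluster_id].pop()
                clusters[cluster_id].append([start, index])

    # The CoNLL cluster ids are arbitrary labels, so only the spans are kept.
    return [sorted(spans) for _, spans in sorted(clusters.items())]


# Hypothetical example document, flattened to one token list.
tokens = ["John", "said", "he", "saw", "the", "big", "dog", "and", "it", "barked"]
labels = ["(0)", "-", "(0)", "-", "(1", "-", "1)", "-", "(1)", "-"]

clusters = coref_column_to_clusters(labels)
print(clusters)
# [[[0, 0], [2, 2]], [[4, 6], [8, 8]]]

# Reading the spans back as text: slice with end + 1 because end is inclusive.
mentions = [[" ".join(tokens[s:e + 1]) for s, e in cluster] for cluster in clusters]
print(mentions)
# [['John', 'he'], ['the big dog', 'it']]
```

Read this way, a cluster such as `[[4, 6], [8, 8]]` says that tokens 4 through 6 and token 8 refer to the same entity; the numbers are positions in the text, not identifiers.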

ashim95 commented 5 years ago

Thank you very much for your swift reply.

I was, in fact, talking about the cluster numbers in the jsonlines format. Now that you have mentioned that they are the start and end of the spans, it all makes sense.

Again, thank you very much.