Resolve Co-reference

Tacacs-1101 commented 2 years ago

Hi,

I am unable to figure out how to resolve coreference from your span cluster IDs. Please help me understand word_cluster_ids and span_cluster_ids, and how to resolve coreference using them.

vdobrovolskii commented 2 years ago

Hi!

Can you elaborate? What exactly are you doing and what do you hope to achieve?

Tacacs-1101 commented 2 years ago

Sure. For example: "Rahul voted for Trump because he was most aligned with his values." Here "he" can be resolved to "Trump" and "his" can be replaced with "Rahul". When I pass this sentence to your model, it gives word cluster IDs and span cluster IDs. Which IDs should be used to resolve coreference in this example, and how?

vdobrovolskii commented 2 years ago

Oh, I see. The resulting span_clusters is a list of clusters, where each cluster is a list of spans, and each span is a pair of word indices [start, end) with the end index exclusive. For instance, in your example "Rahul" will be [0, 1], "Trump" will be [3, 4], "he" will be [5, 6] and "his" will be [10, 11].

So the correct prediction should look like this:

[[(0, 1), (10, 11)], [(3, 4), (5, 6)]]
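If you want a per-mention "cluster ID", one simple option is to use each cluster's position in that list. The sketch below assumes that convention (the thread does not say the output defines IDs this way), assumes the sentence is tokenized on whitespace with the final period as its own word, and uses a hypothetical helper name, span_to_cluster. It also accepts spans as JSON-style lists by converting them to tuples.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive

def span_to_cluster(clusters: List[List[Span]]) -> Dict[Span, int]:
    """Hypothetical helper: map every span to the index of its cluster."""
    return {
        tuple(span): cluster_id  # tuple() so JSON lists become hashable keys
        for cluster_id, cluster in enumerate(clusters)
        for span in cluster
    }

# Assumed tokenization: whitespace split, period as a separate word.
words = "Rahul voted for Trump because he was most aligned with his values .".split()
clusters = [[(0, 1), (10, 11)], [(3, 4), (5, 6)]]

for (start, end), cluster_id in span_to_cluster(clusters).items():
    print(cluster_id, (start, end), " ".join(words[start:end]))
```

Mentions that share a number ("Trump" and "he" both map to 1) corefer; the numbers themselves carry no meaning beyond grouping.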

You should be able to turn it into strings like this:

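from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive
SpanCluster = List[Span]

def print_clusters(clusters: List[SpanCluster], words: List[str]):
    """Print every span of every cluster, with a blank line between clusters."""
    for cluster in clusters:
        for start, end in cluster:
            # words[start:end] is the span's text because end is exclusive
            print(f"{(start, end)} {' '.join(words[start:end])}")
        print()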

Called with the clusters above and the words of your sentence, this should print:

(0, 1) Rahul
(10, 11) his

(3, 4) Trump
(5, 6) he
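To go one step further and actually rewrite the sentence, you could replace each mention with the text of its cluster's first mention. This is a minimal sketch, not part of the model: resolve is a hypothetical helper, and it assumes the earliest span in a cluster is the antecedent, that spans do not overlap, and the same whitespace tokenization as above.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive

def resolve(words: List[str], clusters: List[List[Span]]) -> List[str]:
    """Hypothetical helper: replace each later mention with its cluster's first mention.

    Assumes the earliest span in each cluster is the antecedent and spans do not overlap.
    """
    replacements: Dict[int, Tuple[int, List[str]]] = {}  # start -> (end, replacement words)
    for cluster in clusters:
        spans = sorted(tuple(span) for span in cluster)
        head_start, head_end = spans[0]
        head_words = words[head_start:head_end]
        for start, end in spans[1:]:
            replacements[start] = (end, head_words)

    resolved: List[str] = []
    i = 0
    while i < len(words):
        if i in replacements:
            end, head_words = replacements[i]
            resolved.extend(head_words)
            i = max(end, i + 1)  # skip the replaced mention; always advance
        else:
            resolved.append(words[i])
            i += 1
    return resolved

words = "Rahul voted for Trump because he was most aligned with his values .".split()
clusters = [[(0, 1), (10, 11)], [(3, 4), (5, 6)]]
print(" ".join(resolve(words, clusters)))
```

This prints "Rahul voted for Trump because Trump was most aligned with Rahul values ." Note that the naive substitution turns "his" into "Rahul" rather than "Rahul's", so possessives (and capitalization) would need extra handling.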

Tacacs-1101 commented 2 years ago

Thanks @vdobrovolskii, it really helped.

vdobrovolskii / wl-coref

Resolve Co-reference #31