lavis-nlp / jerex

PyTorch code for JEREX: Joint Entity-Level Relation Extractor
MIT License

Help in interpreting the test results #5

Closed: raphael10-collab closed this issue 3 years ago.

raphael10-collab commented 3 years ago

Yesterday, after training the model ( https://github.com/lavis-nlp/jerex/issues/3#issuecomment-852361468 ), I tested it. Could you help me interpret the results?

I assume the predictions the trained model made during yesterday's test run were stored in the predictions.json file. Is that right?

(base) raphy@pc:~/jerex/data/runs/2021-06-01/21-32-32$ ls -lah
total 122M
drwxrwxr-x 5 raphy raphy 4,0K giu  2 08:59 .
drwxrwxr-x 3 raphy raphy 4,0K giu  1 21:32 ..
drwxrwxr-x 3 raphy raphy 4,0K giu  1 21:32 cv
-rw-rw-r-- 1 raphy raphy 120M giu  1 21:42 examples_test.html
-rw-rw-r-- 1 raphy raphy    0 giu  1 21:32 jerex_test.log
-rw-rw-r-- 1 raphy raphy 2,2M giu  1 21:42 predictions.json
drwxrwxr-x 2 raphy raphy 4,0K giu  1 21:32 run_config
drwxrwxr-x 3 raphy raphy 4,0K giu  1 21:32 tb
(base) raphy@pc:~/jerex/data/runs/2021-06-01/21-32-32$ 

I loaded the predictions.json file into https://jsonformatter.org/json-viewer to view and understand it more easily.

Example:

relation number 0:
    head: 0  --->  entities: 0
                       cluster: 0  --->  0: 13
                                         1: 17
                           Does it mean from token 13 to token 17?
                           tokens:
                               13: -
                               14: platform
                               15: producer
                               16: and
                               17: director
                       type: MISC

    tail: 15  --->  entities: 15
                       cluster: 15  --->  0: 12
                           Does it mean token 12?
                           tokens:
                               12: cross
                       type: TIME
    type: P577
Does it mean that it found a relation of type TIME between "platform producer and director" and "cross"?

markus-eberts commented 3 years ago

Yes, 'predictions.json' contains the model's predictions (there is also a visualization of predictions in 'examples_test.html' - just open it with your web browser).

The indices you interpreted as token indices are actually indices into the "mentions" list. Each entry in the mentions list contains the start token index (inclusive) and the end token index (exclusive) of that mention. So cluster 0 (= entity 0) contains mentions 13 and 17 (both are "Chasing Madoff"), and cluster 15 (= entity 15) contains mention 12 ("2010"). Entity 0 is classified as MISC (it's a movie) and entity 15 as TIME. Note that TIME is an entity type, not a relation type: detected relations between entities are in the "relations" list. In your example, the relation extracted for the two entities is "P577", which is "publication date" (-> https://www.wikidata.org/wiki/Property:P577).
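The chain described above (relation -> entity -> cluster -> mention -> token span) can also be followed programmatically. Below is a minimal Python sketch, assuming the pre-update layout suggested by this thread: a document dict with "tokens", "mentions" as [start, end] pairs, "clusters" as lists of mention indices, "entities" with "cluster" and "type" keys, and "relations" with "head", "tail" and "type" keys. These key names and the small sample document are hypothetical, not JEREX's confirmed output format.

```python
from typing import Any

# Hypothetical sample document in an ASSUMED pre-update layout
# (key names are illustrative, not confirmed JEREX output).
DOC: dict[str, Any] = {
    "tokens": ["Chasing", "Madoff", "(", "2010", ")", "...", "Chasing", "Madoff"],
    # Each mention is [start, end]: start inclusive, end exclusive.
    "mentions": [[0, 2], [3, 4], [6, 8]],
    # Each cluster lists the indices of the mentions that refer to one entity.
    "clusters": [[0, 2], [1]],
    "entities": [
        {"cluster": 0, "type": "MISC"},
        {"cluster": 1, "type": "TIME"},
    ],
    "relations": [{"head": 0, "tail": 1, "type": "P577"}],
}


def mention_text(doc: dict[str, Any], mention_idx: int) -> str:
    """Join the tokens of one mention; the end index is exclusive, like a slice."""
    start, end = doc["mentions"][mention_idx]
    return " ".join(doc["tokens"][start:end])


def entity_mentions(doc: dict[str, Any], entity_idx: int) -> list[str]:
    """Resolve entity -> cluster -> mention indices -> surface strings."""
    cluster_idx = doc["entities"][entity_idx]["cluster"]
    return [mention_text(doc, m) for m in doc["clusters"][cluster_idx]]


def describe_relation(doc: dict[str, Any], rel_idx: int) -> dict[str, Any]:
    """Summarize one predicted relation with readable head and tail entities."""
    rel = doc["relations"][rel_idx]
    head, tail = rel["head"], rel["tail"]
    return {
        "head": entity_mentions(doc, head),
        "head_type": doc["entities"][head]["type"],
        "tail": entity_mentions(doc, tail),
        "tail_type": doc["entities"][tail]["type"],
        "relation": rel["type"],  # a Wikidata property ID such as P577
        "wikidata": f"https://www.wikidata.org/wiki/Property:{rel['type']}",
    }


if __name__ == "__main__":
    print(describe_relation(DOC, 0))
```

For the real file, loading it with `json.load` and calling `describe_relation` on one document should work only if the layout matches these assumptions; the thread does not say whether predictions.json holds a single document or a list of documents, so inspect its top level first.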

I also just pushed an update to the JSON structure, since it is admittedly a bit hard to interpret. I collapsed clusters and entities into a single list (named 'entities') and added 'start' and 'end' keys to mention entries. You can pull the new code and rerun the test procedure to obtain the updated 'predictions.json' file.
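To illustrate the restructuring described above, here is a hypothetical converter from the older layout to a collapsed one: each entity absorbs its cluster's mention indices, and each mention becomes a dict with "start" and "end" keys. The output key names (in particular "mentions" inside each entity) are assumptions for illustration only; compare against a freshly generated predictions.json before relying on them.

```python
from typing import Any

# Hypothetical document in the ASSUMED pre-update layout (illustrative only).
OLD_DOC: dict[str, Any] = {
    "tokens": ["Chasing", "Madoff", "(", "2010", ")", "...", "Chasing", "Madoff"],
    "mentions": [[0, 2], [3, 4], [6, 8]],
    "clusters": [[0, 2], [1]],
    "entities": [{"cluster": 0, "type": "MISC"}, {"cluster": 1, "type": "TIME"}],
    "relations": [{"head": 0, "tail": 1, "type": "P577"}],
}


def collapse_entities(doc: dict[str, Any]) -> dict[str, Any]:
    """Merge clusters into entities and give mentions named start/end keys.

    The output key names are assumptions, not the confirmed new JEREX format.
    """
    mentions = [{"start": start, "end": end} for start, end in doc["mentions"]]
    entities = [
        # Inline the entity's cluster so no separate "clusters" list is needed.
        {"mentions": list(doc["clusters"][ent["cluster"]]), "type": ent["type"]}
        for ent in doc["entities"]
    ]
    return {
        "tokens": list(doc["tokens"]),
        "mentions": mentions,
        "entities": entities,
        # Entity order is unchanged, so head/tail indices stay valid.
        "relations": [dict(rel) for rel in doc["relations"]],
    }


if __name__ == "__main__":
    print(collapse_entities(OLD_DOC)["entities"])
```

Because entity order is preserved, the "head" and "tail" indices in "relations" still point at the right entities after collapsing, which is why no relation needs to be rewritten.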

raphael10-collab commented 3 years ago

Ok. Now I understand. Thank you!

relation number 0:
    head: 0  --->  entities: 0
                       type: MISC
                       cluster: 0  --->  mentions:
                                             0: 13  --->  tokens:
                                                              0: 76  --->  Chasing
                                                              1: 78  --->  Madoff
                                             1: 17  --->  tokens:
                                                              0: 94  --->  Chasing
                                                              1: 96  --->  Madoff

    tail: 15  --->  entities: 15
                       type: TIME
                       cluster: 15  --->  mentions:
                                             0: 12  --->  tokens:
                                                              0: 80  --->  2010
                                                              1: 81  --->  feature

    type of relation: P577 : https://www.wikidata.org/wiki/Property:P577
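Per the maintainer's note that the end index is exclusive, the two numbers under each mention are a start/end pair rather than two token positions. So mention 12, (80, 81), covers only token 80 ("2010"); token 81 ("feature") is the first token after the mention. Likewise (76, 78) covers tokens 76 and 77, which the maintainer reads as "Chasing Madoff". A small Python sketch of that rule, using the pairs from the breakdown above (the helper name is hypothetical):

```python
def covered_tokens(start: int, end: int) -> list[int]:
    """Token indices covered by a mention span whose end index is exclusive."""
    return list(range(start, end))


# Start/end pairs taken from the breakdown above.
print(covered_tokens(76, 78))  # [76, 77]
print(covered_tokens(80, 81))  # [80]
```

This is the same convention as Python slicing, so `tokens[start:end]` yields exactly the mention's tokens.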