hankcs closed this issue 4 years ago
Hi,
I just checked the data on our server, and there seem to be 4 predicates on my side. The extraction script is actually slightly modified from Luheng He's paper (here), and it outputs:
4 So , this will have surpassed what it is now for Japan and China which still failed to reach 200 billion US dollars at this 33rd anniversary of the normalization of their diplomatic relations . ||| O O O O B-V O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O ||| 01
8 So , this will have surpassed what it is now for Japan and China which still failed to reach 200 billion US dollars at this 33rd anniversary of the normalization of their diplomatic relations . ||| O O O O O O B-ARG2 B-ARG1 B-V B-ARGM-TMP B-ARGM-ADV I-ARGM-ADV I-ARGM-ADV I-ARGM-ADV O O O O O O O O O O O O O O O O O O O O O ||| 01
16 So , this will have surpassed what it is now for Japan and China which still failed to reach 200 billion US dollars at this 33rd anniversary of the normalization of their diplomatic relations . ||| O O O O O O O O O O O B-ARG1 I-ARG1 I-ARG1 B-R-ARG1 B-ARGM-TMP B-V B-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 I-ARG2 O ||| 01
18 So , this will have surpassed what it is now for Japan and China which still failed to reach 200 billion US dollars at this 33rd anniversary of the normalization of their diplomatic relations . ||| O O O O O O O O O O O B-ARG0 I-ARG0 I-ARG0 B-R-ARG0 O O O B-V B-ARG1 I-ARG1 I-ARG1 I-ARG1 B-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP I-ARGM-TMP O ||| 01
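For anyone who wants to sanity-check lines in this format, here is a minimal sketch. It assumes each line is `predicate_index tokens ||| BIO tags ||| sense`, with a 0-based predicate index, which is how the four lines above read. The function names `parse_srl_line` and `check_srl_line` are hypothetical and not part of the extraction script.

```python
def parse_srl_line(line):
    """Split one 'index tokens ||| tags ||| sense' line into its parts."""
    left, tags, sense = [part.strip() for part in line.split("|||")]
    index_str, *tokens = left.split()
    return int(index_str), tokens, tags.split(), sense


def check_srl_line(line):
    """True when tags align with tokens and B-V sits at the predicate index."""
    index, tokens, tags, _ = parse_srl_line(line)
    return (
        len(tokens) == len(tags)
        and 0 <= index < len(tags)
        and tags[index] == "B-V"
    )


if __name__ == "__main__":
    # Toy line in the assumed format (not taken from the dataset).
    print(check_srl_line("1 It failed . ||| B-ARG1 B-V O ||| 01"))
```

Running `check_srl_line` over both dumps would show whether the two files at least agree on where each predicate sits.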
Gonna look into it further and get back to you.
Another difference to note between your dump and mine is that these lines fall at 28311 to 28314 on my side.
Checking line 6850 of the cctv_0002 data file, the verb surpassed was not tagged as a (V*), though. If it's not tagged as a (V*), the extraction script won't take it as a predicate. This is getting interesting.
bc/cctv/00/cctv_0002 18 0 So RB (TOP(S(ADVP*) - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 1 , , * - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 2 this DT (NP*) - - - Liu_jiangyong * * * * * (49)
bc/cctv/00/cctv_0002 18 3 will MD (VP* - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 4 have VB (VP* have 01 - Liu_jiangyong * (V*) * * * -
bc/cctv/00/cctv_0002 18 5 surpassed VBN (VP* - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 6 what WP (SBAR(WHNP(WHNP*)) - - - Liu_jiangyong * * (ARG2*) * * -
bc/cctv/00/cctv_0002 18 7 it PRP (S(NP*) - - - Liu_jiangyong * * (ARG1*) * * -
bc/cctv/00/cctv_0002 18 8 is VBZ (VP* be 01 1 Liu_jiangyong * * (V*) * * -
bc/cctv/00/cctv_0002 18 9 now RB (ADVP*) - - - Liu_jiangyong * * (ARGM-TMP*) * * -
bc/cctv/00/cctv_0002 18 10 for IN (PP* - - - Liu_jiangyong * * (ARGM-ADV* * * -
bc/cctv/00/cctv_0002 18 11 Japan NNP (NP* - - - Liu_jiangyong (GPE) * * (ARG1* (ARG0* (53|(33)
bc/cctv/00/cctv_0002 18 12 and CC * - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 13 China NNP *)) - - - Liu_jiangyong (GPE) * *) *) *) (44)|53)
bc/cctv/00/cctv_0002 18 14 which WDT (SBAR(WHNP*) - - - Liu_jiangyong * * * (R-ARG1*) (R-ARG0*) -
bc/cctv/00/cctv_0002 18 15 still RB (S(VP(ADVP*) - - - Liu_jiangyong * * * (ARGM-TMP*) * -
bc/cctv/00/cctv_0002 18 16 failed VBD * fail 01 1 Liu_jiangyong * * * (V*) * -
bc/cctv/00/cctv_0002 18 17 to TO (S(VP* - - - Liu_jiangyong * * * (ARG2* * -
bc/cctv/00/cctv_0002 18 18 reach VB (VP* reach 01 1 Liu_jiangyong * * * * (V*) -
bc/cctv/00/cctv_0002 18 19 200 CD (NP(QP* - - - Liu_jiangyong (MONEY* * * * (ARG1* -
bc/cctv/00/cctv_0002 18 20 billion CD *) - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 21 US NNP * - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 22 dollars NNS *) - - - Liu_jiangyong *) * * * *) -
bc/cctv/00/cctv_0002 18 23 at IN (PP* - - - Liu_jiangyong * * * * (ARGM-TMP* -
bc/cctv/00/cctv_0002 18 24 this DT (NP(NP* - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 25 33rd JJ * - - - Liu_jiangyong (ORDINAL) * * * * -
bc/cctv/00/cctv_0002 18 26 anniversary NN *) - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 27 of IN (PP* - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 28 the DT (NP(NP* - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 29 normalization NN *) - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 30 of IN (PP* - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 31 their PRP$ (NP* - - - Liu_jiangyong * * * * * (53)
bc/cctv/00/cctv_0002 18 32 diplomatic JJ * - - - Liu_jiangyong * * * * * -
bc/cctv/00/cctv_0002 18 33 relations NNS *)))))))))))))))))) - - - Liu_jiangyong * * * *) *) -
bc/cctv/00/cctv_0002 18 34 . . *)) - - - Liu_jiangyong * * * * * -
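To make the predicate check reproducible, here is a rough sketch that lists which token carries the (V*) marker in each argument column of one sentence. It assumes the usual CoNLL-2012 layout (11 fixed columns, then one column per predicate, then a final coreference column) and whitespace-separated fields. `predicate_positions` is a hypothetical helper, not the extraction script itself.

```python
def predicate_positions(rows):
    """Return the word number of the (V*) marker in each argument column."""
    split_rows = [row.split() for row in rows]
    # 11 fixed columns come first and 1 coreference column comes last.
    n_arg_cols = len(split_rows[0]) - 12
    positions = []
    for col in range(11, 11 + n_arg_cols):
        for fields in split_rows:
            if fields[col].startswith("(V*"):
                positions.append(int(fields[2]))  # column 2 is the word number
    return positions


if __name__ == "__main__":
    # Toy rows in the assumed layout (not taken from the dataset).
    rows = [
        "doc 0 0 it PRP (NP*) - - - spk * (ARG1*) * -",
        "doc 0 1 is VBZ (VP* be 01 1 spk * (V*) * -",
        "doc 0 2 failing VBG *) fail 01 1 spk * * (V*) -",
    ]
    print(predicate_positions(rows))
```

A token such as surpassed that has no (V*) in any argument column never shows up in the result, which matches the behaviour described above.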
So the current observation is that the gold_conll annotation in your post is different from the one I got (and the one in Liu_jiangyong's (or yuchenlin's) repo)... Not sure what caused this. Any idea?
My apologies. The one that has surpassed labeled as (V*) is from cctv_0002.v4_gold_conll and yours is not. I was actually using a different version, so this issue is not valid.
Hi, thank you for your great paper.
I'm experimenting with your code and comparing your preprocessing against a SOTA paper. I'm using the same dataset. What surprised me is that your preprocessed result differs from theirs. For example, for the following conll12 sentence:
Your preprocessed result consists of 4 predicate-arg pairs (conll2012.train.txt, lines 28617 to 28620):
But theirs produces 5:
Also, the conll sentence has 5 N:ARGS columns too. It seems that the second predicate
B-ARGM-DIS O B-ARG0 B-ARGM-MOD O B-V B-ARG1 I-ARG1 I-ARG1 I-ARG1 I-ARG1 I-ARG1 I-ARG1 I-ARG1 B-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 I-C-ARG1 O
is missing in your preprocessed result. I'm new to this task and don't know much about it. Could you clarify this issue? Thank you.
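For comparing a bracketed N:ARGS column against a BIO sequence like the one above, a minimal conversion sketch may help. It assumes spans within a single column are flat (not nested), and `brackets_to_bio` is a hypothetical name rather than a function from either codebase.

```python
import re


def brackets_to_bio(column):
    """Convert one bracketed argument column to BIO tags.

    Example input: ['(ARG1*', '*', '*)', '(V*)'].
    Assumes spans in a single column do not nest.
    """
    tags, current = [], None
    for cell in column:
        opened = re.match(r"\(([^*()]+)\*", cell)
        if opened:
            current = opened.group(1)
            tags.append("B-" + current)
        elif current is not None:
            tags.append("I-" + current)
        else:
            tags.append("O")
        if cell.endswith(")"):
            current = None  # the span closes on this token
    return tags


if __name__ == "__main__":
    column = ["*", "(ARG1*)", "(V*)", "(ARGM-ADV*", "*", "*)", "*"]
    print(" ".join(brackets_to_bio(column)))
```

Converting each N:ARGS column this way and diffing against the preprocessed lines should show directly which predicate column has no counterpart.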