vered1986 / OKR

OKR: A Consolidated Open Knowledge Representation for Multiple Texts

Split entities to nouns? #21

Open gabrielStanovsky opened 7 years ago

gabrielStanovsky commented 7 years ago

@kleinay Shany mentioned that we get lower scores on entity recognition, probably because we group noun compounds. I think we should:

  1. Get her evaluation code somewhere on GitHub (probably first to her fork and then as a PR to Vered's repo). BTW, I don't think she's a member of the OKR project.
  2. Run the evaluations to reproduce the lower entity identification scores.
  3. Decide what kind of relation we want between the head of a noun compound and its dependent words; see this dependency parse, for example.
  4. Re-run the evaluations and see if the numbers improve.
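A minimal sketch of exact-match entity mention scoring, the kind of number steps 2 and 4 would compare. It assumes mentions are compared as (start, end) token spans; the actual evaluation code mentioned above may score differently. The sentence and both mention sets are hypothetical, chosen to show how grouping a whole noun compound into one span costs both precision and recall when only exact matches count.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end-exclusive


def exact_match_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a predicted mention counts only if it equals a gold span."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical spans over "The summer school board council announced the new dates".
gold = {(1, 3), (3, 4), (4, 5), (8, 9)}   # "summer school", "board", "council", "dates"
grouped = {(1, 5), (8, 9)}                # "summer school board council", "dates"
print(exact_match_prf(grouped, gold))
```

With the whole compound grouped, only "dates" matches exactly, so P = 0.5, R = 0.25 and F1 is about 0.33.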
gabrielStanovsky commented 7 years ago

Compare with the PropS output for the same sentence.

kleinay commented 7 years ago

@rachelvov also ran the evaluation on the V2 conversions and also got a low entity mention extraction score (~0.3). I think we should try to split noun compounds as much as possible. @gabrielStanovsky, is this a significant change to the props_wrapper? We should make the change and run the evaluation again, as you suggested. I think @shanybar's evaluation code is already merged.

rachelvov commented 7 years ago

@gabrielStanovsky, how long do you estimate fixing this will take? I need a decent V2 baseline for my thesis, and with the current node mention score of 0.3 it's really not good enough. I'm trying to decide whether to wait for the fix in the pipeline or to make the evaluation more flexible (e.g., taking partial matches into account). Also, please let me know if there's any way I can help with this.
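One way to make the evaluation more flexible is to credit overlapping spans. The sketch below is a hypothetical lenient scorer, not the project's evaluation code: a predicted mention counts as correct if it shares at least one token with some gold mention, and a gold mention counts as found if some prediction overlaps it. The spans are the same kind of hypothetical (start, end) token offsets as before.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end-exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if two end-exclusive token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def partial_match_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Lenient scoring: a prediction is correct if it overlaps any gold mention,
    and a gold mention is recalled if any prediction overlaps it."""
    correct = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    found = sum(any(overlaps(p, g) for p in predicted) for g in gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = found / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical spans over "The summer school board council announced the new dates".
gold = {(1, 3), (3, 4), (4, 5), (8, 9)}
grouped = {(1, 5), (8, 9)}
print(partial_match_prf(grouped, gold))  # (1.0, 1.0, 1.0)
```

This scheme is very forgiving: the single grouped "summer school board council" mention gets full credit, and an oversized span covering a whole sentence would too. A token-level overlap ratio per matched pair would be a stricter middle ground between this and exact matching.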

gabrielStanovsky commented 7 years ago

The problem is that I'm not sure what behaviour we want here. For example, for the sentence "The summer school board council announced the new dates", we currently get this PropS parse, which has the long entity mention "summer school board council", as opposed to OKR, which I assume will break it into 3 entities?
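To make the two options concrete, here is a sketch that turns a dependency parse into entity mentions either by grouping each noun with its `compound` dependents (which yields one long mention like the one above) or by emitting every noun on its own. The parse is hand-written and hypothetical (UD-style labels, as spaCy uses), not actual PropS or parser output, and `compound_mentions` and `noun_mentions` are illustrative helpers, not part of props_wrapper.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end-exclusive


def compound_head(i: int, heads: List[int], deps: List[str]) -> int:
    """Follow 'compound' arcs upward until reaching a token that is not itself a compound modifier."""
    while deps[i] == "compound" and heads[i] != i:
        i = heads[i]
    return i


def compound_mentions(heads: List[int], deps: List[str], is_noun: List[bool]) -> List[Span]:
    """Group each noun head with all tokens attached to it, directly or transitively, by 'compound' arcs."""
    groups: Dict[int, List[int]] = {}
    for i in range(len(heads)):
        if is_noun[i] or deps[i] == "compound":
            groups.setdefault(compound_head(i, heads, deps), []).append(i)
    # Keep only groups headed by a noun; each span runs from the leftmost to the rightmost member.
    return sorted((min(m), max(m) + 1) for h, m in groups.items() if is_noun[h])


def noun_mentions(is_noun: List[bool]) -> List[Span]:
    """The 'split to nouns' alternative: every noun is its own single-token mention."""
    return [(i, i + 1) for i, noun in enumerate(is_noun) if noun]


# Hypothetical parse of "The summer school board council announced the new dates".
tokens = ["The", "summer", "school", "board", "council", "announced", "the", "new", "dates"]
heads = [4, 2, 4, 4, 5, 5, 8, 8, 5]
deps = ["det", "compound", "compound", "compound", "nsubj", "ROOT", "det", "amod", "dobj"]
is_noun = [False, True, True, True, True, False, False, False, True]

# Prints "summer school board council", then "dates".
for start, end in compound_mentions(heads, deps, is_noun):
    print(" ".join(tokens[start:end]))
print([" ".join(tokens[s:e]) for s, e in noun_mentions(is_noun)])
```

Neither output keeps "summer school" together while splitting off "board" and "council"; that would need a decision about which compounds are non-compositional, which the parse alone does not provide.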

@kleinay and @rachelvov, what do you think the resulting PropSWrapper should look like?

I guess I can restore the dependency version, but I don't think this would be very helpful, and it will not improve the metrics according to V1 (though maybe it will in V2?).

@rachelvov, I think you can write in your thesis that a simple "noun baseline" achieves good results, but that in general it degrades performance on downstream tasks?

rachelvov commented 7 years ago

I think for "summer school board council" the V1 gold will be E1 = "summer school", E2 = "board", E3 = "council". You can't break up "summer school" because it's not actually a school of/for summer, but "board council" is actually a council of boards. But this is indeed a hard case; most cases in the tweets are much easier. Could we maybe use the baseline I made for the OKR paper? It was very simple (taking all spaCy NER mentions, and splitting everything that is not part of an NER mention into separate nouns and adjectives) and achieved an 85% F1 score. @gabrielStanovsky, I'm not sure I understood what you meant in the last paragraph; let's talk tomorrow (in the lab or on the phone).
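The baseline described above can be sketched as follows, assuming the POS tags and NER spans are already available (for example from spaCy, via `token.pos_` for each token and `(ent.start, ent.end)` for each entity in `doc.ents`). The function name, the exact tag set (here `NOUN`, `PROPN`, `ADJ`), and the example inputs are assumptions for illustration, not the code used for the OKR paper.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end-exclusive

# Assumption: the universal POS tags treated as "nouns and adjectives".
CONTENT_POS = {"NOUN", "PROPN", "ADJ"}


def ner_noun_baseline(pos: List[str], ner_spans: List[Span]) -> List[Span]:
    """Keep every NER span as one mention, then add each noun/adjective
    outside those spans as its own single-token mention."""
    covered = {i for start, end in ner_spans for i in range(start, end)}
    mentions = set(ner_spans)
    for i, tag in enumerate(pos):
        if tag in CONTENT_POS and i not in covered:
            mentions.add((i, i + 1))
    return sorted(mentions)


# Hypothetical tagging of "The summer school board council announced the new dates", with no NER spans.
pos = ["DET", "NOUN", "NOUN", "NOUN", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
print(ner_noun_baseline(pos, ner_spans=[]))
# [(1, 2), (2, 3), (3, 4), (4, 5), (7, 8), (8, 9)]
```

On this hard case the baseline splits "summer school" into two mentions and also emits "new" on its own, so it would not match the segmentation above exactly; an NER span such as a multi-token person name, by contrast, is kept whole.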