hltfbk / Excitement-Transduction-Layer


KeywordBasedFragmentAnnotator and unigram fragments #254

Closed salvadora closed 10 years ago

salvadora commented 10 years ago

Hi Vivi,

I tested the annotator with the file ./src/test/resources/WP2_public_data_XML/keywordAnnotations.xml. When most of the keywords are nouns, I get many unigram fragments, because nouns often do not govern any other token.

Is this the intended behaviour? I am asking because there is a comment in your code about avoiding unigram fragments.

A very simple example:

Sentence: DVD-Player spielt Film nicht ab. (German: "The DVD player does not play the film.")
Keywords: DVD-Player, Film
Fragments: DVD-Player, Film

The parser correctly recognizes the dependencies (spielt, DVD-Player) => SB and (spielt, Film) => OA.
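To make the example concrete, here is a rough sketch of why both keywords come out as unigrams. It assumes that a fragment is simply the keyword plus every token it governs (its dependency subtree). The `TOKENS` and `DEPS` tables and the `subtree` and `fragment_text` helpers are made up for illustration and are not the annotator's actual code; only the SB and OA edges come from the parser output above, while the NG and SVP labels for "nicht" and "ab" are my assumption.

```python
# Token positions in "DVD-Player spielt Film nicht ab"
TOKENS = ["DVD-Player", "spielt", "Film", "nicht", "ab"]

# head index -> list of (dependent index, label).
# SB and OA are from the example; NG and SVP are assumed labels.
DEPS = {
    1: [(0, "SB"), (2, "OA"), (3, "NG"), (4, "SVP")],
}


def subtree(index):
    """Return the sorted indices of `index` and everything it governs."""
    covered = {index}
    stack = [index]
    while stack:
        head = stack.pop()
        for dep, _label in DEPS.get(head, []):
            if dep not in covered:
                covered.add(dep)
                stack.append(dep)
    return sorted(covered)


def fragment_text(index):
    """Join the tokens of the subtree rooted at `index` in sentence order."""
    return " ".join(TOKENS[i] for i in subtree(index))


print(fragment_text(0))  # DVD-Player
print(fragment_text(2))  # Film
```

Both nouns are leaves of the tree, so going down from them collects nothing else, which matches the unigram fragments I see.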

Of course, I am not referring to cases where the parser fails to recognize the dependencies correctly; there is nothing you can do about those.

Best, Aleksandra

vnastase commented 10 years ago

Hi Aleksandra

I remember having two versions (differing in one or two lines of code being commented out or not). In one, the fragment is generated starting from the noun and going down the dependency tree, so to speak. In the other, if the keyword is anything other than a verb, I force a step up to its governor and then generate the fragment from there. I'll have a look to see whether that's where this is coming from, or whether there is a bug somewhere.
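Roughly, the two variants look like the sketch below. This is written from memory as an illustration and is not the code in KeywordBasedFragmentAnnotator: the `Token` class, the `build`, `go_down` and `fragment` functions, the `step_up` flag, and the STTS-style POS tags are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    text: str
    pos: str                      # assumed STTS-style tag, e.g. NN, VVFIN
    head: Optional[int] = None    # index of the governor, None for the root
    children: list = field(default_factory=list)


def build(sentence):
    """Create tokens from (text, pos, head) triples and link the children."""
    tokens = [Token(text, pos, head) for text, pos, head in sentence]
    for i, tok in enumerate(tokens):
        if tok.head is not None:
            tokens[tok.head].children.append(i)
    return tokens


def go_down(tokens, start):
    """Collect `start` and all tokens below it in the dependency tree."""
    covered, stack = {start}, [start]
    while stack:
        for child in tokens[stack.pop()].children:
            if child not in covered:
                covered.add(child)
                stack.append(child)
    return sorted(covered)


def fragment(tokens, keyword, step_up=True):
    """Build a fragment for the keyword at index `keyword`.

    With step_up=True, a non-verb keyword first moves to its governor,
    so a leaf noun no longer yields a unigram fragment.
    """
    start = keyword
    tok = tokens[keyword]
    if step_up and not tok.pos.startswith("V") and tok.head is not None:
        start = tok.head
    return " ".join(tokens[i].text for i in go_down(tokens, start))


SENTENCE = [
    ("DVD-Player", "NN", 1),
    ("spielt", "VVFIN", None),
    ("Film", "NN", 1),
    ("nicht", "PTKNEG", 1),
    ("ab", "PTKVZ", 1),
]

tokens = build(SENTENCE)
print(fragment(tokens, 0, step_up=False))  # DVD-Player
print(fragment(tokens, 0, step_up=True))   # DVD-Player spielt Film nicht ab
```

With the step up switched off, the noun keyword gives exactly the unigram you reported. With it switched on, the fragment grows to everything under the verb, which in a short sentence like this one is the whole sentence.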

Vivi


Dr. Vivi Nastase

Human Language Technologies Research Unit Fondazione Bruno Kessler Via Sommarive 18, 38123 Povo - Trento (Italy) nastase@fbk.eu

vnastase commented 10 years ago

If pull request #255 is merged, this issue should be solved.

Vivi