Original comment by tristan.miller@nothingisreal.com
on 22 Jan 2014 at 1:32
ClearNLP Lemmatizer does not have this issue with underscores.
Original comment by pedrobss...@gmail.com
on 24 Jan 2014 at 10:48
This issue is not about underscores; it is about the lemmatizer returning
"null" in some cases because it may not know how to lemmatize a certain word.
Original comment by richard.eckart
on 25 Jan 2014 at 12:20
In the description: "When some lemmatizers encounter a token covering the text
"_" [...]"
Anyway, ClearNLP does not assign null for underscores.
Original comment by pedrobss...@gmail.com
on 25 Jan 2014 at 12:32
The underscore problem is probably specific to the Stanford lemmatizer. For
most other words it can't lemmatize, it just defaults to the covered text.
However, it and other lemmatizers may have a similar problem with other edge
cases. The fix I applied to StanfordLemmatizer will probably work with every
other lemmatizer; I just did something like
    if (lemma.value == null)
        lemma.value = token.getCoveredText();
(Don't have my development environment in front of me so this probably isn't
exactly what I wrote, but you get the picture.)
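
To make the idea concrete without the real annotation types, here is a minimal self-contained sketch of the same fallback. SimpleToken and fillMissingLemmas are hypothetical stand-ins invented for illustration; they are not the DKPro Core Token/Lemma classes and not the code that was actually committed.

```java
import java.util.ArrayList;
import java.util.List;

public class LemmaFallbackSketch {
    /** Hypothetical stand-in for a token annotation; NOT the DKPro Core Token type. */
    static final class SimpleToken {
        final String coveredText;
        String lemma; // may be null when the lemmatizer cannot handle the token

        SimpleToken(String coveredText, String lemma) {
            this.coveredText = coveredText;
            this.lemma = lemma;
        }
    }

    /**
     * Replaces every null lemma with the token's covered text.
     * Returns the number of tokens that had to be patched.
     */
    static int fillMissingLemmas(List<SimpleToken> tokens) {
        int patched = 0;
        for (SimpleToken token : tokens) {
            if (token.lemma == null) {
                token.lemma = token.coveredText;
                patched++;
            }
        }
        return patched;
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = new ArrayList<>();
        tokens.add(new SimpleToken("running", "run"));
        tokens.add(new SimpleToken("_", null)); // simulates a lemmatizer returning null

        int patched = fillMissingLemmas(tokens);
        System.out.println("Patched " + patched + " token(s); lemma of \"_\" is now: "
                + tokens.get(1).lemma);
    }
}
```

Returning the patch count is just a convenience for logging how often a given lemmatizer falls back; the actual fix only needs the null check and the assignment.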
Original comment by tristan.miller@nothingisreal.com
on 25 Jan 2014 at 11:45
Need to check whether all lemmatizers properly check for null lemma values
and, when the value is null, set the covered text as the lemma.
Original comment by richard.eckart
on 6 Aug 2014 at 8:36
Original comment by eriklan.dodinh@gmail.com
on 15 Aug 2014 at 9:05
This issue was updated by revision r488
(https://code.google.com/p/dkpro-core-gpl/source/detail?r=488).
- Ensured a non-null Lemma.value (more specifically, Token.getCoveredText())
for GPL components (GateLemmatizer, MateLemmatizer, SfstAnnotator)
Original comment by eriklan.dodinh@gmail.com
on 15 Aug 2014 at 9:34
This issue was updated by revision r2730.
- Ensured a non-null Lemma.value (more specifically, Token.getCoveredText())
for ASL components (ClearNlpLemmatizer, CogrooLemmatizer, MeCabTagger,
MorphaLemmatizer)
- Non-null already ensured for LanguageToolLemmatizer, TreeTagger (and
TokenMerger)
Original comment by eriklan.dodinh@gmail.com
on 15 Aug 2014 at 10:17
Original comment by eriklan.dodinh@gmail.com
on 15 Aug 2014 at 10:23
This issue was updated by revision r2747.
Merging into 1.6.x branch
Original comment by pedrobss...@gmail.com
on 20 Aug 2014 at 9:31
This issue was fixed by revision
https://code.google.com/p/dkpro-core-gpl/source/detail?r=497
Original comment by pedrobss...@gmail.com
on 20 Aug 2014 at 9:46
Original issue reported on code.google.com by
tristan.miller@nothingisreal.com
on 22 Jan 2014 at 1:32