Closed GoogleCodeExporter closed 9 years ago
I like option 1. In fact, I am pretty certain I tried making this change once
upon a time, discovered that it broke some tests, changed it back, and then
left a little note for you above the configuration parameter "windowTypeName".
It should be easy enough to fix any unit tests that break by adding in the
necessary sentence annotations.
Original comment by pvogren@gmail.com
on 5 Nov 2010 at 2:43
btw - I am ambivalent about having getDescription methods at all, leaning
towards against them. Often when I use ClearTK components I am using a
completely different type system. Especially, in a case like this where it is
doing very little....
Original comment by pvogren@gmail.com
on 5 Nov 2010 at 2:46
It breaks one test,
org.cleartk.token.TokenizerAndTokenAnnotatorTest.ticket176(). And the problem
seems to be that UimaContext.getConfigParameterValue doesn't distinguish
between a null value that means it wasn't specified, and an explicit null value
passed in. So you can't just append:
TokenAnnotator.PARAM_WINDOW_TYPE_NAME, null
to the argument list of AnalysisEngineFactory.createPrimitive - because this is
treated the same as if PARAM_WINDOW_TYPE_NAME was not specified at all. If we
still want to stick with solution (1), we'd need to do something like allow ""
as the value that means no window types, and you'd have to specify this
explicitly if you wanted no window annotation.
I guess another possibility is to try to catch the mistake and issue a warning,
e.g. if PARAM_TOKENIZER_NAME=org.cleartk.token.util.PennTreebankTokenizer, then
PARAM_WINDOW_TYPE_NAME should never be null. We could just issue an warning
whenever we spot this kind of mistake.
Original comment by steven.b...@gmail.com
on 5 Nov 2010 at 4:07
So, the only way to have windowTypeName end up with a null value is to not give
it a default value and not specify a value in the uimaContext? Interesting!
...and frustrating. This sounds a bit like a possible bug in uimaFIT - but I'm
not sure how it would handle this either. I suppose I should at least file an
issue - because we should at least document this issue if we can't take care of
it.
One possibility is to pass in a different value in the test like
"org.apache.uima.jcas.tcas.DocumentAnnotation". Another possibility is to
revert to option (2) above. I think the warning above is kind of dorky - but
probably harmless.
Original comment by pvogren@gmail.com
on 5 Nov 2010 at 4:29
It's more of a bad design decision in UIMA than a bug in uimaFIT - the
documentation for UimaContext.getConfigParameterValue(String) says:
This method returns null if the parameter is optional and has not been assigned
a value.
http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/api/org/apach
e/uima/UimaContext.html#getConfigParameterValue(java.lang.String)
UimaFIT loads the default value for a parameter if getConfigParameterValue
returns null. But I'm not really sure what else it could do given the available
UimaContext methods.
I like the DocumentAnnotation idea. We could then default to
PARAM_WINDOW_TYPE_NAME=org.cleartk.type.Sentence, which would get rid of these
really subtle extra-periods-at-the-ends-of-a-few-words bugs for people using
the ClearTK type system. People using their own type system and who
accidentally left PARAM_WINDOW_TYPE_NAME=org.cleartk.type.Sentence would get
much more obvious bugs (no tokens would be created). We could put some
documentation clearly at the top of TokenAnnotator explaining why you might bet
such bugs and recommending
PARAM_WINDOW_TYPE_NAME=org.apache.uima.jcas.tcas.DocumentAnnotation for these
situations.
If this sounds okay, let me know, and I'll go ahead and make this change, and
update the tests and documentation.
Original comment by steven.b...@gmail.com
on 7 Nov 2010 at 3:04
This sounds good to me. Thanks for your attention to this!
Original comment by pvogren@gmail.com
on 8 Nov 2010 at 3:48
This issue was closed by revision r2089.
Original comment by steven.b...@gmail.com
on 8 Nov 2010 at 1:53
Fixed in r2089.
Original comment by steven.b...@gmail.com
on 8 Nov 2010 at 1:54
So this is broken again. I don't know why - maybe either lost in the merge or
broken when cleartk-syntax-opennlp was updated? Could whoever broke this please
fix it again and add the test back in? See r2089 linked to in this issue for
the patch.
Original comment by steven.b...@gmail.com
on 26 Jan 2011 at 2:27
This must have slipped back in by accident during the project re-org. Sorry!
It is fixed in r2361.
Original comment by phi...@ogren.info
on 26 Jan 2011 at 6:25
Original issue reported on code.google.com by
steven.b...@gmail.com
on 5 Nov 2010 at 11:26