Closed GoogleCodeExporter closed 8 years ago
The selectCovering() is a quite slow operation, that is why it is not
implemented in (J)CasUtil. If you need that operation, you should use the
ContainmentIndex, which builds a internal cache for which annotation contains
which other annotation once and a lookup then is fast.
I'll check what happens to our test-cases when I change the stuff to
FeatureStructure.
Original comment by richard.eckart
on 16 Mar 2011 at 1:30
Btw. for JCasUtil I think TOP is more appropriate since FeatureStructure, as
that is the root of the JCas class hierarchy. FeatureStructure is appropriate
for CasUtil.
Maybe "ContainmentIndex" should be renamed to "CoverageCache" or
"CoverageIndex" or something like that to be more obvious and might be useful
to have a factory method in (J)CasUtil as not to miss it.
Original comment by richard.eckart
on 16 Mar 2011 at 1:43
Original comment by richard.eckart
on 16 Mar 2011 at 1:46
Yeah, I wondered about the efficiency of selectCovering. And ContainmentIndex
looks like the solution I need - I never noticed it before. I think
ContainmentIndex as a name is fine - but a few links from (J)CasUtil would be
quite helpful. Perhaps the factory method ContainmentIndex.create should be
moved or copied into (J)CasUtil as createContainmentIndex?
I don't feel strongly about TOP vs. FeatureStructure. I just used
FeatureStructure because FSIterator<FeatureStructure> is the return type of
getAllIndexedFS().
Original comment by steven.b...@gmail.com
on 16 Mar 2011 at 1:59
I have made the necessary changes on my machine (AnnotationFS to
FeatureStructure) where possible. However, this is not completely backwards
compatible. With uimaFIT 1.1.0 you can write
for (AnnotationFS token : select(aCas, type)) {
with the modification you have to write
Collection<AnnotationFS> tokens = select(aCas, type);
for (AnnotationFS token : tokens) {
This is because Java is not smart enough to infer the return type of select()
in a enhanced for loop (e.g. to Iterable<AnnotationFS>) and simply assumes the
dumbest default, which is that select() returns a Iterable<FeatureStructure>,
which is not what you would want here.
I'm not sure yet how I like that. There are several options:
1) have selectFS() return FS and select() return AnnotationFS
2) add another parameter to select() where you can specify the return type in
cases where Java is not smart enough to infer the type
3) force users to write two lines instead of one (see above)
Briefly thinking about it, I tend towards 1).
I would also like to point out that getAllIndexedFS() is a bit magical on the
inside regarding where it picks up the FSes and in which order they are
returned when custom indexes are installed, while the getAnnotationIndex()
stuff reliably returns offset-sorted annotations. This also drives me towards
solution 1).
Original comment by richard.eckart
on 16 Mar 2011 at 4:51
Hmm... That's odd - the code I posted above for iterate works perfectly fine in
a for loop for me. What does your code look like?
That said, I'm also fine with selectFS() and select().
Original comment by steven.b...@gmail.com
on 16 Mar 2011 at 6:13
The problem occurs only in CasUtil, not in JCasUtil. In JCasUtil, Java can
infer the return type from the "type" parameter which is a Class. In CasUtil,
Java cannot do that because the type parameter is a UIMA Type. So I am adding
selectFS and select to CasUtil and in JCasUtil such a differentiation will not
be made.
However, this addresses only the loop thing. I am still a bit wary about the
voodoo that getAllIndexedFS() performs. At least our test cases at least are
all fine and so are those of DKPro. However, we have no test cases that include
any custom indexes.
I'll commit the changes nevertheless.
Original comment by richard.eckart
on 16 Mar 2011 at 6:51
JCasUtil.select() and iterate() methods now can return anything that inherits
from TOP.
CasUtil now contains an additional set of methods having "FS" in their name
(e.g. selectFS()) that can select any FeatureStructures.
Original comment by richard.eckart
on 16 Mar 2011 at 6:53
Cool. Thanks for the quick fixes! You addressed the selectCovering issue (and
it's being further addressed in Issue 60) and the FeatureStructure vs.
Annotation issues, but I don't think we can close this issue until we add the
other two items:
* Ability to specify container in terms of begin and end offset instead of just
as a covering Annotation
* Iteration over FSArrays
Original comment by steven.b...@gmail.com
on 16 Mar 2011 at 7:39
Whops, forgot those ;)
Original comment by richard.eckart
on 16 Mar 2011 at 8:17
Moved them into separate issues, which makes the changes more prominent in the
CHANGES file.
Original comment by richard.eckart
on 16 Mar 2011 at 8:24
Great, thanks!
Original comment by steven.b...@gmail.com
on 16 Mar 2011 at 8:32
Original comment by richard.eckart
on 7 Apr 2011 at 12:12
Original issue reported on code.google.com by
steven.b...@gmail.com
on 16 Mar 2011 at 9:43