Open GoogleCodeExporter opened 8 years ago
The second line of the first example should read:
from(cas, "CustomIndex").select(Sentence.class)
Original comment by richard.eckart
on 18 Mar 2011 at 1:15
I like:
from(cas, "CustomIndex").select(Sentence.class)
I'm not sold on:
from(cas).select(Token.class).where(and(feature("begin", GREATER_THAN, 5), feature("end", LESS_THAN, 10)));
If we're going to go the route of defining predicates over collections, I'd
prefer that we don't roll our own, and instead use an existing library, e.g.
Guava collections, where you'd write something like:
filter(from(cas).select(Token.class), and(feature("begin", GREATER_THAN, 5), feature("end", LESS_THAN, 10)));
So, in short, +1 for (J)CasUtil.from, -1 on our own implementation of filter
and predicates.
Original comment by steven.b...@gmail.com
on 23 Mar 2011 at 10:47
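For readers unfamiliar with the "library-provided filter plus composed predicates" style argued for above, here is a minimal sketch. It is not code from this thread and does not use Guava or UIMA: it assumes java.util.function.Predicate as a stand-in for a library predicate type, and Token and filter() are hypothetical names defined only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class PredicateFilterSketch {
    // Hypothetical stand-in for a UIMA Token annotation: only begin/end offsets.
    static final class Token {
        final int begin;
        final int end;

        Token(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Generic filter over any Iterable; knows nothing about tokens or a CAS.
    static <T> List<T> filter(Iterable<T> items, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (predicate.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(new Token(0, 4), new Token(6, 9), new Token(8, 12));

        // Predicates are composed with the library's own and(), not a custom API.
        Predicate<Token> beginAfter5 = t -> t.begin > 5;
        Predicate<Token> endBefore10 = t -> t.end < 10;

        List<Token> hits = filter(tokens, beginAfter5.and(endBefore10));
        System.out.println(hits.size() + " token(s) matched");
    }
}
```

The point of the design is that filter() and the predicate type come from a general-purpose library, so the query API only has to supply the collection of annotations.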
Guava's filter() method returns a Collection<E>. I would like to keep the
option open to have the "where" method return something more specialized, e.g.
something possibly implementing FSIndex and/or being able to produce a
FSIterator.
It would also introduce an additional dependency.
Original comment by richard.eckart
on 25 Mar 2011 at 5:21
I would also like to keep the option of the where() implementation being smart
about how it handles the predicates. That is, depending on the actual instance
on which .where() is called, the where() implementation may exploit
instance-specific optimizations for particular predicates.
Original comment by richard.eckart
on 25 Mar 2011 at 5:23
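One way to picture the "smart where()" idea from the two comments above is a container that recognises certain predicate types and answers them without scanning. The sketch below is only an illustration under stated assumptions: Span, BeginGreaterThan and SortedIndex are hypothetical names, nothing here is UIMA API, and the list passed to SortedIndex is assumed to be sorted by ascending begin offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SmartWhereSketch {
    // Hypothetical stand-in for an annotation with offsets.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // A predicate type the index can recognise and optimise for.
    static final class BeginGreaterThan implements Predicate<Span> {
        final int bound;

        BeginGreaterThan(int bound) {
            this.bound = bound;
        }

        @Override
        public boolean test(Span s) {
            return s.begin > bound;
        }
    }

    // Holds spans that the caller has already sorted by ascending begin offset.
    static final class SortedIndex {
        private final List<Span> spans;
        int tested = 0; // counts predicate evaluations, for illustration only

        SortedIndex(List<Span> sortedByBegin) {
            this.spans = sortedByBegin;
        }

        List<Span> where(Predicate<Span> p) {
            if (p instanceof BeginGreaterThan) {
                // Instance-specific shortcut: binary search for the first span
                // whose begin exceeds the bound, then take the tail of the list.
                int bound = ((BeginGreaterThan) p).bound;
                int lo = 0;
                int hi = spans.size();
                while (lo < hi) {
                    int mid = (lo + hi) >>> 1;
                    if (spans.get(mid).begin > bound) {
                        hi = mid;
                    } else {
                        lo = mid + 1;
                    }
                }
                return new ArrayList<>(spans.subList(lo, spans.size()));
            }
            // Generic fallback: test every element.
            List<Span> out = new ArrayList<>();
            for (Span s : spans) {
                tested++;
                if (p.test(s)) {
                    out.add(s);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SortedIndex index = new SortedIndex(Arrays.asList(
            new Span(0, 2), new Span(3, 5), new Span(6, 7), new Span(8, 10), new Span(11, 12)));
        System.out.println(index.where(new BeginGreaterThan(5)).size());
        System.out.println(index.where(s -> s.begin > 5).size());
    }
}
```

Both calls return the same spans; only the opaque lambda forces a full scan. A generic filter() that lives outside the container cannot make this kind of choice, which is the trade-off being debated here.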
I guess I'm okay with "select" returning something that has a "where" method
(or "filter" if we want to match Guava, Scala, etc.). But I don't think we
should create our own Predicate API.
Or maybe we should just give up on Java and switch over to Scala, where your
".where" clause could with no extra method definitions be written as:
.filter(t => t.getBegin > 5 && t.getEnd < 10)
;-)
Original comment by steven.b...@gmail.com
on 25 Mar 2011 at 6:04
Instead of using an external matching framework, the best option is probably to
use/extend the FSConstraint framework already present in UIMA. (cf.
CAS.createFilteredIterator()).
Original comment by richard.eckart
on 26 Mar 2011 at 12:25
Never noticed those before. So:
from(cas).select(Token.class).where(and(..., ...));
would be equivalent to:
cas.createFilteredIterator(
cas.getAnnotationIndex(JCasUtil.getType(jCas, Token.class)).iterator(),
ConstraintFactory.and(..., ...));
and we'd provide some utility functions for creating common FSMatchConstraints?
I'd be okay with that.
Original comment by steven.b...@gmail.com
on 26 Mar 2011 at 12:55
Sounds really useful to me. It sounds a bit like another project I heard a
talk about, which allowed you to do something similar:
http://uima.apache.org/downloads/sandbox/CFE_UG/CFE_UG.html
Don't feel like you need to give energy to looking into this other project (I
didn't!). I just thought I would mention it since it seems related.
Original comment by phi...@ogren.info
on 27 Mar 2011 at 2:18
@comment 7 by Steven
I experimented a bit with the filtered iterator. It works nicely, but it has a
horrible API. Originally I wanted to perform this (pseudo-SQL):
SELECT Token FROM cas WHERE coveredText = "a" and begin > 4 and begin < 10
However, after debugging a bit, I noticed that there is currently no way to access
the covered text in UIMA's filtered iterator framework. So I changed the
intended query to the following pseudo-SQL:
SELECT Annotation FROM cas WHERE type = Token and begin > 4 and begin < 10
In Java, I would probably want to write something like this:
from(cas).select(Annotation.class).where(and(type(Token.class), gt("begin", 4), lt("begin", 10)))
but instead I have to write:
// Set up example
ConstraintFactory cf = ConstraintFactory.instance();
FSIterator<Annotation> iterator = jcas.getAnnotationIndex().iterator();
Type tokenType = jcas.getCasType(Token.type);
// Restrict to Tokens
FSTypeConstraint typeConstraint = cf.createTypeConstraint();
typeConstraint.add(tokenType);
// Restrict to begin > 4 && begin < 10
FeaturePath beginFeaturePath = jcas.createFeaturePath();
beginFeaturePath.initialize("begin");
beginFeaturePath.typeInit(tokenType);
FSIntConstraint beginValueConstraint = cf.createIntConstraint();
beginValueConstraint.gt(4);
beginValueConstraint.lt(10);
FSMatchConstraint beginFeatureConstraint = cf.embedConstraint(beginFeaturePath, beginValueConstraint);
// Combine both constraints using "and"
FSMatchConstraint conjunction = cf.and(typeConstraint, beginFeatureConstraint);
FSIterator<Annotation> filteredIterator = jcas.createFilteredIterator(iterator, conjunction);
@comment 8 by Philip
The FESL (Feature Extraction Specification Language) used by the CFE is XML
based. It does not seem like something you would want to write in your Java
code, though it could be interesting with regard to ClearTK.
Briefly looking into the source, I find that most of the code is centered
around parsing FESL-XML and applying it to the CAS. I did not see anything
that would make life easier for the person coding in Java.
Original comment by whodance...@gmail.com
on 3 Apr 2011 at 10:32
I wasn't suggesting that we actually use the filtered iterator API directly,
but rather that your SQLCAS object (or whatever you call it) be implemented
internally using the filtered iterator API. Something like:
class SQLCAS {
public static SQLCAS from(CAS cas);
public SQLCAS select(Class<?> cls);
public SQLCAS where(FSMatchConstraint constraint);
public FSIterator<?> iterator();
}
And then each of those methods would be implemented using the code you wrote
above.
Original comment by steven.b...@gmail.com
on 3 Apr 2011 at 11:23
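To make the from/select/where outline above concrete without depending on UIMA, here is a rough sketch of how such a fluent object could hang together. Everything in it is an assumption made for illustration: the Annotation, Token and Sentence classes are stand-ins, a plain List plays the role of the CAS, and java.util.function.Predicate replaces FSMatchConstraint. A real version would delegate to the filtered iterator API instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class FluentQuerySketch {
    // Hypothetical stand-ins for UIMA annotation types.
    static class Annotation {
        final int begin;
        final int end;

        Annotation(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static class Token extends Annotation {
        Token(int begin, int end) {
            super(begin, end);
        }
    }

    static class Sentence extends Annotation {
        Sentence(int begin, int end) {
            super(begin, end);
        }
    }

    // Immutable query: every step returns a new object, nothing runs until iterator().
    static final class Query<T extends Annotation> implements Iterable<T> {
        private final List<? extends Annotation> source;
        private final Class<T> type;
        private final Predicate<? super T> constraint;

        private Query(List<? extends Annotation> source, Class<T> type,
                      Predicate<? super T> constraint) {
            this.source = source;
            this.type = type;
            this.constraint = constraint;
        }

        static Query<Annotation> from(List<? extends Annotation> source) {
            return new Query<>(source, Annotation.class, a -> true);
        }

        // Narrows the result type; in this sketch it also resets any constraint.
        <S extends Annotation> Query<S> select(Class<S> cls) {
            return new Query<>(source, cls, a -> true);
        }

        // Adds a constraint; repeated calls are combined with logical AND.
        Query<T> where(Predicate<? super T> extra) {
            Predicate<T> combined = a -> constraint.test(a) && extra.test(a);
            return new Query<>(source, type, combined);
        }

        @Override
        public Iterator<T> iterator() {
            List<T> out = new ArrayList<>();
            for (Annotation a : source) {
                if (type.isInstance(a)) {
                    T typed = type.cast(a);
                    if (constraint.test(typed)) {
                        out.add(typed);
                    }
                }
            }
            return out.iterator();
        }
    }

    public static void main(String[] args) {
        List<Annotation> cas = Arrays.asList(
            new Sentence(0, 20), new Token(0, 4), new Token(6, 9), new Token(11, 15));
        for (Token tok : Query.from(cas).select(Token.class)
                .where(x -> x.begin > 5)
                .where(x -> x.end < 10)) {
            System.out.println(tok.begin + "-" + tok.end);
        }
    }
}
```

Because the query only records the type and constraint until iterator() is called, the evaluation strategy stays an internal detail that could later be swapped for something index-backed.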
Of course. I just wanted to give a full example of the horrors of that API.
Original comment by whodance...@gmail.com
on 3 Apr 2011 at 11:28
These issues are candidates for version 1.3.0.
Original comment by richard.eckart
on 7 May 2011 at 5:31
Here's an implementation of something like this that will let you write things
like:
DocumentAnnotation document = CasQuery.from(this.jCas).select(DocumentAnnotation.class).single();
Iterator<Sentence> sentences = CasQuery.from(this.jCas).select(Sentence.class).iterator();
Collection<Token> tokens = CasQuery.from(this.jCas).select(Token.class).coveredBy(sentence);
Token token = CasQuery.from(this.jCas).select(Token.class).matching(annotation).single();
Chunk chunk = CasQuery.from(this.jCas).select(Chunk.class).zeroOrOne();
I was just playing around with this, so do with it as you will, but it might
make a useful starting point.
Original comment by steven.b...@gmail.com
on 10 May 2011 at 2:29
Attachments:
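The attached implementation is not reproduced here, so the exact behaviour of single() and zeroOrOne() is not known from this thread. The following sketch only illustrates one plausible reading of those names: it assumes single() requires exactly one result and zeroOrOne() tolerates an empty result by returning null. CardinalitySketch and both helper signatures are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CardinalitySketch {
    // Assumed semantics: exactly one element, otherwise fail loudly.
    static <T> T single(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            throw new NoSuchElementException("expected exactly one element, found none");
        }
        T result = it.next();
        if (it.hasNext()) {
            throw new IllegalStateException("expected exactly one element, found several");
        }
        return result;
    }

    // Assumed semantics: null when empty, the element when there is exactly one,
    // and a failure when there are several.
    static <T> T zeroOrOne(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return null;
        }
        T result = it.next();
        if (it.hasNext()) {
            throw new IllegalStateException("expected at most one element, found several");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(single(Arrays.asList("document")));
        System.out.println(zeroOrOne(Collections.<String>emptyList()));
    }
}
```

Failing on unexpected cardinality, rather than silently returning the first match, tends to surface broken pipelines early, for example a CAS that unexpectedly contains two document annotations.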
Nice. I'll have a look at it.
Original comment by richard.eckart
on 10 May 2011 at 2:45
Original comment by richard.eckart
on 4 Jan 2012 at 10:51
Original comment by richard.eckart
on 5 Jul 2012 at 4:02
Original comment by richard.eckart
on 7 Jan 2013 at 4:51
Original comment by richard.eckart
on 25 Aug 2013 at 8:17
Original issue reported on code.google.com by
richard.eckart
on 18 Mar 2011 at 3:29