sudeep87 / uimafit

Automatically exported from code.google.com/p/uimafit

Support for no-op steps in a pipeline #55

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
When building pipelines dynamically, from time to time a situation comes up 
where a particular analysis step should become a no-op (do nothing). It would 
be nice if uimaFIT supported such scenarios. One option would be to allow 
"null" as a valid parameter when building an aggregate or when running a 
pipeline:

tokenizer = createTokenizerAE(...);
sentence = createSentenceAE(...);
runPipeline(reader, tokenizer, sentence, writer);

Mind that depending on the context, the above "create" methods may return 
different implementations or possibly "null" if e.g. sentence analysis is not 
desired.

Another alternative would be for uimaFIT to provide a NoopAnalysisEngine which 
does nothing (process() is empty). While it is trivial to implement such an 
engine where appropriate, the concept of a NoopAnalysisEngine seems to be 
general enough to be provided as a commodity by uimaFIT.

Possibly even both, silently ignoring "null" arguments in pipelines and 
aggregates and a NoopAnalysisEngine, could be considered.

Original issue reported on code.google.com by richard.eckart on 3 Feb 2011 at 4:30
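
A minimal sketch of the no-op idea described above, assuming a simplified pipeline model. The Step interface, NoOpStep class, and runPipeline method here are hypothetical stand-ins written for illustration; they are not uimaFIT or UIMA types, and the real engine would extend a UIMA annotator base class instead.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT uimaFIT or UIMA types.
class NoOpStepSketch {

    /** Hypothetical pipeline step that may add annotations to a shared list. */
    interface Step {
        void process(List<String> annotations);
    }

    /** A step that deliberately does nothing: process() is empty. */
    static final class NoOpStep implements Step {
        @Override
        public void process(List<String> annotations) {
            // intentionally empty
        }
    }

    /** Runs every step in order over a fresh annotation list and returns it. */
    static List<String> runPipeline(Step... steps) {
        List<String> annotations = new ArrayList<>();
        for (Step step : steps) {
            step.process(annotations);
        }
        return annotations;
    }

    public static void main(String[] args) {
        Step tokenizer = annotations -> annotations.add("Token");
        Step sentence = new NoOpStep(); // sentence analysis switched off
        System.out.println(runPipeline(tokenizer, sentence));
    }
}
```

Because the no-op step is an ordinary step, the pipeline runner needs no special case for a disabled stage.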

GoogleCodeExporter commented 8 years ago

Original comment by richard.eckart on 3 Feb 2011 at 4:30

GoogleCodeExporter commented 8 years ago
I'd be a little nervous about silently skipping nulls - seems like it might 
hide programming errors and make it hard to trace back to the source when a bug 
is finally provoked (e.g. a later annotator tries to use an annotation that 
should have been added by the annotator that was accidentally "null").

+1 on the NoOpAnalysisEngine though. I'm sure Philip will have an opinion on 
the name. ;-)

Original comment by steven.b...@gmail.com on 3 Feb 2011 at 4:57
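
The concern about silently skipped nulls can be illustrated with a fail-fast check. This is only a sketch under the same simplified model: Step and runPipeline are hypothetical names, not the uimaFIT API, and the exception type is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; Step and runPipeline are hypothetical, not uimaFIT types.
class NullCheckSketch {

    interface Step {
        void process(List<String> annotations);
    }

    /**
     * Rejects null steps up front, naming the offending position, instead of
     * skipping them and letting a later step fail on a missing annotation.
     */
    static List<String> runPipeline(Step... steps) {
        for (int i = 0; i < steps.length; i++) {
            if (steps[i] == null) {
                throw new IllegalArgumentException("Pipeline step " + i + " is null");
            }
        }
        List<String> annotations = new ArrayList<>();
        for (Step step : steps) {
            step.process(annotations);
        }
        return annotations;
    }
}
```

With this check, an accidental null is reported where the pipeline is assembled, not wherever a downstream step first misses its input.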

GoogleCodeExporter commented 8 years ago
I think we already have the proposed analysis engine.  It's called 
org.uimafit.component.JCasAnnotatorAdapter.  

I share Steve's concern about skipping a "null" analysis engine.  

I think the scenario you've given above is handled quite nicely with 
AggregateBuilder.  I have used it frequently for conditionally adding analysis 
engines to an aggregate.  Do you think this would work for you?

Original comment by phi...@ogren.info on 3 Feb 2011 at 5:07

GoogleCodeExporter commented 8 years ago
The JCasAnnotatorAdapter is quite what I was thinking. Although I would have 
suggested using a CasAnnotator_ImplBase to avoid initializing the JCas types 
in pipelines where it is not necessary. I don't understand though, why it is 
called an "adapter" - I don't see it adapt anything to anything else. Could we 
rename the component?

I think there are better ways of implementing the methods where the component 
is currently used in JCasFactory.createJCas(), 
UimaContextFactory.createUimaContext() and JCasIterable that avoid using a dummy
AE.

Regarding the AggregateBuilder - the concrete code we have is something like 
this:

createComponent(type) {
  switch (type) {
  case A: return aeDesc1;
  case B: return aeDesc2;
  case NONE: return noopAeDesc;
  }
}

This is quite a nice/short way of writing things. Using the AggregateBuilder 
would require less "nice" if/then code.

Original comment by richard.eckart on 3 Feb 2011 at 5:32
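
The two styles compared above can be sketched side by side. Everything here is hypothetical (Step, Kind, the constants, and both methods); the second method only mimics conditional assembly in the spirit of a builder and does not use the real AggregateBuilder API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; none of these names come from uimaFIT.
class FactoryStyleSketch {

    interface Step {
        void process(List<String> annotations);
    }

    enum Kind { A, B, NONE }

    static final Step STEP_A = annotations -> annotations.add("A");
    static final Step STEP_B = annotations -> annotations.add("B");
    static final Step NO_OP = annotations -> { };

    /** Style 1: the factory always returns a step; NONE maps to a no-op. */
    static Step createComponent(Kind kind) {
        switch (kind) {
        case A: return STEP_A;
        case B: return STEP_B;
        case NONE: return NO_OP;
        default: throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }

    /** Style 2: the caller adds a step only when one is wanted. */
    static List<Step> buildConditionally(Kind kind) {
        List<Step> steps = new ArrayList<>();
        if (kind == Kind.A) {
            steps.add(STEP_A);
        } else if (kind == Kind.B) {
            steps.add(STEP_B);
        }
        // Kind.NONE: nothing is added
        return steps;
    }

    /** Runs the given steps in order and returns the collected annotations. */
    static List<String> run(List<Step> steps) {
        List<String> annotations = new ArrayList<>();
        for (Step step : steps) {
            step.process(annotations);
        }
        return annotations;
    }
}
```

Both styles produce the same annotations; the difference is whether the decision lives inside the factory or in the code that assembles the pipeline.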

GoogleCodeExporter commented 8 years ago
My reason for naming it *Adapter is not defensible - so yes we can rename it.  

Yeah - looks like CasCreationUtils should have something in there that we could 
use for JCasFactory.  

re: switch - if you are happy to return noopAeDesc I think that is preferred to 
SimplePipeline ignoring null.  

Original comment by phi...@ogren.info on 3 Feb 2011 at 5:44

GoogleCodeExporter commented 8 years ago
So I'll rename it NoOpAnalysisEngine (unless there are cries of outrage now ;) )

Would you also not mind inheriting from CasAnnotator_ImplBase?

I'm perfectly fine with not silently ignoring the "null".

Original comment by richard.eckart on 3 Feb 2011 at 5:47

GoogleCodeExporter commented 8 years ago
NoOpAnalysisEngine works for me.  Or NoOpAnnotator - a bit shorter.  

Do you mean org.apache.uima.analysis_component.CasAnnotator_ImplBase or 
org.uimafit.component.CasAnnotator_ImplBase?  I would think the former would be 
more appropriate.  But, yes, I am fine with being a CasAnnotator_ImplBase 
rather than a JCasAnnotator_ImplBase.    

Original comment by phi...@ogren.info on 3 Feb 2011 at 6:18

GoogleCodeExporter commented 8 years ago
The name is not NoOpAnnotator. I used 
org.apache.uima.analysis_component.CasAnnotator_ImplBase - as you suggested. I 
changed JCasFactory to work without the NoOpAnnotator, but left JCasIterable 
and UimaContextFactory as is for now.

Original comment by richard.eckart on 3 Feb 2011 at 9:03

GoogleCodeExporter commented 8 years ago
that is to say - is *now*.  

Thanks!

Original comment by phi...@ogren.info on 3 Feb 2011 at 9:32

GoogleCodeExporter commented 8 years ago

Original comment by richard.eckart on 7 Apr 2011 at 12:12

GoogleCodeExporter commented 8 years ago

Original comment by richard.eckart on 17 Apr 2011 at 1:33

GoogleCodeExporter commented 8 years ago

Original comment by richard.eckart on 8 May 2011 at 10:44