laito / cleartk

Automatically exported from code.google.com/p/cleartk

Implement viterbi for Classifier_ImplBase.classifySequence() #76

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
We need to provide a mechanism for returning K-best results from a classifier for sequential tagging tasks.  There are two primary drivers for this: 1) a Viterbi-style search through the space of possible solutions for a sequential tagging task such as part-of-speech tagging or chunking is very common, because it improves the accuracy of the top choice that is returned; 2) it is often the case that you want to pass along the K-best sequences to downstream components.  Right now we assume that a non-sequential tagger performing a sequential tagging task can return the best result by simply making the best decision it can at each decision point and returning the resulting sequence.

I was just reading the Ratnaparkhi parsing paper, which describes the basic algorithm that the OpenNLP parser is based on.  Here is an excerpt:

"It should be emphasized that if K > 1 (beam size), the parser does not commit to a single POS or chunk assignment for the input sentence before building constituent structure.  All three of the passes described in section 2 are integrated in the search, i.e. when parsing a test sentence, the input to the second pass consists of K of the best distinct POS tag assignments for the input sentence."

Reimplementing this parser in ClearTK would be really difficult right now.  We should think about how to provide search capability for sequential tagging tasks and a mechanism to "return" the K-best sequences.  The former may not make sense for all non-sequential classifiers (e.g. LIBSVM doesn't provide probabilities with a classification) but should be quite straightforward for e.g. Maxent.
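
To make the kind of search being discussed concrete, here is a minimal beam-search sketch (hypothetical standalone code, not part of ClearTK; it assumes the classifier can produce a score for every possible outcome at each position, and it ignores the fact that in ClearTK the scores at a position would also depend on previously chosen outcomes via the OutcomeFeatureExtractor):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class BeamSearchSketch {

  static class Hypothesis {
    final List<String> outcomes;
    final double score;

    Hypothesis(List<String> outcomes, double score) {
      this.outcomes = outcomes;
      this.score = score;
    }
  }

  // scores.get(i) maps every possible outcome at position i to its log-score;
  // at most beamSize partial sequences are kept after each position.
  public static List<Hypothesis> search(List<Map<String, Double>> scores, int beamSize) {
    List<Hypothesis> beam = new ArrayList<Hypothesis>();
    beam.add(new Hypothesis(new ArrayList<String>(), 0.0));
    for (Map<String, Double> positionScores : scores) {
      List<Hypothesis> expanded = new ArrayList<Hypothesis>();
      for (Hypothesis hypothesis : beam) {
        for (Map.Entry<String, Double> outcome : positionScores.entrySet()) {
          List<String> extended = new ArrayList<String>(hypothesis.outcomes);
          extended.add(outcome.getKey());
          // log-scores are summed; raw probabilities would be multiplied instead
          expanded.add(new Hypothesis(extended, hypothesis.score + outcome.getValue()));
        }
      }
      // keep only the beamSize best partial sequences
      Collections.sort(expanded, new Comparator<Hypothesis>() {
        public int compare(Hypothesis a, Hypothesis b) {
          return Double.compare(b.score, a.score);
        }
      });
      beam = new ArrayList<Hypothesis>(expanded.subList(0, Math.min(beamSize, expanded.size())));
    }
    return beam; // up to beamSize (approximately) best sequences, best first
  }
}

With beamSize set to 1, this reduces to the greedy one-decision-at-a-time strategy described above.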

This issue was briefly raised in #57 but was not addressed there because we
used that issue to address a related problem of providing
OutcomeFeatureExtractor.  

Original issue reported on code.google.com by pvogren@gmail.com on 20 Mar 2009 at 5:22

GoogleCodeExporter commented 9 years ago
Resolving this will make it much easier to answer one of Wayne's favorite questions, the one he always asks: "Why do you make hard decisions at the beginning of a pipeline?  Why not postpone those decisions until, e.g., semantic analysis?"

Original comment by pvogren@gmail.com on 20 Mar 2009 at 5:24

GoogleCodeExporter commented 9 years ago
Isn't this just Classify.score(List<Feature>, int)?

Or are you asking what the type system should look like?

Original comment by steven.b...@gmail.com on 20 Mar 2009 at 5:37

GoogleCodeExporter commented 9 years ago
This is unrelated to any type system.  Point 2 above could be addressed by another score method that returned K-best sequences - but you would certainly want to do a Viterbi-style beam search to get those K-best sequences.

Original comment by pvogren@gmail.com on 20 Mar 2009 at 6:03

GoogleCodeExporter commented 9 years ago
I'm still not clear on why Classify.score(List<Feature>, int) isn't sufficient for your purposes.  Could you try to explain it in a little more detail?

Original comment by steven.b...@gmail.com on 20 Mar 2009 at 8:40

GoogleCodeExporter commented 9 years ago
[Apologies for the typos above. s/Classify/Classifier/g]

Maybe you're suggesting that there should be a Classifier.scoreSequence(List<List<Feature>>, int) analogous to Classifier.classifySequence(List<List<Feature>>)?

Original comment by steven.b...@gmail.com on 20 Mar 2009 at 8:48

GoogleCodeExporter commented 9 years ago
Currently, none of the non-sequential classifiers has a classifySequence method implemented - they simply inherit from Classifier_ImplBase, which iterates through the instances, classifying them one at a time after adding features from the OutcomeFeatureExtractor.  There is no Viterbi whatsoever.  Surely you can see why this would be useful and how it relates to which K-best sequences would be returned?

Original comment by pvogren@gmail.com on 20 Mar 2009 at 9:46

GoogleCodeExporter commented 9 years ago
I think I'm just unclear what it is you're asking for.  Your original post said you wanted "a mechanism for returning K-best results from a classifier".  We have such a mechanism, and it's called Classifier.score(List<Feature>, int), though I certainly have no problem with you adding Classifier.scoreSequence(List<List<Feature>>, int) as well.

But your last post suggests that just providing "a mechanism for returning K-best results from a classifier" isn't the point of this ticket, and instead you're really requesting that we modify Classifier_ImplBase.classifySequence() to implement Viterbi.  Is that what you're asking for?

Original comment by steven.b...@gmail.com on 20 Mar 2009 at 9:58

GoogleCodeExporter commented 9 years ago
right.  we will need to update the api (i.e. add a scoreSequence method) and the
underlying implementation (i.e. add viterbi.)  I think it would rather silly to 
do
the former without the latter except where it *may* not make sense (e.g. libsvm 
-can
you just sum the scores here?)  

Original comment by pvogren@gmail.com on 20 Mar 2009 at 10:34

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 20 Mar 2009 at 10:45

GoogleCodeExporter commented 9 years ago
I propose to make the following changes to Classifier.java:

 * remove method score(List<Feature>).  It is not used anywhere in ClearTK and is equivalent to score(List<Feature>, 1).
 * add method scoreSequence(List<List<Feature>>, int) - where the second parameter determines the maximum number of results to return.

and the following to Classifier_ImplBase.java:

 * override method scoreSequence and have it throw an UnsupportedOperationException - i.e. there will be no default implementation for now.

and the following to MaxentClassifier.java:

 * override method scoreSequence and have it implement Viterbi search.  This is the only classifier I will add this to for now.
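
A rough sketch of what those additions could look like in code (the scoreSequence return type below is only a guess; Feature and ScoredOutcome are the existing ClearTK types):

// Classifier.java (proposed additions; sketch only)
public interface Classifier<OUTCOME_TYPE> {

  OUTCOME_TYPE classify(List<Feature> features);

  // at most maxResults scored outcomes for a single instance
  List<ScoredOutcome> score(List<Feature> features, int maxResults);

  List<OUTCOME_TYPE> classifySequence(List<List<Feature>> features);

  // proposed: up to maxResults scored sequences, e.g. from a Viterbi-style search
  List<List<ScoredOutcome>> scoreSequence(List<List<Feature>> features, int maxResults);
}

// Classifier_ImplBase.java: proposed default, no implementation for now
public List<List<ScoredOutcome>> scoreSequence(List<List<Feature>> features, int maxResults) {
  throw new UnsupportedOperationException();
}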

The above changes seem pretty straightforward to me given the discussion on this ticket thus far.  The piece I am missing now is how to expose score and scoreSequence to my annotation handler.  One possibility is to add the following methods (see #78 for the change to ScoredOutcome):

 * public List<ScoredOutcome> consume(Instance<OUTCOME_TYPE> instance, int maxReturns)
 * public List<List<ScoredOutcome>> consumeAll(List<Instance<OUTCOME_TYPE>> instances, int maxReturns)

One detail I have glossed over is that maxReturns and the beam size of Viterbi will generally be the same.  However, the annotation handler is not necessarily going to expect/understand this.  For example, the annotation handler might specify that it wants only one scored outcome back, without any loss of performance for the returned outcome.  One option is to add a beam size to the above methods (both the consume and score methods).  But this doesn't feel right to me.  I would rather have e.g. MaxentClassifier worry about what the right beam size should be, by setting a config param value or a reasonable default.

Original comment by pvogren@gmail.com on 23 Mar 2009 at 5:12

GoogleCodeExporter commented 9 years ago
 * remove method score(List<Feature>)
 * add method scoreSequence(List<List<Feature>>, int)
 * override method scoreSequence and have it throw an UnsupportedOperationException

These all sound fine.

 * override method MaxentClassifier.scoreSequence and have it implement Viterbi search

This sounds basically fine, except that I wonder why we can't have one Viterbi implementation that both Maxent and the SVM models can share (from some shared base class).  Is there something that makes it particularly hard to write code that works for both?

 * InstanceConsumer.consume(Instance<OUTCOME_TYPE>, int) 
 * InstanceConsumer.consumeAll(List<Instance<OUTCOME_TYPE>>, int)

I worry a little that these don't make much sense for a DataWriter, and therefore don't belong in InstanceConsumer.  I see a couple of other possibilities:

* Add them to ClassifierAnnotator, and require AnnotationHandlers that want to get scored results to cast their InstanceConsumer to a ClassifierAnnotator.

* Require people who want to add scored results to a JCas to subclass ClassifierAnnotator and do their work there.

I'm not sure these are really better, but I definitely feel uncomfortable with putting something into InstanceConsumer that's only relevant to ClassifierAnnotators.

I don't totally understand the maxReturns/beam size issue.  Can you try explaining it again?

Original comment by steven.b...@gmail.com on 24 Mar 2009 at 3:21

GoogleCodeExporter commented 9 years ago
[This sounds basically fine, except that I wonder why we can't have one Viterbi implementation that both Maxent and the SVM models can share (from some shared base class).  Is there something that makes it particularly hard to write code that works for both?]

Not really.  At the time I was thinking that one might want to calculate the current best score by either multiplying the values or adding them.  This could be parameterized - but it is likely that you always want to add the values.  In the case of classifiers that return probabilities, it could add the log probs instead of multiplying the probabilities together.

[I don't totally understand the maxReturns/beam size issue.  Can you try explaining it again?]

Well, if you call scoreSequence with maxReturns set to 1 you would likely get a different result than the top result returned by setting it to 40, if the maximum number of returns is also used as the beam size (i.e. 1 and 40).  A user might think "I only want the best result, so I'll just set it to 1" - but this may or may not be right.
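
To put that in code (a purely hypothetical illustration, using the scoreSequence signature proposed earlier and assuming maxReturns doubles as the beam size):

void illustrate(Classifier<String> classifier, List<List<Feature>> features) {
  // beam of 1: effectively a greedy search
  List<List<ScoredOutcome>> top1 = classifier.scoreSequence(features, 1);
  // beam of 40: a much wider search over possible sequences
  List<List<ScoredOutcome>> top40 = classifier.scoreSequence(features, 40);
  // the single sequence in top1 may differ from, and score worse than,
  // the best sequence in top40
}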

Original comment by pvogren@gmail.com on 30 Mar 2009 at 6:45

GoogleCodeExporter commented 9 years ago
Ahh, I got it now.  Yes, there's a difference between asking for the top result and asking for the top result after a Viterbi search through the top results.

I also agree that the beam size belongs in whatever class implements Viterbi - we definitely shouldn't assume that the only way to implement consume(Instance<OUTCOME_TYPE> instance, int maxReturns) is with a Viterbi search.

Original comment by steven.b...@gmail.com on 31 Mar 2009 at 5:16

GoogleCodeExporter commented 9 years ago
[I'm not sure these are really better, but I definitely feel uncomfortable with putting something into InstanceConsumer that's only relevant to ClassifierAnnotators.]

I hear what you are saying.  I'm not really happy with my proposal or your alternatives.  I feel that your concern about adding something to the consumer that doesn't apply to data writers is outweighed by two other factors: 1) the current consume and consumeAll methods are already partially ill-suited for data writers because of the return value, which data writers never return.  We did this anyway so that we can call the consumer methods in exactly the same way regardless of the context.  So, there is already a precedent that consumers must understand the intention of the method signature.  2) It would be really nice to be able to continue to call consumer methods in exactly the same way in both contexts (data writing and classification).  Adding an int parameter that data writers will (typically?) ignore is a small price to pay compared to casting an instance consumer (how do you know when to cast it?) or making developers extend ClassifierAnnotator.  For these reasons, I would much rather add the two methods to the consumer interface.

Original comment by pvogren@gmail.com on 3 Apr 2009 at 4:46

GoogleCodeExporter commented 9 years ago
Steve and I decided against adding the two new methods for the following reasons:

1) My immediate use-case is to get back the best sequence based on a Viterbi search.  I do not need a maxReturns parameter to do this - nor do I need a list of scored outcomes.  This can be accomplished by setting UIMA parameters.

2) In the kind of use case where you really care about having a Viterbi lattice to pass to a downstream component, it isn't clear that a List<ScoredOutcome> is the appropriate representation.  When one of us decides to implement something like e.g. the Ratnaparkhi parser quoted above, then we will have a clearer idea of what is needed here.

Original comment by pvogren@gmail.com on 3 Apr 2009 at 5:32

GoogleCodeExporter commented 9 years ago
I have added the following comment to the javadoc of my Viterbi implementation.  The most important thing to me is that the best score comes out on top.  If this is embarrassingly wrong, then please let me know:

This implementation of Viterbi does not strictly return the N-best possible sequences.  It actually returns the best possible sequence for each of the N-best classification values of the last element in the sequence.  For example, this method might return the following values:
 * sequence classifications: NN JJ score=.95
 * sequence classifications: JJ NN score=.65

It may be that the sequence "JJ JJ" would score at .75 given the model.  However, only the sequence that ends with JJ with the highest score (i.e. "NN JJ") will be included in the returned list.

Original comment by pvogren@gmail.com on 3 Apr 2009 at 6:26

GoogleCodeExporter commented 9 years ago
I think that with my two most recent posts I have really argued that I should actually not add a scoreSequence method until we need it.  I only want the best score, and that is what my current Viterbi implementation gives.  It doesn't do a very good job of getting the N-best sequences, and I don't want them anyway for now.  I am going to remove scoreSequence from my local files and use Viterbi in MaxentClassifier's classifySequence method.

Original comment by pvogren@gmail.com on 3 Apr 2009 at 7:12

GoogleCodeExporter commented 9 years ago
It turns out that there is no way for a classifier to get information from a UimaContext.  Does anyone have a problem with adding initialize(UimaContext) to the Classifier interface?  It would get called by ClassifierAnnotator at the bottom of its initialize method.

Original comment by pvogren@gmail.com on 3 Apr 2009 at 10:18

GoogleCodeExporter commented 9 years ago
I've wanted this before, when I needed to tell a topic model how many iterations of Gibbs sampling to run when adding topics to new documents.

That said, the one downside is that Classifier now needs to know about UIMA, while before it didn't.  Not sure how big of a deal that is, given that our code relies so heavily on UIMA anyway.  The only way I can imagine around this is to have Classifier.initialize take a Map object, and then we wrap UimaContext to look like a Map.  Don't know if it's worth the effort.

Original comment by steven.b...@gmail.com on 3 Apr 2009 at 10:27

GoogleCodeExporter commented 9 years ago
Not true.  Our classifiers only superficially don't know about UIMA.  In fact, it would be very difficult to use them outside the context of UIMA and the other infrastructure we've put in place.  Feature extraction would be difficult, as would things like feature encoders, which are generally created by factories that have an initialize(UimaContext) method.

Original comment by pvogren@gmail.com on 3 Apr 2009 at 11:08

GoogleCodeExporter commented 9 years ago
I only brought it up because I vaguely remembered discussing trying to keep Classifier UIMA-clean before.  But I doubt anyone is ever going to try to use our Classifier object outside of UIMA, so I'm fine with adding initialize(UimaContext).  You may want to check with 3P though.

Original comment by steven.b...@gmail.com on 3 Apr 2009 at 11:40

GoogleCodeExporter commented 9 years ago
And once again I'm the dissenting voice :)  First, I disagree with Philip on one thing: Classifiers now are very easily usable outside of UIMA.  Feature extraction is a completely separate layer that can easily be replaced with something else.  Data writers are, right now, still dependent on UIMA, but that may change, given some discussions we've had.  Feature encoding is not at all dependent on UIMA.  Yes, the factories are, but they're trivial to write without it.  Would anyone actually use Classifiers outside of the established UIMA-based process?  I don't know; I do know that I'd consider it and like having the option.

More importantly, though, it compromises the idea of classifier independence.  I assume the point of passing the context is to allow classifier-specific parameters to be passed at classification time; consequently, the classification pipeline must now be aware of the type of classifier, where before it only needed to know the path to the classifier file (which could contain any compatible classifier).  Is there a reason why the parameters can't be set at model-creation time?

Original comment by phwetz...@gmail.com on 4 Apr 2009 at 12:26

GoogleCodeExporter commented 9 years ago
It's possible, but inconvenient.  Imagine you want to run some experiments with different beam sizes.  To do this with our current Train/BuildJar infrastructure, you'd have to retrain the entire model each time you wanted to use a different beam size.  But this is silly, since the beam size plays no role in the training, only in classification.  So either we need to make it classify-time configurable, or we need to improve the Train/BuildJar infrastructure so that you can easily create duplicates of models, but with different classify-time parameters set.

Original comment by steven.b...@gmail.com on 4 Apr 2009 at 12:46

GoogleCodeExporter commented 9 years ago
Whatever we choose to do with the classifier interface, we definitely need to break up Train/BuildJar as you say.  For example, it is now more complicated than it should be to train a model manually on the command line.  A specific case where that's a problem, which I've come across, is using SVMperf to train an SVMlight model (the model produced by SVMperf follows the SVMlight format and can be used in place of an SVMlight model in ClearTK).  As things are now, if I do this I then have to manually assemble the model.jar file (and make sure I use the correct filenames everywhere).  This is too complicated for what I'm sure will be a pretty common case.

One thing we could do is store classifier parameters in the manifest file; with a decent BuildJar utility in place, that would give a pretty easy way to create a set of models with different parameter values.  I'm open to other ideas, only I would strongly prefer if we could a) preserve classifier independence and b) keep UIMA out of the classifiers.

Original comment by phwetz...@gmail.com on 4 Apr 2009 at 6:02

GoogleCodeExporter commented 9 years ago
[First, I disagree with Philip on one thing: Classifiers now are very easily usable outside of UIMA.  Feature extraction is a completely separate layer that can easily be replaced with something else.]

So, this is a little frustrating to me.  I remember talking with you (3p) about this exact topic last fall, and we decided that it would be very difficult and highly unlikely to use our classifiers outside of the UIMA environment.  In fact, the way I remember it is that I said something about how classifiers were independent of UIMA, and you reminded me why it would be difficult to use them outside of the UIMA context.  I may have a bad memory, and you have the right to change your mind.  But let's be honest - ClearTK as a whole is deeply committed to UIMA.  I disagree, for example, that replacing the feature extraction layer would be easy.  That would be a giant pain in the butt!

The particular parameter in question, beam size, is not specific to a particular classifier and is not (for the reasons Steve described above) set at training time.  Also, I am not entirely sure what you mean by classifier independence.  One of the reasons I want to be able to set a parameter in a descriptor file that is handled by a classifier initialize method is so that my annotation handler doesn't have to do this kind of thing (e.g. cast the consumer to a particular subclass of ClassifierAnnotator so that it can set a parameter).  I have no problem with a descriptor file that is aware of what is in the pipeline and what parameters need to be set to make things work nicely together (that's the whole point of the descriptors).

Also (and this is a fairly weak argument), adding an initialize method that takes a UimaContext isn't nearly as strong a coupling to UIMA as giving it a method that took e.g. a JCas.  We have made it pretty easy to create a UimaContext consisting of name/value pairs, and this could be used outside of the "normal" UIMA context.  We could, as Steve suggested, have an initialize(Map) method instead - but I think this just makes our code more confusing.

Finally, I am not sure how to go forward.  I have code ready to commit that is unit tested, and I have experiments ready to go that I want to run on a different server.  It would be really nice if I could use Subversion to update the code on the server.  So, Steve and I either need to invoke the 2/3 veto - or I need to move my local ClearTK to a different repository.  I can see that this is where it would be nice to have a Mercurial repository.  Although, I don't see how this really makes things easier in the long run if we are going to share a common code base.

Original comment by pvogren@gmail.com on 6 Apr 2009 at 4:00

GoogleCodeExporter commented 9 years ago
I don't remember the exact discussion, but at some point our classifier interface _was_ much more closely tied to UIMA.  With the refactorings we've done since then, that's not the case anymore.  As for feature extraction: the only thing it does is produce a list of Feature objects, which can be produced in lots of other ways.  I've said before that our feature extraction layer is only of fairly limited use to my experiments, and if I wanted to I could probably write all the necessary functionality from scratch within a day or two.

How is beam size not specific to a particular classifier?  For most classifiers it doesn't make any sense whatsoever.

If you both feel strongly that giving the classifier a UIMA context is the way to proceed, I won't resist that.  I guess I just don't like the idea of passing around an opaque object to all sorts of different components, with each of them picking and choosing which of the contained information they want, with no systematic checking of any of that information.  I might have less of a problem with that if we were working in Python, but in Java this just feels fundamentally wrong.

Or maybe I just don't really trust UIMA and would prefer to have our code independent of it as far as possible.

Well, that's what I have to say.  If you're sure you want to do it your way, go ahead.

Original comment by phwetz...@gmail.com on 6 Apr 2009 at 5:01

GoogleCodeExporter commented 9 years ago
[How is beam size not specific to a particular classifier?  For most classifiers it doesn't make any sense whatsoever.]

Let me describe what I have done for the Viterbi implementation and hopefully it will make more sense.  I have introduced a class called Viterbi which has a single static method that looks like this:

public static <OUTCOME_TYPE> List<OUTCOME_TYPE> classifySequence(List<List<Feature>> features,
    int beamSize, Class<OUTCOME_TYPE> cls, Classifier<OUTCOME_TYPE> classifier,
    OutcomeFeatureExtractor[] outcomeFeatureExtractors, boolean addScores)

MaxentClassifier's classifySequence now looks like this:

public List<String> classifySequence(List<List<Feature>> features) {
  if (sequenceBeamSize == 1)
    return super.classifySequence(features);
  else {
    return Viterbi.classifySequence(features, sequenceBeamSize, String.class,
        this, outcomeFeatureExtractors, false);
  }
}

The sequenceBeamSize value is initialized in MaxentClassifier's initialize method, after reading in the param defined as Viterbi.PARAM_BEAM_SIZE.
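
A sketch of how that initialization might look (hypothetical details; it assumes the standard UimaContext.getConfigParameterValue and a default beam size of 1):

public void initialize(UimaContext context) throws ResourceInitializationException {
  // Viterbi.PARAM_BEAM_SIZE is the descriptor parameter name mentioned above
  Integer beamSize = (Integer) context.getConfigParameterValue(Viterbi.PARAM_BEAM_SIZE);
  // with a beam of 1, classifySequence falls back to the greedy default
  this.sequenceBeamSize = beamSize != null ? beamSize : 1;
}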

There is no reason why the other classifiers couldn't do exactly the same thing for their classifySequence methods.

Original comment by pvogren@gmail.com on 6 Apr 2009 at 5:14

GoogleCodeExporter commented 9 years ago
[I guess I just don't like the idea of passing around an opaque object to all sorts of different components, with each of them picking and choosing which of the contained information they want, with no systematic checking of any of that information.]

Well, I don't think it is nearly as opaque as it could be - we could, for example, just pass a Map around.  As it is now, we do a pretty good job of documenting our descriptor parameters, and I think we are getting better at this (see #44).  I like the idea of having a consistent way of initializing our various components at runtime (we are using initialize(UimaContext) in a number of places) and of having a consistent way of documenting them.

Here's a compromise.  We can add the initialize method to Classifier_ImplBase and then have ClassifierAnnotator call the initialize method only if the classifier is an instance of this class.  Would that be more palatable for you?

Original comment by pvogren@gmail.com on 6 Apr 2009 at 5:24

GoogleCodeExporter commented 9 years ago
Another possibility is to have an interface named something like Initializable with a single initialize method defined.  Components that want to be initialized with a UimaContext could just implement this interface.  Then ClassifierAnnotator could just check if the interface is implemented before calling the initialize method.  This seems like a better compromise because it doesn't pollute Classifier_ImplBase.
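
Something along these lines, for example (a sketch only; the exception in the signature is just a guess):

// the proposed interface
public interface Initializable {
  void initialize(UimaContext context) throws ResourceInitializationException;
}

// in ClassifierAnnotator.initialize, after the classifier has been created
if (classifier instanceof Initializable) {
  ((Initializable) classifier).initialize(context);
}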

Original comment by pvogren@gmail.com on 6 Apr 2009 at 5:32

GoogleCodeExporter commented 9 years ago
I like the Initializable idea - it has the advantage that it's applicable to other parts of our code as well, e.g. AnnotationHandler.

Original comment by steven.b...@gmail.com on 6 Apr 2009 at 6:14

GoogleCodeExporter commented 9 years ago
Yeah - Philipp is OK with it too.  We just spent the last hour or so talking about various issues.  One thing that has come up (again) is that our classifier interface should be split into two.  I will file a separate issue and we can discuss it there.

Original comment by pvogren@gmail.com on 6 Apr 2009 at 6:49

GoogleCodeExporter commented 9 years ago
Ok - I made the following changes to support Viterbi search:

 * added a Viterbi class
 * added an Initializable interface
 * changed MaxentClassifier to implement Initializable
 * MaxentClassifier.classifySequence now calls the Viterbi.classifySequence method

We can change the other non-sequential classifiers to do something similar - but I think this should come quite naturally after #82 is addressed.

Original comment by pvogren@gmail.com on 6 Apr 2009 at 10:03