GoogleCodeExporter closed 9 years ago
Yes - you beat me to creating this ticket. The license is also changed to ASL
which is much better too.
Long term, I still want to see work done on #40 and #41 to further reduce the
amount of OpenNLP related code that we support.
Also, I have a mind to submit a bug report about proper closing of files by
RealValueFileEventStream (or whatever is not doing its job) to the new apache
incarnation of the project.
Re: trove - that needs to go anyways since it's LGPL. I believe cleartk-ml has
a dependency on it that is unrelated to the OpenNLP stuff. I will file a
separate issue.
Original comment by pvogren@gmail.com
on 5 Jan 2011 at 5:32
re: trove - never mind - this was removed in issue #163. Thanks Philipp!
Original comment by pvogren@gmail.com
on 5 Jan 2011 at 6:21
I realize that I was slightly confused about this issue before. You may be
aware that OpenNLP is now an Apache Incubator project, and I was thinking these
version numbers were what they use for the new incarnation. Regardless,
upgrading to the latest version should make it easier to migrate to the Apache
version when it comes out.
Original comment by pvogren@gmail.com
on 5 Jan 2011 at 10:58
Yeah, I believe they have the source up at the Apache Incubator, but I don't
think they've made a release there yet. They were only accepted on 24-Dec-2010,
so presumably it'll be a little while before they have a release. In the
meantime, I have to imagine that porting to the newest versions can only help
when they finally produce an incubator release.
Original comment by steven.b...@gmail.com
on 6 Jan 2011 at 12:14
The APIs for OpenNLP have changed considerably from the previous version we
were using. They are actually much simpler now and so it will simplify our
code considerably.
I would like to propose that we consolidate OpenNLPTreebankParser and
OpenNLPTaggerParser into a single class. They seem very similar and have a lot
of repeated code in them. The only difference that I can tell is that one uses
part-of-speech tags obtained from the CAS and the other lets the parser do the
tagging. I think it would be easy enough to provide a flag that allows for
either option. Does this make sense?
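
A rough sketch of what such a flag could look like, written without reference to the actual ClearTK or OpenNLP classes. Every name here (PosTagSourceSketch, Tagger, useTagsFromCas, resolveTags) is hypothetical and only illustrates the idea of one class choosing between the two tag sources:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names come from ClearTK or OpenNLP.
final class PosTagSourceSketch {

    /** Stand-in for whatever the parser uses to assign tags on its own. */
    interface Tagger {
        List<String> tag(List<String> tokens);
    }

    // Assumed flag: true = take tags already present in the CAS,
    // false = let the parser's own tagger produce them.
    private final boolean useTagsFromCas;
    private final Tagger parserTagger;

    PosTagSourceSketch(boolean useTagsFromCas, Tagger parserTagger) {
        this.useTagsFromCas = useTagsFromCas;
        this.parserTagger = parserTagger;
    }

    /** Returns one tag per token, taken from the source selected by the flag. */
    List<String> resolveTags(List<String> tokens, List<String> casTags) {
        if (useTagsFromCas) {
            // Tags from the CAS must line up one-to-one with the tokens.
            if (casTags == null || casTags.size() != tokens.size()) {
                throw new IllegalArgumentException(
                        "expected one CAS tag per token");
            }
            return new ArrayList<>(casTags);
        }
        return parserTagger.tag(tokens);
    }
}
```

The rest of the parsing code would then be shared, with only the tag lookup depending on the flag.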
Original comment by pvogren@gmail.com
on 13 Jan 2011 at 10:39
Original comment by pvogren@gmail.com
on 13 Jan 2011 at 10:40
Yes, a flag that allows either CAS POS tags or parser POS tags sounds great to
me.
(And yeah, I also noticed that the new OpenNLP APIs make things *a lot*
cleaner.)
Original comment by steven.b...@gmail.com
on 13 Jan 2011 at 10:52
Sounds reasonable to me.
Original comment by phwetz...@google.com
on 14 Jan 2011 at 1:16
ok - while I was refactoring the OpenNLP wrappers to work with the latest
versions I took the liberty of cleaning up the code, renaming every class, and
repackaging some of the helper classes for the parser annotator. For example, I
renamed OpenNLPTreebankParser to ParserAnnotator.
I also fixed the broken "TokenRetriever" and "SentenceRetriever". The code for
both is now in InputTypesHelper, which lets you actually use input sentence and
token types from a different type system, as was originally intended. I got a
little confused by a generics issue that came up, so there's a hackish
workaround which deserves its own issue.
I also built a new test model and updated the tests to work with that instead.
Fixed in r2319.
Original comment by pvogren@gmail.com
on 14 Jan 2011 at 7:07
Original issue reported on code.google.com by
steven.b...@gmail.com
on 5 Jan 2011 at 7:07