kulukimak / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Support for ClearNlp 3 #603

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This should probably go into a separate module so we can keep the old ClearNLP 
around for a while longer.

Here is the release announcement for ClearNLP 3:

---

Folks,

After a long delay, ClearNLP version 3 is finally here; in fact, the latest 
version is 3.0.2.  Here are the key changes in this version.

ClearNLP is now developed by the Center for Language and Information Research 
at Emory University.
Our Maven group ID has changed from com.clearnlp to edu.emory.clir.
All our repositories have moved from github.com/clearnlp to github.com/clir.
Version 3.0.0 is written from scratch. All components in this version 
show a significant speed-up over the previous ones (2-3 times), and the 
statistical models consume less disk and memory space.
Statistical models for general, medical, and bioinformatics domains are provided 
(see here; the medical and bioinformatics models will be uploaded by March 
25th, 2015).
The tokenizer preserves non-UTF8 characters as they are; previously, they were 
converted to their UTF8 equivalent characters (e.g., smart double quotes to ").
The dependency parser is back to greedy parsing, which makes the model 
much smaller (about 18 times less disk space) and much faster (about 10K tokens 
per second on an Intel Xeon CPU) without sacrificing much accuracy (about 0.5% 
lower).
This version does not include the semantic role labeler. There have been many 
changes in PropBank, and we decided to spend another month developing a new 
semantic role labeler. The semantic role labeler will be ready in May 2015.
We are preparing a named entity recognizer and a coreference resolution system. 
These systems will be ready in August 2015.
Better documentation is provided at our guidelines project, with more details 
about training, decoding, javadoc, etc.
ClearNLP is no longer a single-person project; the whole NLP research team at 
Emory University is working on it, and we're very excited about the potential of 
this project.  Please give us your feedback so we can make this better.  I'm 
finally back on track developing new components and improving old ones in 
ClearNLP, so I will be much more prompt.

Thanks, and I hope everyone is enjoying the spring.

best,

Jinho

---

Original issue reported on code.google.com by richard.eckart on 24 Mar 2015 at 10:53

GoogleCodeExporter commented 9 years ago
ClearNLP version 3.1.0 is released.

A new component for named entity recognition is added, which shows 
state-of-the-art accuracy on both CoNLL'03 and OntoNotes data (a paper 
describing our approach is under submission).
All statistical models are upgraded; the part-of-speech tagger and the 
dependency parser use features extracted from distributional semantics, which 
give more robust results on unseen data.
The dependency parser is trained on data from our new dependency conversion, 
which adapts many concepts from the universal dependency structures and 
introduces some new useful labels such as "dative".
Components for semantic role labeling and coreference resolution will be added 
in June.  Please let me know if you have any questions or suggestions about 
ClearNLP.  Thank you!

best,

Jinho

Original comment by richard.eckart on 30 Apr 2015 at 8:36

GoogleCodeExporter commented 9 years ago
ClearNLP 3.1.1 is released.

Word embedding lexicons, which didn't add much accuracy but took up a lot of 
RAM, are removed from the global lexica.  Furthermore, the gazetteers for 
named entity recognition are now separated from the global lexica for better 
modularity (see models for more details).
The core dictionary is updated; some past-tense verbs that were wrongly 
recognized as base verbs are now fixed.
The named entity recognition model is updated.
See pom.xml for all updated Maven dependencies.
We'll be making many more good updates in the summer, so please stay tuned.  
Thank you!

best,

Jinho

Original comment by richard.eckart on 8 May 2015 at 1:13