stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

corenlp.war: XOMReader warnings break visualize output #60

Open kevinandrewjohnston opened 9 years ago

kevinandrewjohnston commented 9 years ago

java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

jetty-runner corenlp.war
2015-02-24 16:37:55.366:INFO::main: Logging initialized @108ms
2015-02-24 16:37:55.372:INFO:oejr.Runner:main: Runner
2015-02-24 16:37:55.457:INFO:oejs.Server:main: jetty-9.2.2.v20140723
2015-02-24 16:38:01.093:WARN:oeja.AnnotationConfiguration:main: ServletContainerInitializers: detected. Class hierarchy: empty
2015-02-24 16:38:01.331:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@606d8acf{/,file:/private/var/folders/qt/7v9m4kd572b0zw56pc3hxy5r0000gn/T/jetty-0.0.0.0-8080-corenlp.war-_-any-6993542127094007379.dir/webapp/,AVAILABLE}{file:/Users/spiliero/CoreNLP/corenlp.war}
2015-02-24 16:38:01.332:WARN:oejsh.RequestLogHandler:main: !RequestLog
2015-02-24 16:38:01.360:INFO:oejs.ServerConnector:main: Started ServerConnector@1d057a39{HTTP/1.1}{0.0.0.0:8080}
2015-02-24 16:38:01.361:INFO:oejs.Server:main: Started @6125ms
Searching for resource: StanfordCoreNLP.properties
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.0 sec].
Adding annotator lemma
Adding annotator ner
annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref
Unknown property: |annotators|
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [3.6 sec].
sutime.binder.1.
Initializing JollyDayHoliday for sutime with classpath:edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Feb 24, 2015 4:38:16 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: null
Feb 24, 2015 4:38:16 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: temporal-composite-8:ranges
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...done [0.5 sec].
Adding annotator dcoref
Warning: nu.xom.xslt.XOMReader: XOMReader doesn't support http://javax.xml.XMLConstants/property/accessExternalDTD
Warning: nu.xom.xslt.XOMReader: XOMReader doesn't support http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit

kevinandrewjohnston commented 9 years ago

[Screenshot 2015-02-24 at 4.56.58 PM]

kevinandrewjohnston commented 9 years ago

All other output formats work correctly:

[Screenshot 2015-02-24 at 5.00.29 PM]

ghost commented 9 years ago

If there is an issue here, it will probably be on me to fix the conversion scripts. I am unable to reproduce the issue using the public CoreNLP demo. Is the demo running a different version from the current CoreNLP release?

kevinandrewjohnston commented 9 years ago

I tested with the following commits, without success:

master (current): 52dad92df2b4068c12bf67e1204547135e3fbc8
3.5.1 release: 6a0dbd6ba34c165a20c0841ce92cbab2eeb78e

Yes, the versions are different. The 3.5.1 release includes output format options that are not available in the public CoreNLP demo. I searched the code base for a version similar to the public demo, but it does not appear to be in the code history.

[Screenshot 2015-02-24 at 6.08.40 PM]

Is the code for the publicly available CoreNLP demo (http://nlp.stanford.edu:8080/corenlp/) hosted somewhere else?