stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

RegexNER overwrites CoreNLP NER tags #983

Closed: victoriastuart closed this issue 4 years ago.

victoriastuart commented 4 years ago

Update / Solution: it was a $CLASSPATH issue. Users reading this can skip most of the content below; here are the key comments.

https://github.com/stanfordnlp/CoreNLP/issues/983#issuecomment-573462561 [@J38 comment]
https://github.com/stanfordnlp/CoreNLP/issues/983#issuecomment-573536392 [@victoriastuart: actual issue | solution]
https://github.com/stanfordnlp/CoreNLP/issues/983#issuecomment-573537834 [@victoriastuart: comment re: appending $CLASSPATH to ~/.bashrc or ~/.profile]
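Because the root cause was a classpath problem, one way to make a run independent of the current working directory is to pass an absolute classpath wildcard instead of `-cp "*"`. The Java sketch below only assembles the argument list for `ProcessBuilder`. The class and method names are hypothetical (not part of CoreNLP), and the install path is the 2018-10-05 release directory from the `ls -l` listing later in this thread, so substitute your own.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CoreNLP): builds a pipeline command with an absolute classpath. */
final class CoreNlpCommand {

    static List<String> build(Path installDir, String annotators, String mappingFile, String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx4g");
        cmd.add("-cp");
        // The java launcher itself expands a trailing "*" to every .jar in that directory,
        // so no shell is needed; an absolute path makes the working directory irrelevant.
        cmd.add(installDir.toAbsolutePath() + File.separator + "*");
        cmd.add("edu.stanford.nlp.pipeline.StanfordCoreNLP");
        cmd.add("-annotators");
        cmd.add(annotators);
        cmd.add("-ner.additional.regexner.mapping");
        cmd.add(mappingFile);
        cmd.add("-file");
        cmd.add(inputFile);
        cmd.add("-outputFormat");
        cmd.add("text");
        return cmd;
    }

    public static void main(String[] args) {
        // Assumed install location, taken from this thread; adjust for your machine.
        Path install = Paths.get("/mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05");
        List<String> cmd = build(install, "tokenize,ssplit,pos,lemma,ner", "custom_entities.tsv", "input_sentences.txt");
        System.out.println(String.join(" ", cmd));
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The mapping and input file names stay relative, so they are still resolved against the directory the process is started in.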


I'm going to echo @DaveQuinn29's concern [#910] that there is a cache and/or some other issue in CoreNLP. Issues involved include:


I had been trying to add custom NER tagging via a custom RegexNER file, as I did with the Java version a couple of years ago (and more recently in Python via stanfordnlp). I can't get RegexNER to work in Python, so I returned to the Java implementation of CoreNLP, from the command line, to troubleshoot (and work from there, if needed).

java -Xmx16g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators 'tokenize,ssplit,pos,lemma,regexner' \
-regexner.mapping custom_entities.tsv \
-file input_sentence.txt \
-outputFormat text

However, once again I have not had much success in NER tagging with CoreNLP's trained models plus my custom RegexNER TSV file, formatted as described at https://nlp.stanford.edu/software/regexner.html

Victoria    PERSON  ORGANIZATION,CITY   2
Stuart  PERSON  ORGANIZATION,CITY   2
Victoria Stuart PERSON      2
Yap PRGE    PERSON  2
p53|p53-mediated    PRGE        2
...
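For reference, the RegexNER page describes each mapping line as tab-separated columns: a pattern (whitespace-separated token regexes), the NER class to assign, an optional comma-separated list of existing classes it may overwrite, and an optional priority. In the rows pasted above the tabs appear to have been rendered as spaces. The minimal Java sketch below parses one such line and applies the documented rule that an entry replaces an existing tag only when that tag is O (or absent) or is listed in the third column. The class and method names are hypothetical, not CoreNLP's API; the real annotator's set of always-overwritable background labels is configurable, and the sketch ignores other details such as how priorities resolve overlapping matches.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one RegexNER mapping line; not CoreNLP's own class. */
final class MappingEntry {
    final String pattern;           // e.g. "Victoria Stuart" (space-separated token regexes)
    final String nerClass;          // e.g. "PERSON"
    final Set<String> overwritable; // e.g. {"ORGANIZATION", "CITY"}
    final double priority;          // this sketch defaults to 0.0 when the column is absent

    private MappingEntry(String pattern, String nerClass, Set<String> overwritable, double priority) {
        this.pattern = pattern;
        this.nerClass = nerClass;
        this.overwritable = overwritable;
        this.priority = priority;
    }

    /** Parses a tab-separated line: pattern, class[, overwritable[, priority]]. */
    static MappingEntry parse(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length < 2) {
            throw new IllegalArgumentException("expected at least 2 tab-separated columns: " + line);
        }
        Set<String> over = new HashSet<>();
        if (cols.length > 2) {
            for (String c : cols[2].split(",")) {
                if (!c.trim().isEmpty()) {
                    over.add(c.trim());
                }
            }
        }
        double prio = (cols.length > 3 && !cols[3].trim().isEmpty())
                ? Double.parseDouble(cols[3].trim())
                : 0.0;
        return new MappingEntry(cols[0].trim(), cols[1].trim(), Collections.unmodifiableSet(over), prio);
    }

    /** Simplified rule: replace the current tag only if it is absent, "O", or explicitly overwritable. */
    boolean canOverwrite(String currentTag) {
        return currentTag == null
                || currentTag.isEmpty()
                || currentTag.equals("O")
                || overwritable.contains(currentTag);
    }
}
```

Under this rule, an entry such as `Victoria	PERSON	ORGANIZATION,CITY	2` could replace a statistical ORGANIZATION tag, while an entry with an empty third column could only fill in tokens tagged O.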

I've tried various permutations of `tokenize, ssplit, pos, lemma, ner, regexner` (always in that relative order) with -regexner.mapping or -ner.additional.regexner.mapping, and I cannot simultaneously NER tag text with the default CoreNLP models plus my own custom NER tags.

While I can get either NER (the default statistical model + ... fine NER rules added via the regexner annotator) or -regexner.mapping (with my custom tokens file) to work, it's always either one or the other. And before I'm directed there (cough: @J38), I've certainly looked at the https://stanfordnlp.github.io/CoreNLP/ner.html page to which we are so often referred.

Furthermore, when adding annotators one at a time, I've found that adding the lemma annotator is particularly troublesome: it immediately breaks RegexNER. And when I try to step back, old annotator settings are retained (cached?). For example, I get lemmatization, a dependency parse, etc. in the output even if those annotators are not included in the annotators argument list.

It appears that whenever CoreNLP encounters an error, it silently loads the defaults, so that user-defined settings are ignored.


Similar issues / concerns have been raised elsewhere.


If I slowly add annotators one at a time, I can (sort of: not consistently) "reset" CoreNLP:

java -Xmx16g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators 'tokenize,ssplit,pos,lemma,regexner' \
  -regexner.mapping custom_entities.tsv -file input_sentence.txt -outputFormat text; echo; cat input_sentence.txt.out; echo

  Adding annotator tokenize
  No tokenizer type provided. Defaulting to PTBTokenizer.
  Adding annotator ssplit
  Adding annotator pos
  Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
  Adding annotator lemma
  Adding annotator regexner
  TokensRegexNERAnnotator regexner: Read 13 unique entries out of 13 from custom_entities.tsv, 0 TokensRegex patterns.

  Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentence.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentence.txt.out
  Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentence.txt ... done [0.2 sec].

  Annotation pipeline timing information:
  TokenizerAnnotator: 0.1 sec.
  WordsToSentencesAnnotator: 0.0 sec.
  POSTaggerAnnotator: 0.0 sec.
  MorphaAnnotator: 0.0 sec.
  TokensRegexNERAnnotator: 0.0 sec.
  TOTAL: 0.2 sec. for 9 tokens at 56.3 tokens/sec.
  Pipeline setup: 0.6 sec.
  Total time for StanfordCoreNLP pipeline: 0.8 sec.

  Document: ID=input_sentence.txt (1 sentences, 9 tokens)
  Sentence #1 (9 tokens):
  Victoria Stuart lives in Vancouver, British Columbia.
  [Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=PERSON]
  [Text=Stuart CharacterOffsetBegin=9 CharacterOffsetEnd=15 PartOfSpeech=NNP Lemma=Stuart NamedEntityTag=PERSON]
  [Text=lives CharacterOffsetBegin=16 CharacterOffsetEnd=21 PartOfSpeech=VBZ Lemma=live]
  [Text=in CharacterOffsetBegin=22 CharacterOffsetEnd=24 PartOfSpeech=IN Lemma=in]
  [Text=Vancouver CharacterOffsetBegin=25 CharacterOffsetEnd=34 PartOfSpeech=NNP Lemma=Vancouver]
  [Text=, CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=, Lemma=,]
  [Text=British CharacterOffsetBegin=36 CharacterOffsetEnd=43 PartOfSpeech=NNP Lemma=British]
  [Text=Columbia CharacterOffsetBegin=44 CharacterOffsetEnd=52 PartOfSpeech=NNP Lemma=Columbia]
  [Text=. CharacterOffsetBegin=52 CharacterOffsetEnd=53 PartOfSpeech=. Lemma=.]

Adding the ner annotator breaks the RegexNER tagging shown above:

java -Xmx16g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators 'tokenize,ssplit,pos,lemma,ner,regexner' \
-regexner.mapping custom_entities.tsv -file input_sentence.txt -outputFormat text; echo; cat input_sentence.txt.out; echo

  Adding annotator tokenize
  No tokenizer type provided. Defaulting to PTBTokenizer.
  Adding annotator ssplit
  Adding annotator pos
  Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
  Adding annotator lemma
  Adding annotator ner
  Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.9 sec].
  Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.5 sec].
  Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.3 sec].
  Adding annotator regexner
  TokensRegexNERAnnotator regexner: Read 13 unique entries out of 13 from custom_entities.tsv, 0 TokensRegex patterns.

  Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentence.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentence.txt.out
  Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentence.txt ... done [0.1 sec].

  Annotation pipeline timing information:
  TokenizerAnnotator: 0.0 sec.
  WordsToSentencesAnnotator: 0.0 sec.
  POSTaggerAnnotator: 0.0 sec.
  MorphaAnnotator: 0.0 sec.
  NERCombinerAnnotator: 0.0 sec.
  TokensRegexNERAnnotator: 0.0 sec.
  TOTAL: 0.1 sec. for 9 tokens at 80.4 tokens/sec.
  Pipeline setup: 4.3 sec.
  Total time for StanfordCoreNLP pipeline: 4.5 sec.

  Document: ID=input_sentence.txt (1 sentences, 9 tokens)
  Sentence #1 (9 tokens):
  Victoria Stuart lives in Vancouver, British Columbia.
  [Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=ORGANIZATION]
  [Text=Stuart CharacterOffsetBegin=9 CharacterOffsetEnd=15 PartOfSpeech=NNP Lemma=Stuart NamedEntityTag=ORGANIZATION]
  [Text=lives CharacterOffsetBegin=16 CharacterOffsetEnd=21 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
  [Text=in CharacterOffsetBegin=22 CharacterOffsetEnd=24 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
  [Text=Vancouver CharacterOffsetBegin=25 CharacterOffsetEnd=34 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
  [Text=, CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=, Lemma=, NamedEntityTag=O]
  [Text=British CharacterOffsetBegin=36 CharacterOffsetEnd=43 PartOfSpeech=NNP Lemma=British NamedEntityTag=LOCATION]
  [Text=Columbia CharacterOffsetBegin=44 CharacterOffsetEnd=52 PartOfSpeech=NNP Lemma=Columbia NamedEntityTag=LOCATION]
  [Text=. CharacterOffsetBegin=52 CharacterOffsetEnd=53 PartOfSpeech=. Lemma=. NamedEntityTag=O]

"Cache" (?) issue: with no classpath etc. given, the command fails, yet the previous result is still output (which obfuscates debugging attempts, by the way):

java -Xmx16g -annotators 'tokenize,ssplit,pos,lemma,ner,regexner' -regexner.mapping custom_entities.tsv \
-file input_sentence.txt -outputFormat text; echo; cat input_sentence.txt.out; echo

Unrecognized option: -annotators
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Document: ID=input_sentence.txt (1 sentences, 9 tokens)
Sentence #1 (9 tokens):
Victoria Stuart lives in Vancouver, British Columbia.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=ORGANIZATION]
[Text=Stuart CharacterOffsetBegin=9 CharacterOffsetEnd=15 PartOfSpeech=NNP Lemma=Stuart NamedEntityTag=ORGANIZATION]
[Text=lives CharacterOffsetBegin=16 CharacterOffsetEnd=21 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=22 CharacterOffsetEnd=24 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=25 CharacterOffsetEnd=34 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=British CharacterOffsetBegin=36 CharacterOffsetEnd=43 PartOfSpeech=NNP Lemma=British NamedEntityTag=LOCATION]
[Text=Columbia CharacterOffsetBegin=44 CharacterOffsetEnd=52 PartOfSpeech=NNP Lemma=Columbia NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=52 CharacterOffsetEnd=53 PartOfSpeech=. Lemma=. NamedEntityTag=O]

java -annotators 'tokenize,ssplit,pos,lemma,ner,regexner' -regexner.mapping custom_entities.tsv \
-file input_sentence.txt -outputFormat text; echo; cat input_sentence.txt.out; echo
Unrecognized option: -annotators
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Document: ID=input_sentence.txt (1 sentences, 9 tokens)
Sentence #1 (9 tokens):
Victoria Stuart lives in Vancouver, British Columbia.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=ORGANIZATION]
[Text=Stuart CharacterOffsetBegin=9 CharacterOffsetEnd=15 PartOfSpeech=NNP Lemma=Stuart NamedEntityTag=ORGANIZATION]
[Text=lives CharacterOffsetBegin=16 CharacterOffsetEnd=21 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=22 CharacterOffsetEnd=24 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=25 CharacterOffsetEnd=34 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=British CharacterOffsetBegin=36 CharacterOffsetEnd=43 PartOfSpeech=NNP Lemma=British NamedEntityTag=LOCATION]
[Text=Columbia CharacterOffsetBegin=44 CharacterOffsetEnd=52 PartOfSpeech=NNP Lemma=Columbia NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=52 CharacterOffsetEnd=53 PartOfSpeech=. Lemma=. NamedEntityTag=O]

java 'tokenize,ssplit,pos,lemma,ner,regexner' -regexner.mapping custom_entities.tsv \
-file input_sentence.txt -outputFormat text; echo; cat input_sentence.txt.out; echo
Error: Could not find or load main class tokenize,ssplit,pos,lemma,ner,regexner

Document: ID=input_sentence.txt (1 sentences, 9 tokens)
Sentence #1 (9 tokens):
Victoria Stuart lives in Vancouver, British Columbia.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=ORGANIZATION]
[Text=Stuart CharacterOffsetBegin=9 CharacterOffsetEnd=15 PartOfSpeech=NNP Lemma=Stuart NamedEntityTag=ORGANIZATION]
[Text=lives CharacterOffsetBegin=16 CharacterOffsetEnd=21 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=22 CharacterOffsetEnd=24 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=25 CharacterOffsetEnd=34 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=British CharacterOffsetBegin=36 CharacterOffsetEnd=43 PartOfSpeech=NNP Lemma=British NamedEntityTag=LOCATION]
[Text=Columbia CharacterOffsetBegin=44 CharacterOffsetEnd=52 PartOfSpeech=NNP Lemma=Columbia NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=52 CharacterOffsetEnd=53 PartOfSpeech=. Lemma=. NamedEntityTag=O]
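A likely explanation for this "cache" behaviour: in each of the three commands above, `java` exits with an error before CoreNLP runs, but because the commands are chained with `;`, `cat input_sentence.txt.out` still runs and prints the file left over from the previous successful run. Deleting the output file first, or chaining with `&&`, avoids the confusion. The minimal Java sketch below (hypothetical class and method names) applies the same guard from the Java side: it deletes any old output, runs a command, and returns the output only if the run exited with status 0 and actually wrote the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical guard against mistaking a stale output file for a fresh result. */
final class FreshOutput {

    static Optional<String> runAndRead(List<String> cmd, Path out) throws IOException, InterruptedException {
        Files.deleteIfExists(out);                   // never let an old result masquerade as new
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0 || !Files.exists(out)) {
            return Optional.empty();                 // failed run: report nothing rather than stale output
        }
        return Optional.of(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

With `cmd` set to the CoreNLP command and `out` to `input_sentence.txt.out`, a JVM startup error such as "Unrecognized option: -annotators" would yield an empty result instead of the previous annotations.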
J38 commented 4 years ago

Just to clarify, does this example not work for you? It's key to not include regexner in any way. The ner annotator should be running the entire named entity recognition process, and having the extra regexner could definitely interfere.

java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.regexner.mapping example-rule.txt -file rule-sentences.txt -outputFormat text

When I run that example I see my rules and statistical model blended together.
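For readers using the Java API rather than the command line, the same configuration can be expressed as a `java.util.Properties` object, since CoreNLP's command-line flags correspond to property keys of the same name. The sketch below builds those properties and adds a hypothetical check that flags the combination J38 warns against (a standalone `regexner` annotator listed alongside `ner`). Creating the pipeline with `new StanfordCoreNLP(props)` is left as a comment so the sketch compiles without CoreNLP on the classpath; `example-rule.txt` is the file name from the command above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper mirroring the command line above as pipeline properties. */
final class NerProps {

    static Properties build(String mappingFile) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Custom rules are applied inside the ner annotator, not by a separate regexner step.
        props.setProperty("ner.additional.regexner.mapping", mappingFile);
        // With CoreNLP on the classpath: StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        return props;
    }

    /** Returns warnings for configurations that list a standalone regexner together with ner. */
    static List<String> warnings(Properties props) {
        List<String> annotators = new ArrayList<>();
        for (String a : props.getProperty("annotators", "").split(",")) {
            annotators.add(a.trim());
        }
        List<String> out = new ArrayList<>();
        if (annotators.contains("ner") && annotators.contains("regexner")) {
            out.add("standalone regexner alongside ner may interfere; use ner.additional.regexner.mapping");
        }
        return out;
    }
}
```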

victoriastuart commented 4 years ago

@J38: hello, and thank you for your reply. Yes, that is correct: when I run that exact pipeline (your suggestion above; Example 2 below),

`-annotators tokenize,ssplit,pos,lemma,ner -ner.additional.regexner.mapping`

I do not get the blended output.

The default CoreNLP tagging, which tags Victoria (me), Vancouver (city) and Canada (country) as LOCATION, and apples and bananas as O (no entity), is shown for reference in Example 1.

I only get RegexNER tagging when I include regexner as an annotator (see, e.g., Example 3):

`-annotators tokenize,ssplit,pos,lemma,ner,regexner -regexner.mapping`

or

`-annotators tokenize,ssplit,pos,lemma,regexner -regexner.mapping`

and in those instances there is no blended tagging.

[Suggestion: if you are working from your own machine, where you develop / code CoreNLP packages, please go to a fresh machine and git clone the repo, to make sure that you are running the same code as those of us who get the code that way.]

Environment:

$ uname -a
Linux victoria 5.4.10-arch1-1 #1 SMP PREEMPT Thu, 09 Jan 2020 10:14:29 +0000 x86_64 GNU/Linux

$ which java
/usr/bin/java

$ java -version
openjdk version "13.0.1" 2019-10-15
OpenJDK Runtime Environment (build 13.0.1+9)
OpenJDK 64-Bit Server VM (build 13.0.1+9, mixed mode)

$ echo $JAVA_HOME
/usr/lib/jvm/java-13-openjdk/bin/java

$ echo $CORENLP_HOME/
/mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/

$ cat input_sentences.txt 
Victoria lives in Vancouver, Canada. She likes apples and bananas.

$ cat custom_entities.tsv
Victoria    PERSON  LOCATION,ORGANIZATION,CITY  2
Vancouver   CITY    LOCATION,ORGANIZATION   2
Canada  COUNTRY LOCATION,ORGANIZATION,CITY  2
apple   FRUIT       2
banana  FRUIT       2

Example 1:

$ java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit,pos,lemma,ner \
-file input_sentences.txt \
-outputFormat text; \
cat input_sentences.txt.out

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.5 sec].

Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt.out
Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... done [0.1 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.0 sec.
NERCombinerAnnotator: 0.0 sec.
TOTAL: 0.1 sec. for 13 tokens at 91.5 tokens/sec.
Pipeline setup: 2.7 sec.
Total time for StanfordCoreNLP pipeline: 2.9 sec.

Document: ID=input_sentences.txt (2 sentences, 13 tokens)
Sentence #1 (7 tokens):
Victoria lives in Vancouver, Canada.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=LOCATION]
[Text=lives CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=18 CharacterOffsetEnd=27 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=Canada CharacterOffsetBegin=29 CharacterOffsetEnd=35 PartOfSpeech=NNP Lemma=Canada NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=35 CharacterOffsetEnd=36 PartOfSpeech=. Lemma=. NamedEntityTag=O]
Sentence #2 (6 tokens):
She likes apples and bananas.
[Text=She CharacterOffsetBegin=37 CharacterOffsetEnd=40 PartOfSpeech=PRP Lemma=she NamedEntityTag=O]
[Text=likes CharacterOffsetBegin=41 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=like NamedEntityTag=O]
[Text=apples CharacterOffsetBegin=47 CharacterOffsetEnd=53 PartOfSpeech=NNS Lemma=apple NamedEntityTag=O]
[Text=and CharacterOffsetBegin=54 CharacterOffsetEnd=57 PartOfSpeech=CC Lemma=and NamedEntityTag=O]
[Text=bananas CharacterOffsetBegin=58 CharacterOffsetEnd=65 PartOfSpeech=NNS Lemma=banana NamedEntityTag=O]
[Text=. CharacterOffsetBegin=65 CharacterOffsetEnd=66 PartOfSpeech=. Lemma=. NamedEntityTag=O]

Example 2:

$ java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit,pos,lemma,ner \
-ner.additional.regexner.mapping custom_entities.tsv \
-file input_sentences.txt \
-outputFormat text; \
cat input_sentences.txt.out

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.5 sec].

Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt.out
Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... done [0.1 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.0 sec.
NERCombinerAnnotator: 0.0 sec.
TOTAL: 0.1 sec. for 13 tokens at 100.0 tokens/sec.
Pipeline setup: 2.7 sec.
Total time for StanfordCoreNLP pipeline: 2.8 sec.

Document: ID=input_sentences.txt (2 sentences, 13 tokens)
Sentence #1 (7 tokens):
Victoria lives in Vancouver, Canada.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=LOCATION]
[Text=lives CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=18 CharacterOffsetEnd=27 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=Canada CharacterOffsetBegin=29 CharacterOffsetEnd=35 PartOfSpeech=NNP Lemma=Canada NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=35 CharacterOffsetEnd=36 PartOfSpeech=. Lemma=. NamedEntityTag=O]
Sentence #2 (6 tokens):
She likes apples and bananas.
[Text=She CharacterOffsetBegin=37 CharacterOffsetEnd=40 PartOfSpeech=PRP Lemma=she NamedEntityTag=O]
[Text=likes CharacterOffsetBegin=41 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=like NamedEntityTag=O]
[Text=apples CharacterOffsetBegin=47 CharacterOffsetEnd=53 PartOfSpeech=NNS Lemma=apple NamedEntityTag=O]
[Text=and CharacterOffsetBegin=54 CharacterOffsetEnd=57 PartOfSpeech=CC Lemma=and NamedEntityTag=O]
[Text=bananas CharacterOffsetBegin=58 CharacterOffsetEnd=65 PartOfSpeech=NNS Lemma=banana NamedEntityTag=O]
[Text=. CharacterOffsetBegin=65 CharacterOffsetEnd=66 PartOfSpeech=. Lemma=. NamedEntityTag=O]

Example 3:

$ java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit,pos,lemma,regexner \
-regexner.mapping custom_entities.tsv \
-file input_sentences.txt \
-outputFormat text; \
cat input_sentences.txt.out

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
Adding annotator lemma
Adding annotator regexner
TokensRegexNERAnnotator regexner: Read 5 unique entries out of 5 from custom_entities.tsv, 0 TokensRegex patterns.

Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt.out
Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... done [0.2 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.1 sec.
TokensRegexNERAnnotator: 0.0 sec.
TOTAL: 0.2 sec. for 13 tokens at 83.3 tokens/sec.
Pipeline setup: 0.8 sec.
Total time for StanfordCoreNLP pipeline: 1.0 sec.

Document: ID=input_sentences.txt (2 sentences, 13 tokens)
Sentence #1 (7 tokens):
Victoria lives in Vancouver, Canada.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=PERSON]
[Text=lives CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=VBZ Lemma=live]
[Text=in CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=IN Lemma=in]
[Text=Vancouver CharacterOffsetBegin=18 CharacterOffsetEnd=27 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=CITY]
[Text=, CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=, Lemma=,]
[Text=Canada CharacterOffsetBegin=29 CharacterOffsetEnd=35 PartOfSpeech=NNP Lemma=Canada NamedEntityTag=COUNTRY]
[Text=. CharacterOffsetBegin=35 CharacterOffsetEnd=36 PartOfSpeech=. Lemma=.]
Sentence #2 (6 tokens):
She likes apples and bananas.
[Text=She CharacterOffsetBegin=37 CharacterOffsetEnd=40 PartOfSpeech=PRP Lemma=she]
[Text=likes CharacterOffsetBegin=41 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=like]
[Text=apples CharacterOffsetBegin=47 CharacterOffsetEnd=53 PartOfSpeech=NNS Lemma=apple]
[Text=and CharacterOffsetBegin=54 CharacterOffsetEnd=57 PartOfSpeech=CC Lemma=and]
[Text=bananas CharacterOffsetBegin=58 CharacterOffsetEnd=65 PartOfSpeech=NNS Lemma=banana]
[Text=. CharacterOffsetBegin=65 CharacterOffsetEnd=66 PartOfSpeech=. Lemma=.]
J38 commented 4 years ago

What are the contents of the directory where you are running this command?

J38 commented 4 years ago

Looking over your output, it seems like you're running an older Stanford CoreNLP, because it doesn't appear to be running the fine-grained stuff by default when the ner annotator is specified.

For instance, when I run it with Stanford CoreNLP 3.9.2, I see this output:

$ ~/stanford-corenlp/working_dirs/ner$ echo $CLASSPATH
~/stanford-corenlp/3.9.2/*:
$ ~/stanford-corenlp/working_dirs/ner$ java -Xmx10g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.regexner.mapping victoria-rules.txt -file victoria-example.txt -outputFormat text
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.8 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.8 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585573 unique entries from 2 files
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.additional.regexner: Read 1 unique entries out of 1 from victoria-rules.txt, 0 TokensRegex patterns.

Processing file ~/stanford-corenlp/working_dirs/ner/victoria-example.txt ... writing to ~/stanford-corenlp/working_dirs/ner/victoria-example.txt.out
Annotating file ~/stanford-corenlp/working_dirs/ner/victoria-example.txt ... done [4.7 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 4.5 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.1 sec.
NERCombinerAnnotator: 0.1 sec.
TOTAL: 4.7 sec. for 7 tokens at 1.5 tokens/sec.
Pipeline setup: 17.3 sec.
Total time for StanfordCoreNLP pipeline: 22.2 sec.

Also in your Example 2 everything is tagged LOCATION, which indicates the fine-grained NER did not run at all.

J38 commented 4 years ago

But when I look at your output, I don't see either ner.fine.regexner or ner.additional.regexner running.
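This diagnosis can be checked mechanically: in J38's 3.9.2 run with `-ner.additional.regexner.mapping`, the startup log contains `ner.fine.regexner` and `ner.additional.regexner` lines, whereas the logs in Examples 1 and 2 contain neither. The sketch below (hypothetical class and method names) reports which of the two markers are missing from captured log text. The marker strings are copied from the log above; the `ner.additional.regexner` line presumably appears only when a mapping is supplied, and other versions may word their logs differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical check for the fine-grained NER stages in a CoreNLP startup log. */
final class NerLogCheck {

    // Marker strings copied from the 3.9.2 startup log in this thread.
    private static final List<String> MARKERS = Arrays.asList("ner.fine.regexner", "ner.additional.regexner");

    /** Returns the markers that never appear in the log, in a fixed order. */
    static List<String> missingStages(String log) {
        List<String> missing = new ArrayList<>();
        for (String m : MARKERS) {
            if (!log.contains(m)) {
                missing.add(m);
            }
        }
        return missing;
    }
}
```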

J38 commented 4 years ago

So if you're running this in a directory with older Stanford CoreNLP code, -cp "*" will cause Java to use whatever code is in the directory where you run the command. The CORENLP_HOME variable is used by the Python code; the Java code ignores it.
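To see which installation a given `-cp` actually picks up, one option is to ask the JVM where it loaded a class from. The sketch below is a generic diagnostic with a hypothetical class name, using only standard reflection (`getProtectionDomain().getCodeSource()`). Run with the same classpath as the failing command, e.g. `java -cp "*:." WhereLoaded` from the directory in question, it should print the jar that `edu.stanford.nlp.pipeline.StanfordCoreNLP` came from, followed by the `java.class.path` entries.

```java
import java.io.File;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class was loaded from on the current classpath. */
final class WhereLoaded {

    /** Describes where the named class was loaded from, or why it could not be found. */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, WhereLoaded.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "JDK/bootstrap (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND on this classpath";
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "edu.stanford.nlp.pipeline.StanfordCoreNLP";
        System.out.println(target + " -> " + locate(target));
        // The launcher expands a "*" classpath entry before main runs, so this lists the individual jars.
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            System.out.println("  " + entry);
        }
    }
}
```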

victoriastuart commented 4 years ago

Hi, sorry: I should have mentioned my classpath. I have two CoreNLP installations: the stanford-corenlp-full-2018-10-05 release download, and a git clone that I build with Maven. Here are the details.

[victoria@victoria stanford-corenlp-full-2018-10-05]$ pwd; ls -l
/mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05
total 386064
-rw-rw-r-- 1 victoria victoria      6103 Oct  8  2018 build.xml
-rwxrwxr-x 1 victoria victoria       871 Oct  8  2018 corenlp.sh
-rwxrwxr-x 1 victoria victoria      5477 Oct  8  2018 CoreNLP-to-HTML.xsl
-rw-r--r-- 1 victoria victoria       101 Jan  9 16:39 custom_entities.tsv
-rw-rw-r-- 1 victoria victoria    211938 Oct  8  2018 ejml-0.23.jar
-rw-rw-r-- 1 victoria victoria   1227451 Oct  8  2018 ejml-0.23-src.zip
-rw-r--r-- 1 victoria victoria        36 Jan  9 16:33 input_sentence.txt
-rw-rw-r-- 1 victoria victoria        89 Oct  8  2018 input.txt
-rw-rw-r-- 1 victoria victoria     19868 Oct  8  2018 input.txt.xml
-rw-rw-r-- 1 victoria victoria     56674 Oct  8  2018 javax.activation-api-1.2.0.jar
-rw-rw-r-- 1 victoria victoria     78896 Oct  8  2018 javax.activation-api-1.2.0-sources.jar
-rw-rw-r-- 1 victoria victoria     54860 Oct  8  2018 javax.json-api-1.0-sources.jar
-rw-rw-r-- 1 victoria victoria     85147 Oct  8  2018 javax.json.jar
-rw-rw-r-- 1 victoria victoria    128032 Oct  8  2018 jaxb-api-2.4.0-b180830.0359.jar
-rw-rw-r-- 1 victoria victoria    270926 Oct  8  2018 jaxb-api-2.4.0-b180830.0359-sources.jar
-rw-rw-r-- 1 victoria victoria    254858 Oct  8  2018 jaxb-core-2.3.0.1.jar
-rw-rw-r-- 1 victoria victoria    345974 Oct  8  2018 jaxb-core-2.3.0.1-sources.jar
-rw-rw-r-- 1 victoria victoria   1099271 Oct  8  2018 jaxb-impl-2.4.0-b180830.0438.jar
-rw-rw-r-- 1 victoria victoria   1132702 Oct  8  2018 jaxb-impl-2.4.0-b180830.0438-sources.jar
-rw-rw-r-- 1 victoria victoria    774317 Oct  8  2018 joda-time-2.9-sources.jar
-rw-rw-r-- 1 victoria victoria    629506 Oct  8  2018 joda-time.jar
-rw-rw-r-- 1 victoria victoria    196945 Oct  8  2018 jollyday-0.4.9-sources.jar
-rw-rw-r-- 1 victoria victoria    213591 Oct  8  2018 jollyday.jar
-rw-rw-r-- 1 victoria victoria      1667 Oct  8  2018 LIBRARY-LICENSES
-rw-rw-r-- 1 victoria victoria     35147 Oct  8  2018 LICENSE.txt
-rw-rw-r-- 1 victoria victoria       769 Oct  8  2018 Makefile
drwxrwxr-x 2 victoria victoria      4096 Oct  8  2018 patterns
-rw-rw-r-- 1 victoria victoria      6279 Oct  8  2018 pom-java-11.xml
-rw-rw-r-- 1 victoria victoria      6135 Oct  8  2018 pom.xml
-rw-rw-r-- 1 victoria victoria   1347123 Oct  8  2018 protobuf.jar
-rw-rw-r-- 1 victoria victoria      4262 Oct  8  2018 README.txt
-rw-r--r-- 1 victoria victoria      2698 Jan  8 20:35 regexner.props
-rw-rw-r-- 1 victoria victoria       367 Oct  8  2018 RESOURCE-LICENSES
-rw-rw-r-- 1 victoria victoria      2445 Oct  8  2018 SemgrexDemo.java
-rw-r--r-- 1 victoria victoria      1720 Jan  7 18:06 serialized.props
-rw-rw-r-- 1 victoria victoria      1828 Oct  8  2018 ShiftReduceDemo.java
-rw-rw-r-- 1 victoria victoria     32127 Oct  8  2018 slf4j-api.jar
-rw-rw-r-- 1 victoria victoria     10712 Oct  8  2018 slf4j-simple.jar
-rw-rw-r-- 1 victoria victoria   8146873 Oct  8  2018 stanford-corenlp-3.9.2.jar
-rw-rw-r-- 1 victoria victoria   9687426 Oct  8  2018 stanford-corenlp-3.9.2-javadoc.jar
-rw-rw-r-- 1 victoria victoria 362565193 Oct  8  2018 stanford-corenlp-3.9.2-models.jar
-rw-rw-r-- 1 victoria victoria   5370905 Oct  8  2018 stanford-corenlp-3.9.2-sources.jar
-rw-rw-r-- 1 victoria victoria      7240 Oct  8  2018 StanfordCoreNlpDemo.java
-rw-rw-r-- 1 victoria victoria    199885 Oct  8  2018 StanfordDependenciesManual.pdf
drwxrwxr-x 2 victoria victoria      4096 Oct  8  2018 sutime
-rw-r--r-- 1 victoria victoria       702 Jan  8 14:51 text.props
drwxrwxr-x 2 victoria victoria      4096 Oct  8  2018 tokensregex
-rw-rw-r-- 1 victoria victoria    672122 Oct  8  2018 xom-1.2.10-src.jar
-rw-rw-r-- 1 victoria victoria    313253 Oct  8  2018 xom.jar
[victoria@victoria stanford-corenlp-full-2018-10-05]$ 

[victoria@victoria CoreNLP]$ date; pwd
  Tue 10 Dec 2019 11:46:09 AM PST
  /mnt/Vancouver/apps/CoreNLP

[victoria@victoria CoreNLP]$ git pull
  Already up to date.

[victoria@victoria CoreNLP]$ mvn package
  ...
  [ ... SNIP! ... ]
  Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec

  Results :
  Tests run: 1241, Failures: 0, Errors: 0, Skipped: 1

  [INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ stanford-corenlp ---
  [INFO] Building jar: /mnt/Vancouver/apps/CoreNLP/target/stanford-corenlp-3.9.2.jar
  [INFO] --- build-helper-maven-plugin:1.7:attach-artifact (attach-models) @ stanford-corenlp ---
  Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.pom
  Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.pom (601 B at 22 kB/s)
  Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven/2.0/maven-2.0.pom
  Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven/2.0/maven-2.0.pom (8.8 kB at 283 kB/s)
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESS
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time:  43.980 s
  [INFO] Finished at: 2019-12-10T11:54:57-08:00
  [INFO] ------------------------------------------------------------------------

I have been running CoreNLP from the target/ directory of my CoreNLP git clone:

[victoria@victoria ~]$ pwd

[victoria@victoria target]$ pwd
  /mnt/Vancouver/apps/CoreNLP/target

[victoria@victoria target]$ java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,regexner -file input_sentences.txt -outputFormat text; echo; cat input_sentences.txt.out

# ----------------------------------------------------------------------------

~/.bashrc:

# export CORENLP_HOME=/mnt/Vancouver/apps/CoreNLP/target
export CORENLP_HOME=/mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05

<<COMMENT
  2020-01-10:

  [victoria@victoria RegexNER]$ java -Xmx16g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP
    Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLP

  cd /mnt/Vancouver/apps/CoreNLP/target

  [victoria@victoria target]$ java -Xmx16g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP
    Searching for resource: StanfordCoreNLP.properties ... not found.
    Searching for resource: edu/stanford/nlp/pipeline/StanfordCoreNLP.properties ... found.
    Adding annotator tokenize
    No tokenizer type provided. Defaulting to PTBTokenizer.
    Adding annotator ssplit
    Adding annotator pos
    Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
    Adding annotator lemma
    Adding annotator ner
    ...

  THIS WORKS FROM ANY DIR:

      java -Xmx16g -cp '/mnt/Vancouver/apps/CoreNLP/target/*' edu.stanford.nlp.pipeline.StanfordCoreNLP

  FOR THIS, MUST cd TO /mnt/Vancouver/apps/CoreNLP/target/ FIRST (java resolves the "*" against the current directory):

      cd /mnt/Vancouver/apps/CoreNLP/target/
      java -Xmx16g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP
COMMENT
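The notes above come down to one point. With `-cp "*"`, java expands the `*` wildcard against the current working directory, so the same command fails from a directory that contains no CoreNLP jars (such as the RegexNER folder). A small guard can catch this before java starts.

This is a minimal bash sketch. `has_corenlp_jar` is a hypothetical helper, not part of CoreNLP, and the jar name pattern assumes the `stanford-corenlp-*.jar` naming seen in the listings here.

```shell
# Hypothetical helper: succeed if DIR (default: current directory) holds at
# least one stanford-corenlp-*.jar, i.e. if `java -cp "*"` run there could
# find the CoreNLP classes.
has_corenlp_jar() {
  local dir="${1:-.}"
  local jar
  for jar in "$dir"/stanford-corenlp-*.jar; do
    # With no match, bash leaves the literal pattern in $jar, so test it.
    [ -e "$jar" ] && return 0
  done
  return 1
}

# Example (path taken from the notes above):
#   d=/mnt/Vancouver/apps/CoreNLP/target
#   has_corenlp_jar "$d" &&
#     java -Xmx16g -cp "$d/*" edu.stanford.nlp.pipeline.StanfordCoreNLP
```

Quoting `"$d/*"` keeps the shell from expanding the wildcard, so java receives it and performs the directory expansion itself.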

# ----------------------------------------------------------------------------

[victoria@victoria target]$ echo $CORENLP_HOME/
  /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/

[victoria@victoria target]$ export CORENLP_HOME=/mnt/Vancouver/apps/CoreNLP/target

[victoria@victoria target]$ exec bash

[victoria@victoria target]$ echo $CORENLP_HOME/
  /mnt/Vancouver/apps/CoreNLP/target/

# ----------------------------------------------------------------------------

[victoria@victoria apps]$ cd CoreNLP
[victoria@victoria CoreNLP]$ pwd; ls -l
/mnt/Vancouver/apps/CoreNLP
total 17944
-rw-r--r--  1 victoria victoria     1311 Dec  6 13:55 build.gradle
-rw-r--r--  1 victoria victoria    27113 Dec  6 13:55 build.xml
drwxr-xr-x  2 victoria victoria     4096 Jul  7  2017 classes
-rw-r--r--  1 victoria victoria     4901 Jul  7  2017 commonbuildjsp.xml
-rw-r--r--  1 victoria victoria     1824 Jul  7  2017 CONTRIBUTING.md
-rw-r--r--  1 victoria victoria     3197 Dec 10 14:29 corenlp_test.py
drwxr-xr-x  4 victoria victoria     4096 Jul  7  2017 data
drwxr-xr-x 14 victoria victoria     4096 Dec  6 13:55 doc
drwxr-xr-x  3 victoria victoria     4096 Dec  6 13:55 examples
drwxr-xr-x  3 victoria victoria     4096 Jul  7  2017 gradle
-rwxr-xr-x  1 victoria victoria     5241 Jul  7  2017 gradlew
-rw-r--r--  1 victoria victoria     2260 Jul  7  2017 gradlew.bat
drwxr-xr-x  2 victoria victoria     4096 Dec 12 14:45 input
drwxr-xr-x  3 victoria victoria     4096 Jul  7  2017 itest
-rw-r--r--  1 victoria victoria     8166 Jul  7  2017 JavaNLP-core.eml
-rw-r--r--  1 victoria victoria      129 Jul  7  2017 JavaNLP-core.iml
drwxr-xr-x  3 victoria victoria     4096 Jan  9 16:19 lib
drwxr-xr-x  2 victoria victoria     4096 Jul  7  2017 liblocal
drwxr-xr-x  3 victoria victoria     4096 Jan  9 16:19 libsrc
drwxr-xr-x  4 victoria victoria     4096 Jul  7  2017 licenses
-rw-r--r--  1 victoria victoria    35147 Jul  7  2017 LICENSE.txt
-rw-r--r--  1 victoria victoria     3391 Jul  7  2017 module_core.xml
drwxr-xr-x  2 victoria victoria     4096 Dec 10 20:53 output
-rw-r--r--  1 victoria victoria     6374 Jan  9 16:19 pom-java-11.xml
-rw-r--r--  1 victoria victoria     6221 Jan  9 16:19 pom.xml
-rw-r--r--  1 victoria victoria     7935 Dec  6 13:55 README.md
-rw-r--r--  1 victoria victoria    74539 Jan 10 12:19 _readme-victoria-CoreNLP-StanfordNLP-notes.txt
-rw-r--r--  1 victoria victoria   196676 Dec 30 20:05 _readme-victoria-corenlp.txt
-rw-r--r--  1 victoria victoria    10638 Dec 24 17:21 _readme-victoria-stanford_openie.txt
-rw-r--r--  1 victoria victoria      367 Dec  6 13:55 RESOURCE-LICENSES
drwxr-xr-x 11 victoria victoria     4096 Dec  6 13:55 scripts
-rw-r--r--  1 victoria victoria 12326806 Dec 17 19:55 spacy
drwxr-xr-x  3 victoria victoria     4096 Aug 18  2017 src
drwxr-xr-x  6 victoria victoria     4096 Dec 31 15:32 src-local
drwxr-xr-x  3 victoria victoria     4096 Dec 12 16:50 stanford-corenlp-full
-rw-r--r--  1 victoria victoria  5528127 Dec 10 14:37 stanfordnlp
drwxr-xr-x 11 victoria victoria     4096 Jan 12 18:46 target
drwxr-xr-x  4 victoria victoria     4096 Jul  7  2017 test
drwxr-xr-x  3 victoria victoria     4096 Jan  3 15:16 _victoria
drwxr-xr-x  7 victoria victoria     4096 Nov  7  2017 web

[victoria@victoria CoreNLP]$ cd target

[victoria@victoria target]$ ls -l
total 1849924
drwxr-xr-x 3 victoria victoria       4096 Jul  7  2017 classes
-rw-r--r-- 1 victoria victoria        159 Jan 12 19:24 custom_entities2.tsv
-rw-r--r-- 1 victoria victoria        159 Jan 12 19:25 custom_entities.tsv
-rw-r--r-- 1 victoria victoria        419 Jan  9 17:23 custom_entities.tsv.bak
-rw-r--r-- 1 victoria victoria       2541 Aug  9  2017 DependencyTreeExample.class
-rw-r--r-- 1 victoria victoria       1430 Aug  9  2017 DependencyTreeExample.java
-rw-r--r-- 1 victoria victoria         85 Jan 12 18:36 fruit.rules
drwxr-xr-x 3 victoria victoria       4096 Jul  7  2017 generated-sources
drwxr-xr-x 3 victoria victoria       4096 Jul  7  2017 generated-test-sources
drwxr-xr-x 3 victoria victoria      12288 Aug 24  2017 icons
-rw-r--r-- 1 victoria victoria         78 Jan 10 13:31 input_sentence_2.txt
-rw-r--r-- 1 victoria victoria         67 Jan 12 19:17 input_sentences.txt
-rw-r--r-- 1 victoria victoria       1528 Jan 12 21:54 input_sentences.txt.out
-rw-r--r-- 1 victoria victoria         54 Jan 12 18:42 input_sentence.txt
-rw-r--r-- 1 victoria victoria       2857 Jan 10 17:04 input_sentence.txt.json
-rw-r--r-- 1 victoria victoria       1117 Jan 12 16:24 input_sentence.txt.out
-rw-r--r-- 1 victoria victoria          0 Jan 10 16:59 input_sentence.txt.xml
drwxr-xr-x 2 victoria victoria       4096 Jul  7  2017 maven-archiver
drwxr-xr-x 3 victoria victoria       4096 Jul  7  2017 maven-status
-rw-r--r-- 1 victoria victoria       3650 Jan 12 18:33 rule-sentences.txt.out
-rw-r--r-- 1 victoria victoria         26 Jan 12 18:37 sentence.txt
-rw-r--r-- 1 victoria victoria        702 Jan 12 18:44 sentence.txt.out
-rw-r--r-- 1 victoria victoria         85 Jan 12 18:36 sports_teams.rules
-rw-r--r-- 1 victoria victoria    9106502 Aug 18  2017 stanford-corenlp-3.7.0.jar
-rw-r--r-- 1 victoria victoria    9446305 Jan 11 12:04 stanford-corenlp-3.9.2.jar
-rw-r--r-- 1 victoria victoria  362594065 Jul  7  2017 stanford-corenlp-models-current.jar
-rw-r--r-- 1 victoria victoria 1039009129 Jul  7  2017 stanford-english-corenlp-models-current.jar
-rw-r--r-- 1 victoria victoria  474001837 Jul  7  2017 stanford-english-kbp-corenlp-models-current.jar
drwxr-xr-x 2 victoria victoria      36864 Dec 10 11:54 surefire-reports
drwxr-xr-x 3 victoria victoria       4096 Jul  7  2017 test-classes

[victoria@victoria target]$ echo $CORENLP_HOME/
    /mnt/Vancouver/apps/CoreNLP/target/

[victoria@victoria target]$ java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit,pos,lemma,ner \
-ner.additional.regexner.mapping custom_entities.tsv \
-file input_sentences.txt \
-outputFormat text; \
cat input_sentences.txt.out

Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLP
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP

Document: ID=input_sentences.txt (2 sentences, 13 tokens)
Sentence #1 (7 tokens):
Victoria lives in Vancouver, Canada.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=LOCATION]
[Text=lives CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=18 CharacterOffsetEnd=27 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=Canada CharacterOffsetBegin=29 CharacterOffsetEnd=35 PartOfSpeech=NNP Lemma=Canada NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=35 CharacterOffsetEnd=36 PartOfSpeech=. Lemma=. NamedEntityTag=O]
Sentence #2 (6 tokens):
She likes apples and bananas.
[Text=She CharacterOffsetBegin=37 CharacterOffsetEnd=40 PartOfSpeech=PRP Lemma=she NamedEntityTag=O]
[Text=likes CharacterOffsetBegin=41 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=like NamedEntityTag=O]
[Text=apples CharacterOffsetBegin=47 CharacterOffsetEnd=53 PartOfSpeech=NNS Lemma=apple NamedEntityTag=O]
[Text=and CharacterOffsetBegin=54 CharacterOffsetEnd=57 PartOfSpeech=CC Lemma=and NamedEntityTag=O]
[Text=bananas CharacterOffsetBegin=58 CharacterOffsetEnd=65 PartOfSpeech=NNS Lemma=banana NamedEntityTag=O]
[Text=. CharacterOffsetBegin=65 CharacterOffsetEnd=66 PartOfSpeech=. Lemma=. NamedEntityTag=O]
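In this run, java failed with `ClassNotFoundException`, yet tags were still printed. The `;` before `cat` runs the next command regardless of the previous exit status, so the `input_sentences.txt.out` shown here is left over from an earlier run and was not produced by this command.

A minimal bash sketch of a wrapper that removes the old output first and prints it only on success (`run_and_show` is a hypothetical helper name):

```shell
# Hypothetical helper: run a command, then print OUTFILE only if it succeeded.
# Usage: run_and_show OUTFILE command [args...]
run_and_show() {
  local out="$1"
  shift
  local status=0
  rm -f -- "$out"        # make sure a stale file from an earlier run is not shown
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status; not showing $out" >&2
    return "$status"
  fi
  cat -- "$out"
}

# Example with the same flags as above:
#   run_and_show input_sentences.txt.out \
#     java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP \
#       -annotators tokenize,ssplit,pos,lemma,ner \
#       -ner.additional.regexner.mapping custom_entities.tsv \
#       -file input_sentences.txt -outputFormat text
```

Using `&&` instead of `;` between the java command and `cat` gives the same protection inline. Deleting the file first additionally guarantees that a failed run cannot show old results.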

[victoria@victoria target]$ java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit,pos,lemma,ner \
-ner.additional.regexner.mapping custom_entities.tsv \
-file input_sentences.txt \
-outputFormat text; \
cat input_sentences.txt.out

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec].

Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt.out
Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... done [0.2 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.1 sec.
NERCombinerAnnotator: 0.0 sec.
TOTAL: 0.2 sec. for 13 tokens at 67.7 tokens/sec.
Pipeline setup: 2.7 sec.
Total time for StanfordCoreNLP pipeline: 2.9 sec.

Document: ID=input_sentences.txt (2 sentences, 13 tokens)
Sentence #1 (7 tokens):
Victoria lives in Vancouver, Canada.
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=LOCATION]
[Text=lives CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=18 CharacterOffsetEnd=27 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=LOCATION]
[Text=, CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=Canada CharacterOffsetBegin=29 CharacterOffsetEnd=35 PartOfSpeech=NNP Lemma=Canada NamedEntityTag=LOCATION]
[Text=. CharacterOffsetBegin=35 CharacterOffsetEnd=36 PartOfSpeech=. Lemma=. NamedEntityTag=O]
Sentence #2 (6 tokens):
She likes apples and bananas.
[Text=She CharacterOffsetBegin=37 CharacterOffsetEnd=40 PartOfSpeech=PRP Lemma=she NamedEntityTag=O]
[Text=likes CharacterOffsetBegin=41 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=like NamedEntityTag=O]
[Text=apples CharacterOffsetBegin=47 CharacterOffsetEnd=53 PartOfSpeech=NNS Lemma=apple NamedEntityTag=O]
[Text=and CharacterOffsetBegin=54 CharacterOffsetEnd=57 PartOfSpeech=CC Lemma=and NamedEntityTag=O]
[Text=bananas CharacterOffsetBegin=58 CharacterOffsetEnd=65 PartOfSpeech=NNS Lemma=banana NamedEntityTag=O]
[Text=. CharacterOffsetBegin=65 CharacterOffsetEnd=66 PartOfSpeech=. Lemma=. NamedEntityTag=O]

[victoria@victoria target]$ 
victoriastuart commented 4 years ago

SOLUTION

OK, per @J38 's kind comments, this is solved! :-D

$ echo $CLASSPATH
$  ## blank

Per:

I appended the following to my $CLASSPATH.

$ export CLASSPATH="$CLASSPATH:/mnt/Vancouver/apps/CoreNLP/target/stanford-corenlp-3.9.2.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-english-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-english-kbp-corenlp-models-current.jar";

$ for file in `find /mnt/Vancouver/apps/CoreNLP/lib/ -name "*.jar"`; do export CLASSPATH="$CLASSPATH:`realpath $file`"; done

$ echo $CLASSPATH

:/mnt/Vancouver/apps/CoreNLP/target/stanford-corenlp-3.9.2.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-english-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-english-kbp-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/lib/jaxb-api-2.4.0-b180830.0359.jar:/mnt/Vancouver/apps/CoreNLP/lib/jollyday-0.4.9.jar:/mnt/Vancouver/apps/CoreNLP/lib/commons-logging.jar:/mnt/Vancouver/apps/CoreNLP/lib/tomcat/jasper-el.jar:/mnt/Vancouver/apps/CoreNLP/lib/tomcat/jsp-api.jar:/mnt/Vancouver/apps/CoreNLP/lib/tomcat/tomcat-api.jar:/mnt/Vancouver/apps/CoreNLP/lib/tomcat/jasper.jar:/mnt/Vancouver/apps/CoreNLP/lib/tomcat/el-api.jar:/mnt/Vancouver/apps/CoreNLP/lib/tomcat/tomcat-juli.jar:/mnt/Vancouver/apps/CoreNLP/lib/ejml-ddense-0.38.jar:/mnt/Vancouver/apps/CoreNLP/lib/ejml-simple-0.38.jar:/mnt/Vancouver/apps/CoreNLP/lib/ant-contrib-1.0b3.jar:/mnt/Vancouver/apps/CoreNLP/lib/jaxb-impl-2.4.0-b180830.0438.jar:/mnt/Vancouver/apps/CoreNLP/lib/jaxb-core-2.3.0.1.jar:/mnt/Vancouver/apps/CoreNLP/lib/jflex-1.6.1.jar:/mnt/Vancouver/apps/CoreNLP/lib/lucene-core-7.5.0.jar:/mnt/Vancouver/apps/CoreNLP/lib/lucene-analyzers-common-7.5.0.jar:/mnt/Vancouver/apps/CoreNLP/lib/junit.jar:/mnt/Vancouver/apps/CoreNLP/lib/joda-time.jar:/mnt/Vancouver/apps/CoreNLP/lib/protobuf.jar:/mnt/Vancouver/apps/CoreNLP/lib/javax.servlet.jar:/mnt/Vancouver/apps/CoreNLP/lib/javacc.jar:/mnt/Vancouver/apps/CoreNLP/lib/ejml-core-0.38.jar:/mnt/Vancouver/apps/CoreNLP/lib/lucene-queryparser-7.5.0.jar:/mnt/Vancouver/apps/CoreNLP/lib/xom-1.3.2.jar:/mnt/Vancouver/apps/CoreNLP/lib/AppleJavaExtensions.jar:/mnt/Vancouver/apps/CoreNLP/lib/javax.activation-api-1.2.0.jar:/mnt/Vancouver/apps/CoreNLP/lib/lucene-demo-7.5.0.jar:/mnt/Vancouver/apps/CoreNLP/lib/javax.json.jar:/mnt/Vancouver/apps/CoreNLP/lib/log4j-1.2.16.jar:/mnt/Vancouver/apps/CoreNLP/lib/commons-lang3-3.1.jar:/mnt/Vancouver/apps/CoreNLP/lib/slf4j-simple.jar:/mnt/Vancouver/apps/CoreNLP/lib/appbundler-1.0.jar:/mnt/Vancouver/apps/CoreNLP/lib/slf4j-api.jar
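A colon-joined CLASSPATH this long is hard to check by eye. The following minimal bash sketch prints one entry per line, tagged `ok`, `MISSING` or `EMPTY`. The leading `:` above shows up as an `EMPTY` entry because CLASSPATH was blank before the first append. An entry ending in `/*` is treated as Java's directory wildcard, so only the directory is checked. `show_classpath` is a hypothetical helper name.

```shell
# Hypothetical helper: list the entries of a classpath string (default:
# $CLASSPATH) one per line as "STATUS<TAB>ENTRY".
show_classpath() {
  local cp="${1-$CLASSPATH}"
  local -a entries
  local entry status
  # Split on ':' with read, so entries such as "dir/*" are not glob-expanded.
  IFS=':' read -r -a entries <<< "$cp"
  for entry in "${entries[@]}"; do
    if [ -z "$entry" ]; then
      status="EMPTY"
    elif [[ $entry == *'/*' ]]; then
      # Java wildcard entry: check that the directory itself exists.
      if [ -d "${entry%/*}" ]; then status="ok"; else status="MISSING"; fi
    elif [ -e "$entry" ]; then
      status="ok"
    else
      status="MISSING"
    fi
    printf '%s\t%s\n' "$status" "$entry"
  done
}

# Example: show only the problems
#   show_classpath | grep -v '^ok'
```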

To better follow the annotations, I updated my input test sentences and my RegexNER rules.

$ cat input_sentences.txt 
Victoria lives in Vancouver, Canada. She was born in Nova Scotia. Victoria likes apples and bananas.

$ cat custom_entities.tsv
Victoria    PERSON  LOCATION,ORGANIZATION,CITY  2
Vancouver   CITY    LOCATION,ORGANIZATION   2
Canada  COUNTRY LOCATION,ORGANIZATION,CITY  2
apple(s)    FRUIT       2
banana(s)   FRUIT       2
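RegexNER expects the columns of this mapping file to be separated by tab characters: pattern, NER type, overwritable types, priority. The listing above shows spaces, presumably because the tabs were lost when pasting into this comment (the log below reports all 5 entries being read).

Note also that `apple(s)` is a Java regular expression whose group is not optional, so it matches `apples` but not `apple`. The pattern `apples?` matches both.

The following minimal bash sketch writes the same rules with real tabs and flags any line that has no tab. It uses `apples?`/`bananas?`, which is an assumption about the intended behavior. `write_mapping` and `check_mapping` are hypothetical helpers.

```shell
# Hypothetical helper: write the mapping from this comment with real tabs.
# Columns: pattern, NER type, types it may overwrite, priority.
write_mapping() {
  {
    printf '%s\t%s\t%s\t%s\n' 'Victoria'  'PERSON'  'LOCATION,ORGANIZATION,CITY' '2'
    printf '%s\t%s\t%s\t%s\n' 'Vancouver' 'CITY'    'LOCATION,ORGANIZATION'      '2'
    printf '%s\t%s\t%s\t%s\n' 'Canada'    'COUNTRY' 'LOCATION,ORGANIZATION,CITY' '2'
    printf '%s\t%s\t%s\t%s\n' 'apples?'   'FRUIT'   ''                           '2'
    printf '%s\t%s\t%s\t%s\n' 'bananas?'  'FRUIT'   ''                           '2'
  } > "$1"
}

# Hypothetical helper: report non-empty lines that contain no tab at all
# (e.g. tabs turned into spaces); returns non-zero if any are found.
check_mapping() {
  awk -F'\t' '
    NF == 1 { print FILENAME ":" NR ": no tab separator"; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Example:
#   write_mapping custom_entities.tsv && check_mapping custom_entities.tsv
```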

Correct output!

$ java -Xmx16g edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit,pos,lemma,ner \
-ner.additional.regexner.mapping custom_entities.tsv \
-file input_sentences.txt \
-outputFormat text; \
cat input_sentences.txt.out

[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [0.9 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.5 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585574 unique entries from 2 files
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.additional.regexner: Read 5 unique entries out of 5 from custom_entities.tsv, 0 TokensRegex patterns.

Processing file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... writing to /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt.out
Annotating file /mnt/Vancouver/apps/CoreNLP/target/input_sentences.txt ... done [0.4 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.0 sec.
NERCombinerAnnotator: 0.2 sec.
TOTAL: 0.4 sec. for 20 tokens at 54.8 tokens/sec.
Pipeline setup: 8.5 sec.
Total time for StanfordCoreNLP pipeline: 9.1 sec.

Document: ID=input_sentences.txt (3 sentences, 20 tokens)
Sentence #1 (7 tokens):
Victoria lives in Vancouver, Canada.

Tokens:
[Text=Victoria CharacterOffsetBegin=0 CharacterOffsetEnd=8 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=PERSON]
[Text=lives CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=VBZ Lemma=live NamedEntityTag=O]
[Text=in CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Vancouver CharacterOffsetBegin=18 CharacterOffsetEnd=27 PartOfSpeech=NNP Lemma=Vancouver NamedEntityTag=CITY]
[Text=, CharacterOffsetBegin=27 CharacterOffsetEnd=28 PartOfSpeech=, Lemma=, NamedEntityTag=O]
[Text=Canada CharacterOffsetBegin=29 CharacterOffsetEnd=35 PartOfSpeech=NNP Lemma=Canada NamedEntityTag=COUNTRY]
[Text=. CharacterOffsetBegin=35 CharacterOffsetEnd=36 PartOfSpeech=. Lemma=. NamedEntityTag=O]

Extracted the following NER entity mentions:
Victoria    PERSON  LOCATION:0.6059370876590606
Vancouver   CITY    LOCATION:0.9921788688695864
Canada  COUNTRY LOCATION:0.9992413208111567
Sentence #2 (7 tokens):
She was born in Nova Scotia.

Tokens:
[Text=She CharacterOffsetBegin=37 CharacterOffsetEnd=40 PartOfSpeech=PRP Lemma=she NamedEntityTag=O]
[Text=was CharacterOffsetBegin=41 CharacterOffsetEnd=44 PartOfSpeech=VBD Lemma=be NamedEntityTag=O]
[Text=born CharacterOffsetBegin=45 CharacterOffsetEnd=49 PartOfSpeech=VBN Lemma=bear NamedEntityTag=O]
[Text=in CharacterOffsetBegin=50 CharacterOffsetEnd=52 PartOfSpeech=IN Lemma=in NamedEntityTag=O]
[Text=Nova CharacterOffsetBegin=53 CharacterOffsetEnd=57 PartOfSpeech=NNP Lemma=Nova NamedEntityTag=STATE_OR_PROVINCE]
[Text=Scotia CharacterOffsetBegin=58 CharacterOffsetEnd=64 PartOfSpeech=NNP Lemma=Scotia NamedEntityTag=STATE_OR_PROVINCE]
[Text=. CharacterOffsetBegin=64 CharacterOffsetEnd=65 PartOfSpeech=. Lemma=. NamedEntityTag=O]

Extracted the following NER entity mentions:
Nova Scotia STATE_OR_PROVINCE   LOCATION:0.9944154320168771
She PERSON  -
Sentence #3 (6 tokens):
Victoria likes apples and bananas.

Tokens:
[Text=Victoria CharacterOffsetBegin=66 CharacterOffsetEnd=74 PartOfSpeech=NNP Lemma=Victoria NamedEntityTag=PERSON]
[Text=likes CharacterOffsetBegin=75 CharacterOffsetEnd=80 PartOfSpeech=VBZ Lemma=like NamedEntityTag=O]
[Text=apples CharacterOffsetBegin=81 CharacterOffsetEnd=87 PartOfSpeech=NNS Lemma=apple NamedEntityTag=FRUIT]
[Text=and CharacterOffsetBegin=88 CharacterOffsetEnd=91 PartOfSpeech=CC Lemma=and NamedEntityTag=O]
[Text=bananas CharacterOffsetBegin=92 CharacterOffsetEnd=99 PartOfSpeech=NNS Lemma=banana NamedEntityTag=FRUIT]
[Text=. CharacterOffsetBegin=99 CharacterOffsetEnd=100 PartOfSpeech=. Lemma=. NamedEntityTag=O]

Extracted the following NER entity mentions:
Victoria    PERSON  PERSON:0.5045879288466439
apples  FRUIT   -
bananas FRUIT   -

$ 
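To compare tags between runs (for example, before and after editing the mapping file), it can help to reduce the token lines to word/tag pairs. This minimal bash/awk sketch assumes the `[Text=... NamedEntityTag=...]` layout shown in the output above. `token_tags` is a hypothetical helper.

```shell
# Hypothetical helper: turn CoreNLP "-outputFormat text" token lines such as
#   [Text=apples ... Lemma=apple NamedEntityTag=FRUIT]
# into "apples<TAB>FRUIT". Reads the given files, or stdin if none.
token_tags() {
  awk '
    /^\[Text=/ {
      word = ""; tag = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^\[Text=/) word = substr($i, 7)   # drop the leading "[Text="
        if ($i ~ /^NamedEntityTag=/) {
          tag = substr($i, 16)                      # drop "NamedEntityTag="
          sub(/]$/, "", tag)                        # drop a closing "]"
        }
      }
      print word "\t" tag
    }
  ' "$@"
}

# Example: compare two runs (file names are placeholders)
#   diff <(token_tags before.out) <(token_tags input_sentences.txt.out)
```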
J38 commented 4 years ago

Ok how about we try these commands and see if that works.

The first command sets the CLASSPATH environment variable and the second just shows that it worked. Since the relevant files appear to be in /mnt/Vancouver/apps/CoreNLP/target, the third changes into that directory, and the last runs the java command.

Assuming /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05 is an unaltered download of the 3.9.2 distribution folder, things should work properly.

Please let me know if there are any issues and I can help you troubleshoot more.

export CLASSPATH=/mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/*:
echo $CLASSPATH
cd /mnt/Vancouver/apps/CoreNLP/target
java -Xmx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.regexner.mapping custom_entities.tsv -file input_sentences.txt -outputFormat text
J38 commented 4 years ago

Oh wait, sorry, I guess it looks like you've got it working!

J38 commented 4 years ago

At any rate, for the time being I would recommend working with the official 3.9.2 release, since the master branch of Stanford CoreNLP is a bit messy... we are going to release 4.0.0 over the next few weeks.

victoriastuart commented 4 years ago

Yes: working now! I'll mark this Issue as closed.

Thank you once again, @J38 , for your patient help -- very much appreciated! :+1:


Edit: added to ~/.bashrc:

## https://stanfordnlp.github.io/CoreNLP/download.html#steps-to-setup-from-the-github-head-version
## Since the following lines would otherwise append duplicate entries to $CLASSPATH
## every time I run `exec bash` in the terminal, I first explicitly clear CLASSPATH.
## Alternatively, add to `~/.profile` as described here:
##   https://stackoverflow.com/questions/13830594/when-i-execute-bash-the-path-keeps-repeating-itself

export CLASSPATH=""

export CLASSPATH="$CLASSPATH:/mnt/Vancouver/apps/CoreNLP/target/stanford-corenlp-3.9.2.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-english-corenlp-models-current.jar:/mnt/Vancouver/apps/CoreNLP/models/stanford-english-kbp-corenlp-models-current.jar";

for file in `find /mnt/Vancouver/apps/CoreNLP/lib/ -name "*.jar"`; do export CLASSPATH="$CLASSPATH:`realpath $file`"; done
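Clearing CLASSPATH at the top works, but it also discards any value inherited from elsewhere. An alternative is to append each entry only if it is not already present, so re-sourcing `~/.bashrc` does not grow the variable.

This is a minimal bash sketch. It assumes the same layout as the lines above: `target/`, `models/` and `lib/` under `/mnt/Vancouver/apps/CoreNLP`. `classpath_append` and `add_corenlp_jars` are hypothetical helpers.

```shell
# Hypothetical helper: append ENTRY to CLASSPATH unless it is already there.
classpath_append() {
  local entry="$1"
  case ":${CLASSPATH-}:" in
    *":$entry:"*) ;;                                   # already present
    *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}$entry" ;;  # no leading ':' when empty
  esac
  export CLASSPATH
}

# Hypothetical helper: add the CoreNLP jar, the model jars and every jar under
# lib/ for a checkout at BASE (default: the path used above).
add_corenlp_jars() {
  local base="${1:-/mnt/Vancouver/apps/CoreNLP}"
  local jar
  classpath_append "$base/target/stanford-corenlp-3.9.2.jar"
  for jar in "$base"/models/*.jar; do
    [ -e "$jar" ] && classpath_append "$jar"
  done
  # -print0 / read -d '' keep paths with spaces intact; with an absolute BASE,
  # find already prints absolute paths, so realpath is not needed.
  while IFS= read -r -d '' jar; do
    classpath_append "$jar"
  done < <(find "$base/lib" -name '*.jar' -print0)
}

# In ~/.bashrc, after defining the two functions:
#   add_corenlp_jars
```

Because each append checks for the entry first, running `exec bash` repeatedly leaves CLASSPATH unchanged after the first time.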