stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Using DepParse in conjunction with Quote in 3.9.x #799

Open aendra-rininsland opened 5 years ago

aendra-rininsland commented 5 years ago

I'm using poorna-kumar/gendermeme-core with CoreNLP, and since upgrading to the latest version I can't get anything to work because of the changes to the Quote annotator.

The full set of annotators I need is:

['pos', 'lemma', 'ner', 'parse', 'depparse', 'dcoref', 'quote', 'openie']

(Since the first few get pulled in as dependencies, this can be simplified to ['depparse', 'dcoref', 'coref', 'openie', 'quote'].) However, the Quote and DepParse annotators seem to be completely incompatible.

I suppose I could downgrade to 3.8.x, but I'd rather not if possible. Any suggestions? Thanks!

J38 commented 5 years ago

Could you elaborate on what kind of error you're seeing? depparse should definitely work with quote.

J38 commented 5 years ago

For instance, this pipeline should work fine:

tokenize,ssplit,pos,lemma,ner,depparse,coref,natlog,openie,quote
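For reference, a minimal sketch of setting up that pipeline from Java. It only builds the java.util.Properties object in the order suggested above; the class and method names (SuggestedPipeline, buildProps) are hypothetical, and passing the result to CoreNLP's StanfordCoreNLP constructor is shown in a comment rather than executed, so the snippet runs without CoreNLP or its models on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class SuggestedPipeline {

    // Annotators in the order suggested in the comment above.
    static final List<String> ANNOTATORS = Arrays.asList(
            "tokenize", "ssplit", "pos", "lemma", "ner",
            "depparse", "coref", "natlog", "openie", "quote");

    // Hypothetical helper: builds the properties for the suggested pipeline.
    static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("annotators", String.join(",", ANNOTATORS));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        System.out.println(props.getProperty("annotators"));
        // With CoreNLP and its models on the classpath, the pipeline would be:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```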

sillystring13 commented 5 years ago

I think the Quote annotator is forcing the default depparse model here: https://github.com/stanfordnlp/CoreNLP/blob/eb43d5d9150de97f8061fa06b838f1d021586789/src/edu/stanford/nlp/quoteattribution/QuoteAttributionUtils.java#L223

I can't find anywhere in the code where it checks the depparse.model property, so if @aendrew has a custom depparse model in the pipeline, it might conflict.

sillystring13 commented 5 years ago

I think the properties for QuoteAttributionAnnotator are getting clobbered in the current flow. Here's what I've traced:

AnnotatorImplementations passes only the quote.* properties when it creates QuoteAnnotator: https://github.com/stanfordnlp/CoreNLP/blob/eb43d5d9150de97f8061fa06b838f1d021586789/src/edu/stanford/nlp/pipeline/AnnotatorImplementations.java#L264-L268

QuoteAnnotator then creates the QuoteAttributionAnnotator using those truncated properties: https://github.com/stanfordnlp/CoreNLP/blob/eb43d5d9150de97f8061fa06b838f1d021586789/src/edu/stanford/nlp/pipeline/QuoteAnnotator.java#L171

Because AnnotatorImplementations truncates the properties, the quoteattribution.* properties never reach QuoteAttributionAnnotator. So if any of your supporting files (model, familyWordsFile, animacyWordsFile, genderNamesFile) aren't on the default path, the pipeline won't load.
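To make the truncation concrete, here is a simplified, self-contained model of it in plain Java. It is not CoreNLP's code: extractPrefixed is a hypothetical stand-in for prefix-based filtering (it keeps only keys under the given prefix and, for simplicity, does not strip the prefix), and the file paths are made-up placeholders. Note that "quoteattribution." does not start with "quote.", so those keys are filtered out along with depparse.*.

```java
import java.util.Properties;

public class PrefixFilterDemo {

    // Hypothetical stand-in for prefix-based property filtering:
    // returns a copy containing only the keys that start with `prefix`.
    static Properties extractPrefixed(Properties all, String prefix) {
        Properties subset = new Properties();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                subset.setProperty(key, all.getProperty(key));
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,coref,quote");
        all.setProperty("quote.singleQuotes", "true");
        // Placeholder paths, as a user with non-default file locations might set them.
        all.setProperty("quoteattribution.familyWordsFile", "/my/models/family_words.txt");
        all.setProperty("depparse.model", "/my/models/custom_depparse.gz");

        // Only the quote.* subset survives the hand-off in this model.
        Properties quoteProps = extractPrefixed(all, "quote.");
        System.out.println(quoteProps.stringPropertyNames());
        // The attribution and depparse settings never arrive:
        System.out.println(quoteProps.getProperty("quoteattribution.familyWordsFile")); // null
        System.out.println(quoteProps.getProperty("depparse.model"));                   // null
    }
}
```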

Additionally, QuoteAttributionAnnotator uses QuoteAttributionUtils, which loads the DependencyParser from a hard-coded default model path. Pipeline loading will fail if your dependency parser model is not on the default path.

Do y'all have a preference for how to pass through the quoteattribution.* and depparse.* properties? The easy way is to include them in the properties when QuoteAnnotator is created, then use them as needed. The pass-through approach requires a little juggling in QuoteAttributionUtils because of the call sequence, but probably wouldn't be too ugly.

Call sequences for the depparse issue (QAA = QuoteAttributionAnnotator, QAU = QuoteAttributionUtils):

  1. QAA.annotate -> QAU.addEnhancedSentences -> QAU.constructSentence -> QAU.getParse
  2. QAA.annotate -> QAU.annotateForDependencyParse -> QAU.getParse
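One way to picture the pass-through approach along those two call sequences: resolve the dependency-parser model path once from the pipeline properties, falling back to a default only when depparse.model is unset, and hand it down to the shared parse step both paths end in. This is a hypothetical sketch, not a patch against CoreNLP: AttributionUtilsSketch, resolveModelPath, getParser, the via* methods, and DEFAULT_MODEL are illustrative names, the default path is a placeholder rather than CoreNLP's actual default, and StubParser stands in for a loaded DependencyParser so the example runs without CoreNLP.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class AttributionUtilsSketch {

    // Placeholder, not CoreNLP's real default model path.
    static final String DEFAULT_MODEL = "default/depparse/model.gz";

    // Cache so both call paths share one loaded parser per model path.
    private static final Map<String, StubParser> PARSERS = new HashMap<>();

    // Stub standing in for a loaded dependency parser.
    static final class StubParser {
        final String modelPath;
        StubParser(String modelPath) { this.modelPath = modelPath; }
    }

    // Honor depparse.model if the pipeline set it; otherwise fall back to the default.
    static String resolveModelPath(Properties props) {
        return props.getProperty("depparse.model", DEFAULT_MODEL);
    }

    // Stand-in for the shared QAU.getParse step: takes the model path as an
    // argument instead of reading a hard-coded constant.
    static synchronized StubParser getParser(String modelPath) {
        return PARSERS.computeIfAbsent(modelPath, StubParser::new);
    }

    // Sequence 1: annotate -> addEnhancedSentences -> constructSentence -> getParse
    static StubParser viaConstructSentence(String modelPath) {
        return getParser(modelPath);
    }

    // Sequence 2: annotate -> annotateForDependencyParse -> getParse
    static StubParser viaAnnotateForDependencyParse(String modelPath) {
        return getParser(modelPath);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("depparse.model", "/my/models/custom_depparse.gz");
        String modelPath = resolveModelPath(props);
        StubParser a = viaConstructSentence(modelPath);
        StubParser b = viaAnnotateForDependencyParse(modelPath);
        System.out.println(a.modelPath); // /my/models/custom_depparse.gz
        System.out.println(a == b);      // true: both sequences share one parser
    }
}
```

In this sketch, the "juggling" amounts to making sure every method on both chains either carries the configured path or can reach it; caching by path keeps the two sequences from loading the model twice.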

Alternatively, the way QuoteAttributionAnnotator gets created could be reshuffled, but that's probably an architecture question.

aendra-rininsland commented 5 years ago

@sillystring13 FWIW, I don't have a custom depparse (at least as far as I know)!

@J38 Err, I completely forget what the actual error was; apologies, that's terrible issue reporting on my part. I'll try the order you suggested at the next opportunity. :+1: