stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Wrong POS for "keine": PRON instead of DET #1431

Open GeorgeS2019 opened 8 months ago

GeorgeS2019 commented 8 months ago

Ich habe keine Übungen gemacht, weil ich keine Lust habe.

Stanza tags keine as DET. CoreNLP 4.5.6 (with the corresponding 4.5.6 German model) tags keine as PRON.

AngledLuffa commented 8 months ago

The data used to train the Stanza tagger was

ud-treebanks-v2.13/UD_German-GSD/de_gsd-ud-train.conllu

where keine is treated as DET

The CoreNLP tagger has not been retrained since UD 2.4, where the standard was to treat keine as PRON

Retraining taggers with updated data is less of a hassle than the general feature additions you've been requesting, so we'll put updated data for some of those models on the list.
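To see how a given treebank release tags a word such as keine, one option is to count the UPOS values assigned to that form in the CoNLL-U training file. The following is a minimal sketch, not code from either project: the class name `UposCounter` and the method `countUpos` are hypothetical, and the sample lines in `main` are hand-written for illustration rather than copied from UD_German-GSD. It relies only on the CoNLL-U layout of ten tab-separated columns, where column 2 is FORM, column 4 is UPOS and column 5 is XPOS.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies the UPOS tags a CoNLL-U file assigns to one word form.
final class UposCounter {

    /** Counts UPOS (column 4) over all tokens whose FORM (column 2) equals {@code form}, ignoring case. */
    static Map<String, Integer> countUpos(List<String> conlluLines, String form) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : conlluLines) {
            // Skip sentence separators and comment lines such as "# text = ...".
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t", -1);
            if (cols.length != 10) {
                continue; // not a well-formed token line
            }
            // Skip multiword-token ranges ("1-2") and empty nodes ("1.1"); they carry no UPOS of their own.
            if (cols[0].contains("-") || cols[0].contains(".")) {
                continue;
            }
            if (cols[1].equalsIgnoreCase(form)) {
                counts.merge(cols[3], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-written sample lines, for illustration only (not real treebank data).
        List<String> sample = List.of(
            "# text = keine Lust",
            "1\tkeine\tkein\tDET\t_\t_\t2\tdet\t_\t_",
            "2\tLust\tLust\tNOUN\t_\t_\t0\troot\t_\t_",
            "");
        System.out.println(countUpos(sample, "keine")); // prints {DET=1}
    }
}
```

For a real check, the sample list could be replaced with `java.nio.file.Files.readAllLines(...)` on the training file from each UD release, and the two resulting maps compared.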

GeorgeS2019 commented 8 months ago

@AngledLuffa

I have tried to connect with @manning through LinkedIn regarding CoreNLP 4.5.6, with a specific interest in the 4.5.6 German model.

GeorgeS2019 commented 8 months ago

@AngledLuffa

I also have an issue with the dependency parsing results. Hopefully, this will go away once the German POS assignment is correct.

GeorgeS2019 commented 8 months ago

@AngledLuffa I am comparing the CoreNLP German output with that of Stanza through code. I understand that the online CoreNLP demo is no longer running, so it will take a few extra steps to compare CoreNLP 4.5.6 with the latest Stanza.
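A comparison like this can be reduced to lining up the two tag sequences and listing the positions where they differ. The sketch below is only an illustration under stated assumptions: `TagDiff` and `disagreements` are hypothetical names, both tools are assumed to produce the same tokenization, and the inputs in `main` are hand-written rather than real tagger output. Only the PRON versus DET difference for keine comes from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports positions where two taggers disagree on the same tokens.
final class TagDiff {

    /** Returns one entry per position where tagsA and tagsB differ, as "index:token A vs B". */
    static List<String> disagreements(List<String> tokens, List<String> tagsA, List<String> tagsB) {
        // Assumption: identical tokenization, so all three lists must have the same length.
        if (tokens.size() != tagsA.size() || tokens.size() != tagsB.size()) {
            throw new IllegalArgumentException("token and tag lists must have the same length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tagsA.get(i).equals(tagsB.get(i))) {
                out.add(i + ":" + tokens.get(i) + " " + tagsA.get(i) + " vs " + tagsB.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written illustrative input, not captured tagger output.
        List<String> tokens = List.of("keine", "Lust");
        List<String> coreNlpTags = List.of("PRON", "NOUN");
        List<String> stanzaTags = List.of("DET", "NOUN");
        System.out.println(disagreements(tokens, coreNlpTags, stanzaTags)); // prints [0:keine PRON vs DET]
    }
}
```

If the two tools tokenize a sentence differently, the length check fails fast instead of silently misaligning tags; a real comparison would need an alignment step first.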

AngledLuffa commented 8 months ago

I mean, you'd probably have better luck just running these things locally and looking at the results, but thank you for informing us of the demo program's demise. I have kicked it.

GeorgeS2019 commented 8 months ago

@AngledLuffa

Does the German parser in CoreNLP support XPOS? I can ONLY find UPOS.

CoreNLP

props.setProperty("annotators", "tokenize, ssplit, mwt, pos, lemma, ner, depparse");

Stanza

https://stanfordnlp.github.io/stanza/pos.html

AngledLuffa commented 8 months ago

Correct, UPOS only