statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

Same token assigned different POS labels #82

Closed: pchest closed this issue 1 year ago

pchest commented 1 year ago

The cnlp_annotate function from the cleanNLP package assigns different tags to the same word, 'president', at different times within the same R session. The two tags in question are 'NN' and 'NNP'.

cleanNLP version: 3.0.4

R version: 4.3.0

Operating System: Pop!_OS 22.04

statsmaths commented 1 year ago

Do you mean that it's assigning different part-of-speech tags to the exact same word in a document when you re-run the cnlp_annotate function? Or do you just mean that it's tagging "president" with different parts of speech in different sentences?

If it's the first, that would be surprising. Do you have a minimal working example with a short fragment you could share, and could you indicate which backend you're using? If it's the second, that's not surprising at all. A phrase such as "President of France" would likely tag "president" as NNP (a singular proper noun), while "the president said that..." would likely tag it as NN (a singular common noun).