winkjs / wink-eng-lite-web-model

English lite language model for Web Browsers
MIT License

POS-tagging state is not cleared between documents #18

Closed rene-leanix closed 2 days ago

rene-leanix commented 6 days ago

I stumbled upon a sentence that is tagged inconsistently, depending on whether or not another text has been read first with nlp.readDoc():

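const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');

const sentence1 = 'The students did not complete their homework, Nor did they pass the test.';
const sentence2 = 'Romeo And Juliet.';

// Fresh instance: token 8 ("Nor") is tagged as a coordinating conjunction.
const nlp = winkNLP(model);
const pos1 = nlp.readDoc(sentence1).tokens().itemAt(8).out(nlp.its.pos); // CCONJ

// Another fresh instance that reads sentence2 first: the same token is now tagged as a proper noun.
const nlp2 = winkNLP(model);
nlp2.readDoc(sentence2);
const pos2 = nlp2.readDoc(sentence1).tokens().itemAt(8).out(nlp2.its.pos); // PROPN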

The word "Nor" (which is capitalised in this string as a test case to check whether the code correctly lower-cases this conjunction) is tagged as a coordinating conjunction in the first example, but as a proper noun in the second. I am not so much concerned about the wrong tag in the second run, as the spelling "Nor" makes it look like a name, but about the inconsistency between the two runs.

It looks like the reading of sentence 2 with the two proper nouns "Romeo" and "Juliet" sets a flag that causes the "Nor" in the other sentence to be also regarded as a proper noun.
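To check the whole sentence rather than a single token, a small comparison helper along these lines might be useful. This is only a sketch: `findTagDrift` and its parameters are names made up for illustration, not part of wink-nlp. It assumes `createNlp` returns an object with `readDoc()` and `its.pos`, as `winkNLP(model)` does.

```javascript
// Hypothetical helper (not part of wink-nlp): compares the POS tags produced
// by a fresh instance with those produced by an instance that has already
// read another document, and reports every token whose tag differs.
function findTagDrift(createNlp, warmupText, targetText) {
  // Baseline: a fresh instance reads only the target text.
  const fresh = createNlp();
  const expected = fresh.readDoc(targetText).tokens().out(fresh.its.pos);

  // Comparison: a second fresh instance reads the warm-up text first.
  const reused = createNlp();
  reused.readDoc(warmupText); // result deliberately discarded
  const actual = reused.readDoc(targetText).tokens().out(reused.its.pos);

  // Collect the positions where the two runs disagree.
  const drift = [];
  expected.forEach((tag, index) => {
    if (tag !== actual[index]) {
      drift.push({ index, fresh: tag, reused: actual[index] });
    }
  });
  return drift;
}
```

With the definitions above, it could be called as findTagDrift(() => winkNLP(model), sentence2, sentence1). An empty array would mean that both runs tag the sentence identically.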

rachnachakraborty commented 5 days ago

Hi @rene-leanix,

Thanks for highlighting the issue in detail.

We'll dig deeper.

Thanks, Rachna