pierre-24 / pyiso4

Implementation of the ISO 4 standard for journal title abbreviations in Python.
MIT License

Fix missed detection of single letter part names; Close #13 #15

Closed klb2 closed 2 months ago

klb2 commented 2 months ago

Fix the misclassification of single-letter part names when the letter is also a stopword.

Since "E" and "I" are in stopwords.txt, the lexer classified them as TokenType.STOPWORD unless they were preceded by a word that is in PARTS. However, as described in #13, this classification can be wrong.

With this commit, the lexer now considers symbols like . to also be an indicator of parts. Additionally, we need to change IS_ORDINAL.match to IS_ORDINAL.fullmatch to avoid false positives. (With .match, a match at the beginning of the string is enough, i.e., every word that starts with a capital letter is evaluated as a match.)
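To illustrate the difference between .match and .fullmatch, here is a minimal sketch. The pattern `IS_ORDINAL_SKETCH` and both helper functions are hypothetical stand-ins for illustration only; the actual IS_ORDINAL pattern in pyiso4 may look different.

```python
import re

# Hypothetical stand-in pattern (NOT the real IS_ORDINAL from pyiso4):
# a single capital letter or a run of digits.
IS_ORDINAL_SKETCH = re.compile(r'[A-Z]|[0-9]+')


def looks_ordinal_match(word: str) -> bool:
    """Old behaviour: succeeds if the pattern matches at the start of the word."""
    return IS_ORDINAL_SKETCH.match(word) is not None


def looks_ordinal_fullmatch(word: str) -> bool:
    """New behaviour: succeeds only if the pattern matches the whole word."""
    return IS_ORDINAL_SKETCH.fullmatch(word) is not None


if __name__ == '__main__':
    for word in ('E', '12', 'Energy', 'Italian'):
        print(word, looks_ordinal_match(word), looks_ordinal_fullmatch(word))
```

Under this assumed pattern, "E" and "12" pass both checks, while "Energy" and "Italian" pass only the .match check, which is the kind of false positive that .fullmatch rules out.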

pierre-24 commented 2 months ago

Sorry, I don't get why the tests do not launch automatically. I probably need to put you in a list somewhere. However, as you can see, there is a mistake in your code.

Thank you for the fullmatch, I have to say that I did not know about that :)

pierre-24 commented 2 months ago

Maybe you actually simply need to rebase your branch now that #12 is merged, though :)

klb2 commented 2 months ago

Yes, I think the issue was that #12 had been merged in the meantime. I have updated the fork, and you should now be able to merge this as well.

pierre-24 commented 2 months ago

Yep, thank you. I need to check one or two things, then I will do a release with all that :)