mediawiki-utilities / python-mwcites

MIT License

ISBN regexp improvement #15

Closed Xarvalus closed 6 years ago

Xarvalus commented 6 years ago

An ISBN regexp improvement that allows extracting noisier citations such as 902 198 84 X, 1-57488-530-8, and {{ISBN|978-83-7435-239-0}}.

A colleague reported extracting 30-40% more valid citations in his work after the fix.

(Also contains minor code cleanup and fixes)
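
To make the kind of matching described above concrete, here is a minimal sketch. It is not the regex from this pull request: `ISBN_RE` and `extract_isbns` are hypothetical names, and the pattern is only one plausible way to accept space-separated, hyphenated, and `{{ISBN|...}}` template forms.

```python
import re

# Hypothetical pattern for illustration only (not the one merged here).
ISBN_RE = re.compile(
    r"isbn[\s=:|]*"                                # keyword, then spaces, "=", ":" or a template pipe
    r"([0-9](?:[ -]?[0-9]){8,11}[ -]?[0-9Xx])",    # 10 to 13 characters, single separators allowed
    re.IGNORECASE,
)


def extract_isbns(text):
    """Return ISBN candidates with separators removed and X uppercased."""
    found = []
    for match in ISBN_RE.finditer(text):
        candidate = re.sub(r"[ -]", "", match.group(1)).upper()
        # Keep only ISBN-10 and ISBN-13 lengths; no checksum is verified here.
        if len(candidate) in (10, 13):
            found.append(candidate)
    return found


print(extract_isbns("ISBN 1-57488-530-8"))            # ['1574885308']
print(extract_isbns("{{ISBN|978-83-7435-239-0}}"))    # ['9788374352390']
```

The sketch filters by length only, so digits that directly follow an ISBN (a year, for example) could still produce a false positive; a checksum test would be the natural next filter.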

kodchi commented 6 years ago

Do you have any example revisions where pipes are used? Would be nice to link to them in the commit message.

Xarvalus commented 6 years ago

This one was our inspiration: Editing Dies irae (section).

It contains pipes in citations, in a form similar to those in the test code:

''Słownik muzyki'', red. Wojciech Marchwica, Wydawnictwo Zielona Sowa, Kraków 2006; **{{ISBN|83-7435-239-6}}**, **{{ISBN|978-83-7435-239-0}}**.. W [[Zwyczajna forma rytu rzymskiego|nowym obrządku]] nie jest już powszechnie używana (można ją wykonywać w ostatnim

Xarvalus commented 6 years ago

You are right (https://regex101.com/r/uekqPD/1); I have corrected the regex. It was a leftover from an attempt at matching with | alternation.
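
The pattern being corrected is not quoted in this thread, so the following is only a general illustration of why a stray | matters in a Python regex: inside a character class it is a literal pipe, while outside it separates alternatives. The names `IN_CLASS`, `ALTERNATION`, and `accepts` are hypothetical.

```python
import re

# Inside a character class, "|" is a literal pipe character.
IN_CLASS = re.compile(r"[0-9|-]+")
# Outside a class, "|" separates two alternatives: a digit run, or a single hyphen.
ALTERNATION = re.compile(r"[0-9]+|-")


def accepts(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


print(accepts(IN_CLASS, "83|7435"))      # True: the pipe is accepted as data
print(accepts(ALTERNATION, "83|7435"))   # False: the pipe is not part of either alternative
```

A pipe left inside a class therefore widens what the pattern accepts without raising any error.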

kodchi commented 6 years ago

Thanks for the pull request!