Fix HTML regex pattern and add a test for `mw.text.decode()`

tatuylonen / wikitextprocessor

Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.

Fix HTML regex pattern and add a test for `mw.text.decode()` #247

Closed xxyzz closed 4 months ago

xxyzz commented 4 months ago

Scribunto does not support an uppercase `X` before a hexadecimal HTML entity number, so we can only use the lowercase `x`.

kristian-clausal commented 4 months ago

These changes are incorrect! `a-fA-F` matches hexadecimal digits! The uppercase `X` is taken directly from module code!
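
For readers following the thread, the two parts of the pattern under discussion can be separated with a small sketch: the `x`/`X` prefix marks the entity as hexadecimal, while `a-fA-F` covers the hexadecimal digits themselves. The patterns and the helper `decode_hex_entities` below are hypothetical, written only for illustration; they are not taken from wikitextprocessor, from Scribunto, or from any Wiktionary module, and they take no position on which behaviour `mw.text.decode()` actually has.

```python
import re

# Hypothetical patterns for illustration only (not the project's real regex).
# Both accept upper- and lowercase hex digits; they differ only in the prefix.
LOWER_X_ONLY = re.compile(r"&#x([0-9a-fA-F]+);")   # prefix must be lowercase x
EITHER_X = re.compile(r"&#[xX]([0-9a-fA-F]+);")    # prefix may be x or X


def decode_hex_entities(text: str, pattern: re.Pattern) -> str:
    """Replace each hexadecimal entity matched by `pattern` with its character."""
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), text)


sample = "&#x41; &#X42; &#x6a; &#x6A;"

# With the lowercase-only prefix, "&#X42;" is left untouched.
lower_only = decode_hex_entities(sample, LOWER_X_ONLY)

# With either prefix allowed, "&#X42;" is decoded as well.
either = decode_hex_entities(sample, EITHER_X)

print(lower_only)
print(either)
```

In this sketch, dropping `A-F` from the digit class would stop `&#x6A;` from being decoded regardless of the prefix, which is a separate question from whether the prefix `X` is accepted.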