tatuylonen / wikitextprocessor

Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.

Undeclared variable assignments from modules without require ('strict'); #258

Closed: olsaarik closed this issue 1 month ago

olsaarik commented 4 months ago

I'm processing a recent English Wikipedia dump and getting "assign to undeclared variable" errors from modules that don't contain require ('strict'); themselves. Here's stripped-down code that reproduces this:

from wikitextprocessor import Wtp
from wikitextprocessor.dumpparser import process_dump

wtp = Wtp(db_path="enwiki-20240201-wtp.db", lang_code="en", project="wikipedia")
process_dump(wtp, "enwiki-20240201-pages-articles-multistream.xml.bz2", {0, 10, 828})  # main, Template and Module namespaces
for page in wtp.get_all_pages():
    if page.redirect_to is not None:  # skip redirect pages
        continue
    wtp.start_page(page.title)
    wtp.expand(page.body)
    break  # the first non-redirect page is enough to reproduce

And here are the errors I'm seeing:

Anarchism: ERROR: LUA error in #invoke('Protection banner', 'main') parent ('Template:Pp-semi-indef', {}) at ['Anarchism', 'pp-semi-indef', '#invoke', '#invoke']
[string "Module:Effective protection level"]:63: attempt to index field 'TitleBlacklist' (a nil value)
Anarchism: WARNING: invalid attribute format '20x20px\n ' missing name at ['Anarchism', 'good article', 'Main other', 'ARGVAL-1', 'Top icon', '#tag', '#tag']
Anarchism: WARNING: invalid attribute format '\n ' missing name at ['Anarchism', 'good article', 'Main other', 'ARGVAL-1', 'Top icon', '#tag', '#tag']
Anarchism: WARNING: invalid attribute format 'This is a good article. Click here for more information.]]' missing name at ['Anarchism', 'good article', 'Main other', 'ARGVAL-1', 'Top icon', '#tag', '#tag']
Anarchism: ERROR: LUA error in #invoke('sidebar', 'sidebar', ' child = yes', ' contentclass = hlist\n', ' heading1 =', ' content1 =\n<section begin=Schools of thought />\n* [[Anarcha-feminism|Feminist]]\n* [[Green anarchism|Green]]\n** [[Anarcho-primitivism|Primitivist]]\n** [[Social ecology (Bookchin)|Social ecology]]\n** [[Total liberation]]\n* [[Individualist anarchism|Individualist]]\n** [[Egoist anarchism|Egoist]]\n** [[Market anarchism|Free-market]]\n** [[Anarcho-naturism|Naturist]]\n** [[Philosophical anarchism|Philosophical]]\n* [[Mutualism (economic theory)|Mutualism]]\n* [[Postcolonial anarchism|Postcolonial]]\n** [[African anarchism|African]]\n** [[Black anarchism|Black]]\n* [[Queer anarchism|Queer]]\n* [[Anarchism and religion|Religious]]\n** [[Christian anarchism|Christian]]\n** [[Jewish anarchism|Jewish]]\n* [[Social anarchism|Social]]\n** [[Collectivist anarchism|Collectivist]]\n*** [[Parecon]]\n** [[Anarcho-communism|Communist]]\n*** [[Magonism]]\n* [[Anarchism without adjectives|Without adjectives]]\n<section end=Schools of thought />\n', ' heading5 = Methodology', ' content5 =\n<section begin=Methodology />\n* [[Agorism]]\n* [[Illegalism]]\n* [[Insurrectionary anarchism|Insurrectionary]]\n** [[Communization]]\n** [[Expropriative anarchism|Expropriative]]\n* [[Anarcho-pacifism|Pacifist]]\n* [[Platformism]]\n** [[Especifismo]]\n* [[Relationship anarchy|Relationship]]\n* [[Anarcho-syndicalism|Syndicalist]]\n* [[Synthesis anarchism|Synthesis]]\n<section end=Methodology />') parent ('Template:Anarchism sidebar', {}) at ['Anarchism', 'anarchism sidebar', '#invoke', '#invoke', 'Lua:sidebar:collapsible()', 'frame:preprocess()', '#invoke', '#invoke']
Traceback (most recent call last):
  File "path-to-site-packages/wikitextprocessor/luaexec.py", line 684, in call_lua_sandbox
    ret: tuple[bool, str] = ctx.lua_invoke(
                            ^^^^^^^^^^^^^^^
  File "lupa/lua51.pyx", line 869, in lupa.lua51._LuaObject.__call__
  File "lupa/lua51.pyx", line 1835, in lupa.lua51.call_lua
  File "lupa/lua51.pyx", line 1861, in lupa.lua51.execute_lua_call
  File "lupa/lua51.pyx", line 1743, in lupa.lua51.raise_lua_error
lupa.lua51.LuaError: [string "<python>"]:36: assign to undeclared variable 'string'
stack traceback:
    [C]: in function 'error'
    [string "strict"]:21: in function <[string "strict"]:19>
    [string "<python>"]:36: in function <[string "<python>"]:19>
    (tail call): ?
    [string "_sandbox_phase2"]:142: in function <[string "_sandbox_phase2"]:121>
    [C]: in function 'preprocess'
    [string "_sandbox_phase2"]:23: in function <[string "_sandbox_phase2"]:11>
    [string "_sandbox_phase2"]:36: in function <[string "_sandbox_phase2"]:32>
    (tail call): ?
    [string "Module:Arguments"]:207: in function 'mergeArgs'
    [string "Module:Arguments"]:320: in function <[string "Module:Arguments"]:317>
    (tail call): ?
    [string "sidebar"]:412: in function <[string "sidebar"]:397>
    [C]: in function 'pcall'
    [string "_sandbox_phase2"]:172: in function <[string "_sandbox_phase2"]:121>
Anarchism: ERROR: LUA error in #invoke('list', 'horizontal') parent ('Template:Hlist', {1: '[[Global governance|Global]]', 2: '[[Local government|Local]]'}) at ['Anarchism', 'basic forms of government', 'Politics series sidebar', 'ARGVAL-list2', '#invoke', '#invoke', 'Lua:sidebar:sidebar()', 'frame:preprocess()', 'hlist', '#invoke', '#invoke']
Traceback (most recent call last):
  File "path-to-site-packages/wikitextprocessor/luaexec.py", line 684, in call_lua_sandbox
    ret: tuple[bool, str] = ctx.lua_invoke(
                            ^^^^^^^^^^^^^^^
  File "lupa/lua51.pyx", line 869, in lupa.lua51._LuaObject.__call__
  File "lupa/lua51.pyx", line 1835, in lupa.lua51.call_lua
  File "lupa/lua51.pyx", line 1861, in lupa.lua51.execute_lua_call
  File "lupa/lua51.pyx", line 1743, in lupa.lua51.raise_lua_error
lupa.lua51.LuaError: [string "<python>"]:36: assign to undeclared variable 'string'
stack traceback:
    [C]: in function 'error'
    [string "strict"]:21: in function <[string "strict"]:19>
    [string "<python>"]:36: in function <[string "<python>"]:19>
    (tail call): ?
    [string "_sandbox_phase2"]:142: in function <[string "_sandbox_phase2"]:121>
    [C]: in function 'preprocess'
    [string "_sandbox_phase2"]:23: in function <[string "_sandbox_phase2"]:11>
    [string "_sandbox_phase2"]:36: in function <[string "_sandbox_phase2"]:32>
    (tail call): ?
    [string "Module:Arguments"]:207: in function 'mergeArgs'
    [string "Module:Arguments"]:320: in function <[string "Module:Arguments"]:317>
    (tail call): ?
    [string "sidebar"]:122: in function 'move_hiding_templatestyles'
    [string "sidebar"]:140: in function <[string "sidebar"]:136>
    [C]: in function 'pcall'
    [string "_sandbox_phase2"]:172: in function <[string "_sandbox_phase2"]:121>
Anarchism: ERROR: LUA error in #invoke('lang', 'lang_xx_italic', 'code=fr') parent ('Template:Lang-fr', {1: 'anarchiste'}) at ['Anarchism', 'lang-fr', '#invoke', '#invoke']
    Loading module failed in #invoke: lang
[string "Module:Lang/data"]:647: variable 'special_tags_table' is not declared
Anarchism: ERROR: LUA error in #invoke('Lang', 'lang') parent ('Template:Lang', {1: 'fr', 2: '[[sans-culottes]]'}) at ['Anarchism', 'lang', '#invoke', '#invoke']
    Loading module failed in #invoke: Lang
[string "Module:Lang/data"]:647: variable 'special_tags_table' is not declared
Anarchism: ERROR: LUA error in #invoke('citation/CS1', 'citation', 'CitationClass=book') parent ('Template:Cite book', {'title': 'The Desk Encyclopedia of World History', 'publisher': '[[Oxford University Press]]', 'year': '2006', 'isbn': '978-0-7394-7809-7', 'editor-last': 'Wright', 'editor-first': 'Edmund', 'location': 'New York', 'pages': '20–21'}) at ['Anarchism', 'Cite book', '#invoke', '#invoke']
[string "Module:Citation/CS1/Configuration"]:33: assign to undeclared variable 'uncategorized_namespaces_t'
Anarchism: ERROR: LUA error in #invoke('citation/CS1', 'citation', 'CitationClass=citation') parent ('Template:Citation', {'last': 'Fiala', 'first': 'Andrew', 'title': 'Anarchism', 'date': '2021', 'url': 'https://plato.stanford.edu/archives/win2021/entries/anarchism/', 'encyclopedia': 'The Stanford Encyclopedia of Philosophy', 'editor-last': 'Zalta', 'editor-first': 'Edward N.', 'access-date': '2023-06-17', 'edition': 'Winter 2021', 'publisher': 'Metaphysics Research Lab, Stanford University'}) at ['Anarchism', 'Citation', '#invoke', '#invoke']
[string "Module:Citation/CS1/Configuration"]:33: assign to undeclared variable 'uncategorized_namespaces_t'
Anarchism: ERROR: LUA error in #invoke('citation/CS1', 'citation', 'CitationClass=book') parent ('Template:Cite book', {'last': 'Bakunin', 'first': 'Mikhail', 'author-link': 'Mikhail Bakunin', 'title': 'Statism and Anarchy', 'title-link': 'Statism and Anarchy', 'year': '1990', 'orig-year': '1873', 'publisher': '[[Cambridge University Press]]', 'location': 'Cambridge, England', 'series': 'Cambridge Texts in the History of Political Thought', 'translator-last': 'Shatz', 'translator-first': 'Marshall', 'isbn': '978-0-521-36182-8', 'oclc': '20826465', 'lccn': '89077393', 'doi': '10.1017/CBO9781139168083', 'editor1-last': 'Shatz', 'editor1-first': 'Marshall'}) at ['Anarchism', 'cite book', '#invoke', '#invoke']
[string "Module:Citation/CS1/Configuration"]:33: assign to undeclared variable 'uncategorized_namespaces_t'
...
Snip, many repeats.
...
Anarchism: ERROR: LUA error in #invoke('citation/CS1', 'citation', 'CitationClass=book') parent ('Template:Cite book', {'last1': 'Levy', 'first1': 'Carl', 'last2': 'Adams', 'first2': 'Matthew S.', 'title': 'The Palgrave Handbook of Anarchism', 'date': '2019', 'publisher': '[[Palgrave Macmillan]]', 'doi': '10.1007/978-3-319-75620-2', 'isbn': '978-3-319-75620-2', 's2cid': '149333615', 'url': 'https://link.springer.com/book/10.1007/978-3-319-75620-2', 'language': 'en'}) at ['Anarchism', 'cite book', '#invoke', '#invoke']
[string "Module:Citation/CS1/Configuration"]:33: assign to undeclared variable 'uncategorized_namespaces_t'
Anarchism: ERROR: LUA error in #invoke('If empty', 'main') parent ('Template:If empty', {1: '', 2: '[[List of anarchist communities|Anarchist-related territories and autonomous zones]]'}) at ['Anarchism', 'anarchies', '#invoke', '#invoke', 'Lua:navbox:navbox()', 'frame:preprocess()', 'if empty', '#invoke', '#invoke']
Traceback (most recent call last):
  File "path-to-site-packages/wikitextprocessor/luaexec.py", line 684, in call_lua_sandbox
    ret: tuple[bool, str] = ctx.lua_invoke(
                            ^^^^^^^^^^^^^^^
  File "lupa/lua51.pyx", line 869, in lupa.lua51._LuaObject.__call__
  File "lupa/lua51.pyx", line 1835, in lupa.lua51.call_lua
  File "lupa/lua51.pyx", line 1861, in lupa.lua51.execute_lua_call
  File "lupa/lua51.pyx", line 1743, in lupa.lua51.raise_lua_error
lupa.lua51.LuaError: [string "<python>"]:36: assign to undeclared variable 'string'
stack traceback:
    [C]: in function 'error'
    [string "strict"]:21: in function <[string "strict"]:19>
    [string "<python>"]:36: in function <[string "<python>"]:19>
    (tail call): ?
    [string "_sandbox_phase2"]:142: in function <[string "_sandbox_phase2"]:121>
    [C]: in function 'preprocess'
    [string "_sandbox_phase2"]:23: in function <[string "_sandbox_phase2"]:11>
    [string "Module:Arguments"]:254: in function <[string "Module:Arguments"]:232>
    [string "navbox"]:552: in function <[string "navbox"]:543>
    [C]: in function 'pcall'
    [string "_sandbox_phase2"]:172: in function <[string "_sandbox_phase2"]:121>

I looked into one of these, the Module:Citation/CS1/Configuration error, since it appeared many times. The errors look like:

Anarchism: ERROR: LUA error in #invoke('citation/CS1', 'citation', 'CitationClass=book') parent ('Template:Cite book', {'title': 'The Desk Encyclopedia of World History', 'publisher': '[[Oxford University Press]]', 'year': '2006', 'isbn': '978-0-7394-7809-7', 'editor-last': 'Wright', 'editor-first': 'Edmund', 'location': 'New York', 'pages': '20–21'}) at ['Anarchism', 'Cite book', '#invoke', '#invoke']
[string "Module:Citation/CS1/Configuration"]:33: assign to undeclared variable 'uncategorized_namespaces_t'

My guess is that the problem comes from the require ('strict'); in the importing module, https://en.wikipedia.org/wiki/Module:Citation/CS1, which loads https://en.wikipedia.org/wiki/Module:Citation/CS1/Configuration with:

cfg = mw.loadData ('Module:Citation/CS1/Configuration' .. sandbox);
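To see why a require ('strict'); in the importing module can break a module that never opts into strict mode, here is a toy Python model. It is not wikitextprocessor's or Scribunto's code; it simply assumes that strict mode is a flag on a shared global table and mirrors the two messages in the log above: reading a never-assigned global fails with "is not declared", and creating a new one fails with "assign to undeclared variable". The functions `cs1_module` and `configuration_module` are hypothetical stand-ins for Module:Citation/CS1 and its Configuration module.

```python
class LuaGlobals(dict):
    """Toy stand-in for a Lua global table (_G); not wikitextprocessor code."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.strict = False  # set by a module's require('strict')

    def __getitem__(self, name):
        # Models strict's __index: reading a never-assigned global raises.
        if self.strict and name not in self:
            raise NameError(f"variable '{name}' is not declared")
        return super().__getitem__(name)

    def __setitem__(self, name, value):
        # Models strict's __newindex: creating a new global raises.
        if self.strict and name not in self:
            raise NameError(f"assign to undeclared variable '{name}'")
        super().__setitem__(name, value)


def configuration_module(env):
    """Hypothetical stand-in for Module:Citation/CS1/Configuration.

    It never calls require('strict') and simply creates a global.
    """
    env["uncategorized_namespaces_t"] = {}  # placeholder value
    return env["uncategorized_namespaces_t"]


def cs1_module(env):
    """Hypothetical stand-in for Module:Citation/CS1, which opts into strict."""
    env.strict = True  # require ('strict');
    # The configuration module runs against the *same* table.
    return configuration_module(env)


def run_shared():
    """Run both modules against one shared table; return the error text, if any."""
    try:
        cs1_module(LuaGlobals())
    except NameError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(run_shared())
```

Run on its own table, `configuration_module` works fine; it only fails when it inherits the strict flag set by its caller.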

The sandbox seems to re-implement loadData here, https://github.com/tatuylonen/wikitextprocessor/blob/main/src/wikitextprocessor/lua/_sandbox_phase1.lua#L129, and that implementation calls into new_loader. My understanding of Lua is limited, but new_loader may not load the module into a new environment the way executeModule in https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/8d69dc173e33ae936ff4401d41ee5e6a1fd1ba67/includes/Engines/LuaCommon/lualib/mw.lua#L467 does.
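The difference described above can be sketched with another toy Python model. All names here (`BASE_GLOBALS`, `ModuleEnv`, `make_loader`, `try_load`) are hypothetical, and giving each loaded module a fresh copy of the base globals is only an approximation of what executeModule does, not a transcription of it. With one shared environment, the caller's strict flag breaks the configuration module; with a fresh environment per module, it loads cleanly. The cache stands in for loadData loading each data module only once.

```python
# Hypothetical base globals every module environment starts from.
BASE_GLOBALS = {"string": "<string library>", "mw": "<mw library>"}


class ModuleEnv(dict):
    """Toy Lua environment; in strict mode, creating a new global raises."""

    def __init__(self, base):
        super().__init__(base)
        self.strict = False

    def __setitem__(self, name, value):
        if self.strict and name not in self:
            raise NameError(f"assign to undeclared variable '{name}'")
        super().__setitem__(name, value)


def cs1(env, load):
    env.strict = True  # require ('strict');
    return {"cfg": load("Module:Citation/CS1/Configuration")}


def cs1_configuration(env, load):
    env["uncategorized_namespaces_t"] = {}  # placeholder value
    return {"uncategorized_namespaces_t": env["uncategorized_namespaces_t"]}


MODULES = {
    "Module:Citation/CS1": cs1,
    "Module:Citation/CS1/Configuration": cs1_configuration,
}


def make_loader(isolate):
    """Return load(name); isolate=True gives each module its own environment."""
    shared = ModuleEnv(BASE_GLOBALS)
    cache = {}

    def load(name):
        if name not in cache:
            # A fresh copy of the base globals means the caller's strict
            # flag cannot leak into the module being loaded.
            env = ModuleEnv(BASE_GLOBALS) if isolate else shared
            cache[name] = MODULES[name](env, load)
        return cache[name]

    return load


def try_load(isolate, name="Module:Citation/CS1"):
    """Load a module, returning its value or the strict-mode error text."""
    try:
        return make_loader(isolate)(name)
    except NameError as err:
        return str(err)


if __name__ == "__main__":
    for isolate in (False, True):
        print("isolated" if isolate else "shared", try_load(isolate))
```

The real fix lives in wikitextprocessor itself and may be structured differently; this only shows why per-module environments keep one module's strict mode from affecting another.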

xxyzz commented 4 months ago

Duplicate of https://github.com/tatuylonen/wikitextprocessor/issues/90#issuecomment-1700575679

xxyzz commented 3 months ago

This Lua error should be fixed by the PR linked above. Thank you for the detailed bug report!