tatuylonen / wikitextprocessor

Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.

Checklist-1 for existing errors. #226

LeMoussel closed this issue 5 months ago

LeMoussel commented 6 months ago

The attached CSV file, wiki_errors.csv, lists by Wikipedia article title the errors other than those already reported in issues #225, #224, #223, #220 and #216.

In summary, there are the following errors:

kristian-clausal commented 5 months ago

For some reason I had to use `pip install --force-reinstall --upgrade lupa`; I don't know whether `--upgrade` is necessary in this case.

LeMoussel commented 5 months ago

It works after updating the Lupa package with `pip install --force-reinstall --upgrade lupa`. I'm rerunning the tests :relaxed:

kristian-clausal commented 5 months ago

The issue was (as xxyzz explained to me) that we used to roll our own 'compilation' of lupa instead of using the pip version, which lacked some things we needed at the time. The version number our build was installed under in pip was 2.1, the same as the big release that has now been published on PyPI, so when pip saw that a 2.1 was already on the system, it skipped the upgrade.
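To make the version clash concrete, here is a minimal sketch of the comparison involved. It is a simplified model rather than pip's actual resolver code, and the helper names `parse_version` and `plain_upgrade_would_reinstall` are made up for illustration.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a simple dotted version such as '2.1' into (2, 1)."""
    return tuple(int(part) for part in text.split("."))


def plain_upgrade_would_reinstall(installed: str, available: str) -> bool:
    """Simplified model: an upgrade only happens for a strictly newer version."""
    return parse_version(available) > parse_version(installed)


# The locally built lupa was registered as 2.1 and the PyPI release is also 2.1,
# so a plain upgrade sees nothing to do.
print(plain_upgrade_would_reinstall("2.1", "2.1"))  # False
print(plain_upgrade_would_reinstall("2.0", "2.1"))  # True
```

Because the two version strings compare as equal, only `--force-reinstall` replaces the old local build with the PyPI one.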

LeMoussel commented 5 months ago

> #260 and #262 should fix the Lua errors in page Ford.

I confirm that these errors are fixed. Good job!

LeMoussel commented 5 months ago

Summary of remaining Lua errors:

Droit: ERROR: LUA error in #invoke('Excerpt', 'main', ' only = \U00102187', ' files = ', ' lists = ', ' templates = ', ' paragraphs = ', ' references = ', ' subsections = ', ' bold = ', ' more = ', ' hat = ', ' this = ', ' quote = ', ' inline = ') parent ('Modèle:Extrait', {1: 'Branches du droit'}) at ['Droit', 'Extrait', '#invoke', '#invoke'] [string "Module:TNT"]:190: Invalid message key "error_bad_msgkey"


These 2 errors are caused by the following templates:
`{{Extrait|Positivisme juridique}}`
`{{Extrait|Branches du droit}}`

- [x] [Ford](https://fr.wikipedia.org/wiki/Ford) OK - No error/warning
After removing the `Liste des dirigeants` template:
`{{Liste des dirigeants successifs|types=directeur général|titre=[[Directeurs généraux]] (CEO)|portrait=oui}}`

- [x] [Fonds monétaire international](https://fr.wikipedia.org/wiki/Fonds_mon%C3%A9taire_international) OK - No error/warning
With the removal of certain templates
- [x] [Élection présidentielle française de 1965](https://fr.wikipedia.org/wiki/%C3%89lection_pr%C3%A9sidentielle_fran%C3%A7aise_de_1965) OK - No error/warning
With the removal of certain templates
- [x] [Élection présidentielle française de 1969](https://fr.wikipedia.org/wiki/%C3%89lection_pr%C3%A9sidentielle_fran%C3%A7aise_de_1969) OK - No error/warning
With the removal of certain templates
- [x] [Festival de Cannes](https://fr.wikipedia.org/wiki/Festival_de_Cannes) OK - No error/warning
With the removal of certain templates

xxyzz commented 5 months ago

You didn't post the actual error message, so I can only guess that the error is caused by the new "entity_data" column added to the "wikidata_items" table. You could drop the table and try again. I tested the "Ford" page and didn't get any error.
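As a rough sketch of the "drop the table and try again" suggestion, the snippet below checks whether `wikidata_items` is missing the `entity_data` column and drops it if so. It assumes the cache is a SQLite database; the function name and the other column names in the demo are hypothetical, and an in-memory database stands in for the real file.

```python
import sqlite3


def drop_stale_wikidata_table(conn: sqlite3.Connection) -> bool:
    """Drop wikidata_items if it exists but lacks the entity_data column.

    Returns True when the table was dropped, False otherwise.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(wikidata_items)")]
    if columns and "entity_data" not in columns:
        conn.execute("DROP TABLE wikidata_items")
        conn.commit()
        return True
    return False


# Demo on an in-memory database with a hypothetical old schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE wikidata_items (id TEXT, label TEXT)")
print(drop_stale_wikidata_table(demo))  # True: old schema, table dropped
print(drop_stale_wikidata_table(demo))  # False: nothing left to drop
```

Once the stale table is gone, the next run should be free to recreate it with the current schema.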

LeMoussel commented 5 months ago

Ford: ERROR: LUA error in #invoke('Titulaires', 'tableauDesDirigeants') parent ('Modèle:Liste des dirigeants successifs', {'types': 'directeur général', 'titre': '[[Directeurs généraux]] (CEO)', 'portrait': 'oui'}) at ['Ford', 'Liste des dirigeants successifs', '#invoke', '#invoke']
table wikidata_items has no column named entity_data
Ford: ERROR: LUA error in #invoke('Titulaires', 'tableauDesDirigeants') parent ('Modèle:Liste des dirigeants successifs', {'types': "membre du conseil d'administration", 'titre': "Membres du [[conseil d'administration]]", 'portrait': 'oui'}) at ['Ford', 'liste des dirigeants successifs', '#invoke', '#invoke']
table wikidata_items has no column named entity_data

The error is caused by `{{Liste des dirigeants successifs|types=directeur général|titre=[[Directeurs généraux]] (CEO)|portrait=oui}}`. I remove this template like this:

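from typing import Optional

from wikitextprocessor import NodeKind

# Lower-cased name prefixes of the templates to strip from the page.
REMOVED_TEMPLATE_PREFIXES = (
    "…",
    "confusion",
    "infobox",
    "semi-protection",
    "coord",
    "portail",
    "voir homonymes",
    "sommaire",
    "carte communes limitrophes",
    "climat",
    "article",
    "référence",
    "section vide",
    "population de france",
    "pyramide des âges",
    "autres projets",
    "autre4",
    "liste des dirigeants successifs",
)


def clean_node_handler(node) -> Optional[str]:
    # Replace matching templates with an empty string; leave every other node alone (None).
    if node.kind == NodeKind.TEMPLATE and node.template_name.lower().startswith(
        REMOVED_TEMPLATE_PREFIXES
    ):
        return ""
    return None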

With that, there are no more errors.
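The matching rule in that handler can be tried without wikitextprocessor installed. The sketch below isolates it in a hypothetical `should_drop` helper with a shortened prefix list; it relies only on `str.startswith` accepting a tuple of prefixes.

```python
# Shortened, illustrative subset of the prefixes used in the handler.
DROPPED_PREFIXES = ("infobox", "portail", "liste des dirigeants successifs")


def should_drop(template_name: str) -> bool:
    """Return True when the lower-cased template name starts with a listed prefix."""
    return template_name.lower().startswith(DROPPED_PREFIXES)


# Both capitalizations appear in the error log, so lower-casing matters.
print(should_drop("Liste des dirigeants successifs"))  # True
print(should_drop("liste des dirigeants successifs"))  # True
print(should_drop("Extrait"))                          # False
```

Note that prefix matching is broad: a short entry such as "article" or "coord" also removes every template whose name merely begins with that word, which may or may not be intended.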

LeMoussel commented 5 months ago

With the removal of certain templates, there are no more errors on these pages. Consequently, I propose to close this issue.

kristian-clausal commented 5 months ago

The issues aren't completely fixed, but this thread is a bit of a grab bag and has probably served its purpose, yeah. Just make new issues when something pops up :+1: