tatuylonen / wiktextract

Wiktionary dump file parser and multilingual data extractor

Update table_headers_heuristics_data.py #650

Closed by kristian-clausal 4 months ago

kristian-clausal commented 4 months ago

The heuristics/whitelist for wonky table headers has been updated using debug message data and languages_with_cells_as_headers_debug_extract.py. (This data should maybe be moved to a new error message category of its own.)

This is why the limit on debug messages had to be raised to 3 million: the update needed every debug message, not a truncated subset.
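For illustration, the extraction step can be sketched roughly as follows. This is a hypothetical Python sketch, not the contents of languages_with_cells_as_headers_debug_extract.py (which is not shown here). It assumes each debug message is a dict with a "msg" text field and a "section" field naming the language, and it uses a made-up marker string in place of the real debug message wording. It also shows why a cap on debug messages matters: any message dropped by the limit is simply missing from the counts.

```python
import json
from collections import Counter

# Hypothetical marker text; the real debug message wording is an assumption.
MARKER = "cell used as header"


def languages_with_cells_as_headers(debug_messages, marker=MARKER, min_count=1):
    """Return languages whose debug messages mention `marker`, most frequent first.

    Assumes each message is a dict with a "msg" string and a "section"
    string naming the language (assumed format, not confirmed).
    """
    counts = Counter()
    for message in debug_messages:
        if marker in message.get("msg", ""):
            counts[message.get("section", "<unknown>")] += 1
    # Keep only languages seen at least `min_count` times.
    return [lang for lang, n in counts.most_common() if n >= min_count]


def load_messages(path):
    """Load a JSON list of message dicts from `path` (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    sample = [
        {"msg": "cell used as header: foo", "section": "Finnish"},
        {"msg": "cell used as header: bar", "section": "Finnish"},
        {"msg": "unrelated debug message", "section": "German"},
        {"msg": "cell used as header", "section": "Swedish"},
    ]
    print(languages_with_cells_as_headers(sample))
```

Raising min_count filters out languages that trigger the heuristic only once or twice, which is one plausible way to keep a whitelist from picking up noise.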