Proposal: move analyze template code to wiktextract extractor code

Currently only the en edition uses Wtp.analyze_templates() to find which templates need pre-expansion. Some non-en editions, such as zh and de, override a few heading templates that need pre-expansion, and they also have to change the page text into heading wikitext. In the nl edition, however, all sections are expanded from templates. Overriding all of them would produce a long override JSON file that is difficult to maintain, and those templates also use #if parser functions to create category links.

I'd like to suggest moving Wtp._analyze_template() to the en edition folder of the wiktextract package, and passing this function to dumpparser.process_dump(), which would then pass it on to Wtp.analyze_templates(). I would also like _analyze_template() to return only a bool, because I think we could get the same result by changing this line so that all templates used inside a pre-expanded template are expanded too:

https://github.com/tatuylonen/wikitextprocessor/blob/59b8406ffb5149720701f2f8b2aae732f731ea39/src/wikitextprocessor/core.py#L1664-L1666

t = expand_recurse(
    encoded_body, new_parent, expand_all or template_page.need_pre_expand
)
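
To make the proposed flow concrete, here is a rough sketch of how an edition-specific check could be passed down as a callback that returns only a bool. This is not the real wikitextprocessor or wiktextract API: the function names, the (name, body) signature, and the toy en check below are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical signature: (template name, template body) -> needs pre-expand?
AnalyzeTemplate = Callable[[str, str], bool]


def analyze_templates(
    templates: dict[str, str], analyze_template: AnalyzeTemplate
) -> set[str]:
    """Return the names of templates flagged by the edition-specific check."""
    return {
        name for name, body in templates.items() if analyze_template(name, body)
    }


def process_dump(
    templates: dict[str, str], analyze_template: AnalyzeTemplate
) -> set[str]:
    """Stand-in for the dump entry point: it only forwards the callback."""
    return analyze_templates(templates, analyze_template)


def en_analyze_template(name: str, body: str) -> bool:
    """Toy stand-in for the en check: flag bodies containing a list item."""
    return any(
        line.startswith(("*", "#", ":", ";")) for line in body.splitlines()
    )
```

With this shape the processor only stores the boolean result per template, and each extractor decides what "needs pre-expand" means for its own edition.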

I think the conditions for deciding whether a template needs pre-expansion vary between editions and can't be shared without unintended results. For example, in the nl edition we only need to check whether the template name starts and ends with "=" or "-", while the en code checks whether the template body contains lists or unclosed HTML tags.
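
As an illustration, the nl check described above could be as small as the following sketch. The function name and the (name, body) signature are hypothetical, and the template names in the usage lines are only illustrative examples of the naming pattern.

```python
def nl_analyze_template(name: str, body: str = "") -> bool:
    """Hypothetical nl check: only the template name matters, not the body."""
    name = name.strip()
    # Require at least two characters so a lone "=" or "-" does not match.
    return len(name) >= 2 and name[0] in "=-" and name[-1] in "=-"


# Illustrative names following the pattern described in the text
print(nl_analyze_template("=nld="))   # heading-style name
print(nl_analyze_template("-noun-"))  # section-style name
print(nl_analyze_template("noun"))    # ordinary template name
```

Unlike the en check, this never inspects the template body, which is why sharing one implementation across editions would be awkward.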
