tatuylonen / wikitextprocessor

Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.

Handle includeonly elements #315

Closed: kristian-clausal closed this 1 month ago

kristian-clausal commented 1 month ago

Fixes #314

Apparently `<includeonly>` (which renders its contents only when the page is used as a template, i.e. transcluded, and not when the template's own page is shown, so that those pages don't pick up things like wrong categories) has some weird whitespace-trimming rules:

  1. If there's a newline before the end tag, return everything and add a space (because this causes a PRE block to appear??)
  2. Otherwise, if there are only links or other things that render as whitespace, strip all the whitespace.
  3. If there's other text, only strip away the whitespace that is after that text, ignoring category links.
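The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `trim_includeonly_body` is hypothetical, "things that render as whitespace" are approximated by `[[Category:...]]` links only, and appending the rule-1 space at the end of the body is an assumption, since the description doesn't say where the space goes.

```python
import re

# Assumption: "things that render as whitespace" are approximated by
# category links such as [[Category:Foo]], which produce no visible text.
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:[^\[\]]*\]\]", re.IGNORECASE)


def trim_includeonly_body(body: str) -> str:
    """Hypothetical helper: apply the three rules to the text
    between <includeonly> and </includeonly>."""
    # Rule 1: newline right before the end tag -> keep everything and add a
    # space (appending the space at the end is an assumption).
    if body.endswith("\n"):
        return body + " "

    # Rule 2: nothing visible apart from category links -> drop all
    # whitespace but keep the links themselves.
    if not CATEGORY_LINK.sub("", body).strip():
        return "".join(CATEGORY_LINK.findall(body))

    # Rule 3: locate the end of the last visible (non-category) text.
    last_end, pos = 0, 0
    for m in CATEGORY_LINK.finditer(body):
        gap = body[pos:m.start()]
        if gap.strip():
            last_end = pos + len(gap.rstrip())
        pos = m.end()
    gap = body[pos:]
    if gap.strip():
        last_end = pos + len(gap.rstrip())

    # Keep everything up to that point; after it, drop whitespace only
    # and keep any trailing category links.
    tail = body[last_end:]
    return body[:last_end] + "".join(CATEGORY_LINK.findall(tail))
```

For example, `trim_includeonly_body(" foo  [[Category:X]] ")` gives `" foo[[Category:X]]"`: the leading space survives, while the whitespace after `foo` is removed and the category link is kept.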
kristian-clausal commented 1 month ago

This is such a hack... If this goes through, existing .db files may need to be regenerated for it to take effect, because the change applies during dump parsing / page saving. I don't want to add code to the parser itself to handle this stuff... D: Doing it once, during .db generation, seems appropriate.
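The "one and done" idea could look something like the sketch below: the `<includeonly>` bodies are rewritten once when a page is saved, so the stored text is already normalized and the parser never sees the raw whitespace. Everything here is hypothetical: `normalize_includeonly`, `save_page` and the `pages` table are illustrative names and schema, not wikitextprocessor's actual storage code, and `str.strip` only stands in for the real trimming rules.

```python
import re
import sqlite3
from typing import Callable

# One <includeonly>...</includeonly> element; non-greedy, may span lines.
INCLUDEONLY = re.compile(
    r"(<includeonly>)(.*?)(</includeonly>)", re.DOTALL | re.IGNORECASE
)


def normalize_includeonly(text: str, trim: Callable[[str], str]) -> str:
    """Rewrite the body of every <includeonly> element with `trim`."""
    return INCLUDEONLY.sub(lambda m: m.group(1) + trim(m.group(2)) + m.group(3), text)


def save_page(db: sqlite3.Connection, title: str, text: str,
              trim: Callable[[str], str]) -> None:
    """Hypothetical save step: normalize once, then store the result.

    The `pages` table is an illustrative schema, not wikitextprocessor's.
    """
    db.execute(
        "INSERT OR REPLACE INTO pages (title, body) VALUES (?, ?)",
        (title, normalize_includeonly(text, trim)),
    )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, body TEXT)")
    # str.strip is a placeholder for the real trimming rules.
    save_page(db, "Template:demo", "x<includeonly>  hi  </includeonly>", str.strip)
    row = db.execute("SELECT body FROM pages WHERE title = ?", ("Template:demo",)).fetchone()
    print(row[0])  # x<includeonly>hi</includeonly>
```

The trade-off is the one mentioned above: pages saved before the change keep their old text, so the database has to be rebuilt before the new behavior shows up.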

kristian-clausal commented 1 month ago

As discussed in #314, this was not an issue with `<includeonly>` after all, but apparently with how whitespace or newlines are trimmed around Category links.
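To illustrate the kind of category-link trimming meant here, a minimal sketch under an assumed rule: a line break (plus any surrounding whitespace) directly in front of a `[[Category:...]]` link is dropped. The exact MediaWiki behavior is not spelled out in this thread, so both the rule and the helper name `trim_before_category_links` are hypothetical.

```python
import re

# Assumption: a newline (plus any surrounding whitespace) directly before a
# category link is dropped, so the link attaches to the preceding text.
# This is an illustrative rule, not MediaWiki's documented behavior.
CATEGORY_AFTER_NEWLINE = re.compile(
    r"[ \t]*\n\s*(\[\[\s*Category\s*:[^\[\]]*\]\])", re.IGNORECASE
)


def trim_before_category_links(text: str) -> str:
    """Hypothetical helper: remove the line break(s) in front of category links."""
    return CATEGORY_AFTER_NEWLINE.sub(r"\1", text)
```

With this rule, `"foo\n[[Category:X]]"` becomes `"foo[[Category:X]]"`, while a link that already sits on the same line as the text is left alone.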