spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License
779 stars, 129 forks

HTML comments and other template issues. #536

Closed (andremacola closed 4 months ago)

andremacola commented 1 year ago

I ended up creating an internal script that saves to a database every term/URL from Wikipedia whose Wikitext has potential problems. So far, I've found around 1500 articles with some peculiarities in the Portuguese Wikipedia.

Shouldn't WTF remove HTML comments within Wikitext? Check out the example of doc.text() on this page: https://pt.wikipedia.org/wiki/Elei%C3%A7%C3%B5es_legislativas_na_Tun%C3%ADsia_em_2019
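To make the expected behaviour concrete, here is a minimal sketch of stripping HTML comments from raw Wikitext before parsing. `stripHtmlComments` is a hypothetical helper written for illustration; it is not part of the wtf_wikipedia API and is not claimed to match how the library handles comments internally. The sample string is made up.

```javascript
// Hypothetical helper (an assumption for illustration, not wtf_wikipedia's code).
// Removes every <!-- ... --> block, including comments that span several lines.
function stripHtmlComments(wikitext) {
  // [\s\S] matches any character including newlines;
  // the lazy *? stops at the first closing "-->".
  return wikitext.replace(/<!--[\s\S]*?-->/g, '')
}

// Made-up sample with a multi-line comment in the middle of a sentence.
const sample = 'Texto antes<!-- nota\nde várias linhas -->, texto depois.'
console.log(stripHtmlComments(sample)) // prints "Texto antes, texto depois."
```

The lazy quantifier matters when a page contains several comments: a greedy match would also delete the visible text sitting between the first `<!--` and the last `-->`.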

There are some pages that caught my attention and have problems similar to a previous issue (https://github.com/spencermountain/wtf_wikipedia/issues/532), but I think they involve other template issues:

I'm still getting familiar with templates in Wikitext, but if you point me in the right direction, I can start creating patches for each problem found.

spencermountain commented 1 year ago

this is great. thank you

spencermountain commented 4 months ago

hey, fixed the long-comment regex, and added support for the Nihongo template in Portuguese. You'll have to update wtf_wikipedia and also use wtf-plugin-i18n

I think the last one is ill-formed wikiscript.

Would love some help to improve Portuguese wp support. cheers