tatuylonen / wiktextract

Wiktionary dump file parser and multilingual data extractor

Templates missing in Chinese etymology and pronunciation #620

Closed GrimPixel closed 3 months ago

GrimPixel commented 5 months ago

For example,

xxyzz commented 4 months ago

Support for extracting the fr edition's cmn-pron template was added in #621.

The en edition issue will wait for @kristian-clausal to have a look next week. I guess it's related to the etymology, pronunciation and POS title structure, and the second etymology title ("Etymology") overwrites the first one ("Glyph origin").

kristian-clausal commented 4 months ago

I thought the simplest, dumbest way to fix this was to combine the two sections. We have a function in en/page.py where Tatu does some old-school text editing to fix the 'depth' of article sections; that is, if a heading has too few "="s, it just adds them. It's more sophisticated than that, but not by much. I added a bit of code that checks whether two Etymology sections (Glyph Origin is treated as an alias for Etymology sections) are next to each other, without any sections inside the previous Etymology section, and then... Oh, I just thought of a bug. And then it leaves out the title of the other etymology section, effectively combining them. This way the Glyph Origin gets a new home.

The bug I thought of is that we do some stuff with numbered Etymology sections (Etymology 1, Etymology 2, etc.), so that needs tweaking... Probably just checking whether either of the titles has a number and using that one as the title instead. Committed to the current PR with the other Chinese stuff (zh-x example templates).
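For readers following along, here is a rough sketch of the merging idea described above. It is not the code in en/page.py: the function name `merge_adjacent_etymologies`, both regexes, and the rule of leaving two numbered sections unmerged are assumptions made for illustration only.

```python
import re

# Hypothetical heading matcher: "===Title===" with 2 to 6 equals signs.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# Titles treated as etymology-like in this sketch (an assumption),
# with an optional trailing number such as "Etymology 2".
ETYMOLOGY_RE = re.compile(r"^(?:etymology|glyph origin)(\s+\d+)?$", re.IGNORECASE)


def merge_adjacent_etymologies(wikitext: str) -> str:
    """Drop the heading of an etymology-like section that directly follows
    another one at the same level with no other heading in between."""
    out = []
    prev = None  # (index in out, level, has_number) of the open etymology heading
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading is None:
            out.append(line)  # body text stays where it is
            continue
        level, title = heading.group(1), heading.group(2)
        etym = ETYMOLOGY_RE.match(title)
        if etym is None:
            prev = None  # any other heading ends the "adjacent" window
            out.append(line)
            continue
        has_number = etym.group(1) is not None
        if prev is not None and prev[1] == level and not (prev[2] and has_number):
            # Merge: skip this heading. If only this title is numbered,
            # use it as the title of the combined section.
            if has_number:
                out[prev[0]] = f"{level}{title}{level}"
                prev = (prev[0], level, True)
            continue
        prev = (len(out), level, has_number)
        out.append(line)
    return "\n".join(out)
```

With this sketch, a "Glyph origin" section immediately followed by "Etymology" keeps the first heading and absorbs the second section's text, while "Glyph origin" followed by "Etymology 1" ends up under the numbered title.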

xxyzz commented 3 months ago

Both are fixed on kaikki.org.