spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

Some common templates are not expanded correctly #572

Open s-jse opened 6 months ago

s-jse commented 6 months ago

Hi,

I have noticed that some templates are not expanded properly.

One case is from https://en.wikipedia.org/wiki?curid=594, where {{transliteration|grc|ephebeia}} should have been expanded to "ephebeia" but is missing from the output.

wtf_wikipedia output: Long hair, which was the prerogative of boys, was cut at the coming of age and dedicated to Apollo.

Article on the Wikipedia website:

image

Output from https://en.wikipedia.org/w/api.php?action=expandtemplates&text={{transliteration|grc|ephebeia}}&prop=wikitext:

image
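
For anyone who wants to reproduce the comparison, here is a minimal sketch of building that expandtemplates request in code. The function names are hypothetical, format=json is added so the response can be parsed, and the fetch helper assumes a runtime with a global fetch (for example Node 18+) and network access.

```javascript
// Build an expandtemplates request URL for the English Wikipedia API.
// (expandTemplatesUrl is a hypothetical helper name, not part of wtf_wikipedia.)
function expandTemplatesUrl(wikitext) {
  const params = new URLSearchParams({
    action: 'expandtemplates',
    text: wikitext,
    prop: 'wikitext',
    format: 'json',
  })
  return 'https://en.wikipedia.org/w/api.php?' + params.toString()
}

// Fetch the expanded wikitext for a snippet.
// Assumes a global fetch and that the JSON response has the shape
// { expandtemplates: { wikitext: '...' } }.
async function expandTemplate(wikitext) {
  const res = await fetch(expandTemplatesUrl(wikitext))
  const data = await res.json()
  return data.expandtemplates.wikitext
}
```

Calling expandTemplate('{{transliteration|grc|ephebeia}}') should return the same expansion the API shows in the browser, which can then be compared against the wtf_wikipedia text output.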

Another case is missing birth dates. For example, the extra parameter of this template is ignored in the output:

{{nihongo|'''Haruki Murakami'''|村上 春樹|Murakami Haruki|extra=born January 12, 1949<ref>{{cite news|url= https://www.upi.com/Top_News/2021/01/12/UPI-Almanac-for-Tuesday-Jan-12-2021/5231610417906/|title= UPI Almanac for Tuesday, Jan. 12, 2021|work= [[United Press International]] | date= January 12, 2021|accessdate=February 27, 2021 | archive-date= January 29, 2021|archive-url= https://web.archive.org/web/20210129023331/https://www.upi.com/Top_News/2021/01/12/UPI-Almanac-for-Tuesday-Jan-12-2021/5231610417906/|url-status=live|quote = … author Haruki Murakami in 1949 (age 72)}}</ref>}}

wtf_wikipedia output: Haruki Murakami (村上 春樹) is a Japanese writer.

Article on the Wikipedia website:

image

This is interesting because I can see the correct parse of the template in wtf_wikipedia's output, just under the "template" field:

{
  "english": "Haruki Murakami",
  "kanji": "村上 春樹",
  "romaji": "Murakami Haruki",
  "extra": "born January 12, 1949",
  "template": "nihongo"
},
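
Since the parsed object already carries the extra field, one possible way to fold it back into the sentence text is sketched below. renderNihongo is a hypothetical helper, not wtf_wikipedia's actual implementation, and both the comma separator and the decision to include the romaji are assumptions about the desired output.

```javascript
// Hypothetical helper: rebuild display text from a parsed nihongo template object
// shaped like the one above ({ english, kanji, romaji, extra }).
function renderNihongo(obj) {
  // keep only the parenthesised fields that are actually present
  const inner = [obj.kanji, obj.romaji, obj.extra].filter(Boolean)
  if (inner.length === 0) {
    return obj.english || ''
  }
  // assumption: fields are joined with ', ' inside a single pair of parentheses
  return `${obj.english || ''} (${inner.join(', ')})`.trim()
}
```

With the object above, this would give "Haruki Murakami (村上 春樹, Murakami Haruki, born January 12, 1949)", so the birth date is no longer lost.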

s-jse commented 6 months ago

For the {{transliteration}} issue, I added this. I am not sure whether it is sufficient or always correct:

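const wtf = require('wtf_wikipedia')

wtf.extend((models, templates) => {
  // keys are lowercased template names
  templates.transliteration = function (tmpl, list) {
    // tmpl is the raw template text, e.g. "{{transliteration|grc|ephebeia}}"
    const arr = tmpl.split('|')
    let text = arr[arr.length - 1]
    // strip the closing braces left on the last argument
    if (text.endsWith('}}')) {
      text = text.slice(0, -2)
    }
    text = text.trim()
    // add data to the .templates() response
    list.push({ template: 'transliteration', text: text })
    // the returned string replaces the template in the text output
    return text
  }
})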
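
One case where taking the last pipe-separated piece may go wrong is a trailing named parameter, for example {{transliteration|grc|ephebeia|italic=no}}. A slightly more defensive sketch is below. lastPositionalArg is a hypothetical helper, and it assumes a flat template with no nested templates or links containing pipes, and no positional value that itself starts with something like key=.

```javascript
// Hypothetical helper: return the last positional (unnamed) argument
// of a flat template string, or '' if there is none.
function lastPositionalArg(tmpl) {
  // remove the surrounding {{ and }}
  const inner = tmpl.trim().replace(/^\{\{/, '').replace(/\}\}$/, '')
  // drop the template name, keep the arguments
  const parts = inner.split('|').slice(1)
  // skip named parameters such as "italic=no"
  const positional = parts
    .map((p) => p.trim())
    .filter((p) => !/^[a-z0-9_ -]+=/i.test(p))
  return positional.length > 0 ? positional[positional.length - 1] : ''
}
```

Inside the extend callback, text could then be computed as lastPositionalArg(tmpl) instead of splitting by hand.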

spencermountain commented 6 months ago

hey Sina, good catch with {{transliteration}}. Happy to support this one.

Same for the extra param in the nihongo template. {{nihongo|'''Haruki Murakami'''|村上 春樹|Murakami Haruki|extra=born January 12, 1949}}

both should be pretty doable. Thanks for the help.