spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

Missing 'office' for Rajesh Sonkar #530

Closed: tmtmtmtm closed this issue 1 year ago

tmtmtmtm commented 1 year ago

With version 10.1.4, when I run "wtf_wikipedia Rajesh Sonkar", it fails to pick up the "office" line from the infobox. Here is the relevant infobox wikitext, followed by the parsed output, which has no "office" key:

| name                = Dr. Rajesh Sonkar
| image               = 
| birth_date          = {{Birth date and age|df=y|1968|12|9}}
| birth_place         = [[Indore]], [[Madhya Pradesh]], [[India]]
| office              = [[President Bhartiya Janta Party(BJP) Indore, Madhya Pradesh]]
| term_start          = 10 May 2020
| term_end            = 
| constituency        = 
| succeeded           = 
| party               = [[Bharatiya Janata Party]]

        {
          "name": {
            "text": "Dr. Rajesh Sonkar"
          },
          "birth_date": {
            "text": "December 9, 1968"
          },
          "birth_place": {
            "text": "Indore, Madhya Pradesh, India",
            "links": [
              {
                "type": "internal",
                "page": "Indore"
              },
              {
                "type": "internal",
                "page": "Madhya Pradesh"
              },
              {
                "type": "internal",
                "page": "India"
              }
            ]
          },
          "term_start": {
            "text": "10 May 2020"
          },
          "party": {
            "text": "Bharatiya Janata Party",
            "links": [
              {
                "type": "internal",
                "page": "Bharatiya Janata Party"
              }
            ]
          },
spencermountain commented 1 year ago

hey Tony, this works for me. Can you reproduce it? cheers

tmtmtmtm commented 1 year ago

Oh, this is a bit more interesting than I noticed at first. Looks like it's because there's an additional "Office" line later in the infobox: https://runkit.com/tmtmtmtm/645528e391100800088f1cb5

On-wiki, 'office' and 'Office' appear to be treated as different parameters, and the second is ignored as unknown. wtf_wikipedia presumably matches them case-insensitively, so the later 'Office' clobbers the earlier 'office'.
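To make the clobbering concrete, here is a minimal TypeScript sketch. It is not wtf_wikipedia's code: the line parser, the function names parseCaseInsensitive and parsePreferExactCase, and the value of the later 'Office' line are all hypothetical. The guard is just one possible approach, not necessarily what shipped in 10.1.5.

```typescript
// Hypothetical, simplified infobox parser for illustration only;
// this is NOT wtf_wikipedia's actual implementation.
type Params = Record<string, string>;

// Pull "| key = value" lines out of infobox wikitext, keeping document order.
function splitParams(wikitext: string): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const line of wikitext.split("\n")) {
    const m = line.match(/^\|\s*([^=]+?)\s*=\s*(.*)$/);
    if (m) pairs.push([m[1], m[2].trim()]);
  }
  return pairs;
}

// Naive: lowercasing every key means a later "Office" overwrites "office".
function parseCaseInsensitive(wikitext: string): Params {
  const out: Params = {};
  for (const [key, value] of splitParams(wikitext)) {
    out[key.toLowerCase()] = value;
  }
  return out;
}

// Hypothetical guard: a key already written in lowercase always beats a
// differently-cased variant, while exact duplicates keep last-one-wins.
function parsePreferExactCase(wikitext: string): Params {
  const out: Params = {};
  const exact = new Set<string>(); // keys written in lowercase in the source
  for (const [key, value] of splitParams(wikitext)) {
    const k = key.toLowerCase();
    if (key === k) {
      out[k] = value; // exact key: last one wins, as on-wiki
      exact.add(k);
    } else if (!exact.has(k)) {
      out[k] = value; // case variant only fills a gap (and can be replaced later)
    }
  }
  return out;
}

// Sample based on the infobox in this issue; the "Office" value is invented.
const sample = [
  "| name                = Dr. Rajesh Sonkar",
  "| office              = [[President Bhartiya Janta Party(BJP) Indore, Madhya Pradesh]]",
  "| party               = [[Bharatiya Janata Party]]",
  "| Office              = (hypothetical later value)",
].join("\n");

console.log(parseCaseInsensitive(sample).office); // → (hypothetical later value)
console.log(parsePreferExactCase(sample).office); // → [[President Bhartiya Janta Party(BJP) Indore, Madhya Pradesh]]
```

The guard still lowercases keys for leniency, so a lone 'Office' with no 'office' alongside it is kept under "office". That part deliberately differs from MediaWiki, where template parameter names are case-sensitive and such a line would simply be ignored.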

spencermountain commented 1 year ago

oh my gosh, good catch. I'll add a fix to the next release, thanks

spencermountain commented 1 year ago

fixed in 10.1.5