spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

Infobox officeholder/office within Infobox royalty is losing data #538

Closed: tmtmtmtm closed this issue 1 year ago

tmtmtmtm commented 1 year ago

The infobox of Salman of Saudi Arabia, for example, includes a series of nested {{Infobox officeholder/office}} templates tied to succession keys, but the data within them appears to be getting lost.

https://runkit.com/tmtmtmtm/646b0cabd7325f00089c1f96 is a stripped-down version of this in which, for example, the '5 November 2011' start date for being Minister of Defense doesn't seem to appear anywhere in the resulting JSON.

spencermountain commented 1 year ago

Hey Tony, that value is in the second infobox, which you can get with doc.infoboxes()[1].json(). Cheers
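Rather than hard-coding an index, a value can be located by scanning every box. A minimal sketch, assuming each box has already been reduced to a plain key → text object like the simplified listing later in this thread (not the raw .json() shape); findValue is a hypothetical helper, not a wtf_wikipedia API.

```javascript
// Hypothetical helper (not a wtf_wikipedia API): list every box/key pair
// whose text exactly equals `value`.
function findValue(boxes, value) {
  const hits = [];
  boxes.forEach((box, index) => {
    for (const [key, text] of Object.entries(box)) {
      if (text === value) hits.push({ index, key });
    }
  });
  return hits;
}

// Two of the pulled-up office boxes from the full Salman output, abridged.
const boxes = [
  { termstart: "5 November 2011", termend: "23 January 2015", successor: "Mohammed bin Salman" },
  { termstart: "5 February 1963", termend: "5 November 2011", successor: "Sattam bin Abdulaziz" },
];

console.log(JSON.stringify(findValue(boxes, "5 November 2011")));
// → [{"index":0,"key":"termstart"},{"index":1,"key":"termend"}]
```

The date matches twice, once as a start and once as an end, and neither hit records which office the box belongs to.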

tmtmtmtm commented 1 year ago

@spencermountain I'm not sure if I'm missing something here, but I can't see any way to tie that value back to the office it relates to (i.e. the Minister of Defense 'succession' entry), as the two end up in separate infobox entries.

The stripped-down example may actually obscure this more than it helps, but if we look at the key → text pairings for the full https://en.wikipedia.org/wiki/Salman_of_Saudi_Arabia infobox, we get:

      "infoboxes": [
        {
          "termstart": "5 November 2011",
          "termend": "23 January 2015",
          "primeminister": "King Abdullah",
          "predecessor": "Sultan bin Abdulaziz",
          "successor": "Mohammed bin Salman"
        },
        {
          "termstart": "5 February 1963",
          "termend": "5 November 2011",
          "appointer": "King Saud",
          "predecessor": "Badr bin Saud",
          "successor": "Sattam bin Abdulaziz"
        },
        {
          "termstart": "18 April 1955",
          "termend": "22 September 1960",
          "appointer": "King Saud",
          "predecessor": "Nayef bin Abdulaziz",
          "successor": "Fawwaz bin Abdulaziz"
        },
        {
          "termstart": "16 March 1954",
          "termend": "18 April 1955",
          "appointer": "King Saud",
          "predecessor": "Nayef bin Abdulaziz",
          "successor": "Turki II bin Abdulaziz"
        },
        {
          "name": "Salman",
          "title": "Custodian of the Two Holy Mosques",
          "image": "File:Salman of Saudi Arabia - 2020 (49563590728) (cropped).jpg",
          "caption": "King Salman in 2020",
          "alt": "Photograph of King Salman in his 83rd year",
          "succession": "King of Saudi Arabia",
          "reign": "23 January 2015 – present",
          "cor-type": "Bay'ah",
          "coronation": "23 January 2015",
          "predecessor": "Abdullah bin Abdulaziz",
          "suc-type": "Crown princes",
          "successor": "Muqrin bin Abdulaziz (2015)\n\nMuhammad bin Nayef (2015–2017)\n\nMohammed bin Salman (2017–present)",
          "succession1": "Prime Minister of Saudi Arabia",
          "reign1": "23 January 2015 – 27 September 2022",
          "reign-type1": "Tenure",
          "predecessor1": "Abdullah bin Abdulaziz",
          "successor1": "Mohammed bin Salman",
          "succession2": "Crown Prince of Saudi Arabia\nDeputy Prime Minister",
          "reign2": "16 June 2012 – 23 January 2015",
          "reign-type2": "Tenure",
          "reg-type2": "King and Prime Minister",
          "regent2": "Abdullah bin Abdulaziz",
          "predecessor2": "Nayef bin Abdulaziz",
          "successor2": "Muqrin bin Abdulaziz",
          "succession3": "Minister of Defense",
          "succession4": "Governor of Riyadh Province",
          "succession5": "Deputy Governor of Riyadh Province",
          "birth_date": "December 31, 1935",
          "birth_place": "Riyadh, Saudi Arabia",
          "spouses": "Sultana bint Turki Al Sudairi (m. 1954-30 July 2011)\n\nSarah bint Faisal Al Subai'ai (divorced)\n\nFahda bint Falah Al Hithlain",
          "issue": "Prince Ahmed\n\nPrince Bandar\n\nPrince Fahd\n\nPrince Faisal\n\nPrince Saud\n\nPrincess Hassa\n\nPrince Khalid\n\nPrince Turki\n\nPrince Nayef\n\nPrince Sultan\n\nPrince Abdulaziz\n\nPrince Rakan",
          "issue-link": "#Personal life",
          "full name": "Salman bin Abdulaziz bin Abdul Rahman",
          "house": "Al Saud",
          "father": "Abdulaziz of Saudi Arabia",
          "mother": "Hassa bint Ahmed Al Sudairi",
          "signature": "توقيع الملك سلمان.svg",
          "signature_alt": "Signature of King Salman"
        }
      ],

The four nested Infobox officeholder/office templates get pulled up into their own top-level elements, and I can't see how to tie them back to the succession they were originally attached to, i.e. infoboxes[0] → succession3, infoboxes[1] and infoboxes[2] → succession4, and infoboxes[3] → succession5.
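If the raw wikitext of the parent infobox is still available, one workaround is to skip the parsed output and walk the source, attaching each nested {{Infobox officeholder/office}} to the nearest succession parameter that precedes it. This is a minimal sketch under that source-order assumption, not part of wtf_wikipedia: groupNestedOffices and templateEnd are hypothetical helpers, the sample markup (including the office3/office4/office5 parameter names) is invented for illustration rather than copied from the article, and the brace counting ignores edge cases such as {{{triple braces}}} and comments.

```javascript
const NESTED = "{{Infobox officeholder/office";

// Return the index just past the "}}" that closes the template opened at
// `start`, counting nested "{{ ... }}" pairs; -1 if the braces never balance.
function templateEnd(text, start) {
  let depth = 0;
  let i = start;
  while (i < text.length - 1) {
    const pair = text.slice(i, i + 2);
    if (pair === "{{") {
      depth++;
      i += 2;
    } else if (pair === "}}") {
      depth--;
      i += 2;
      if (depth === 0) return i;
    } else {
      i++;
    }
  }
  return -1;
}

// Group nested office templates under the closest preceding successionN key.
// Assumption: a nested template belongs to the last succession parameter
// that appears before it in the source, whatever parameter holds it.
function groupNestedOffices(wikitext) {
  const keys = [];
  for (const m of wikitext.matchAll(/\|\s*(succession\d*)\s*=/g)) {
    keys.push({ key: m[1], pos: m.index });
  }
  const groups = {};
  let from = 0;
  while (true) {
    const start = wikitext.indexOf(NESTED, from);
    if (start === -1) break;
    const end = templateEnd(wikitext, start);
    if (end === -1) break;
    const body = wikitext.slice(start, end);
    const owner = keys.filter((k) => k.pos < start).pop();
    const key = owner ? owner.key : "(none)";
    const m = body.match(/\|\s*termstart\s*=\s*([^|\n}]*)/);
    if (!groups[key]) groups[key] = [];
    groups[key].push({ termstart: m ? m[1].trim() : null });
    from = end;
  }
  return groups;
}

// Hypothetical, heavily simplified markup; only the ordering matters here.
const sample = `{{Infobox royalty
| name = Salman
| succession3 = Minister of Defense
| office3 = {{Infobox officeholder/office
  | termstart = 5 November 2011
  | termend = 23 January 2015
  }}
| succession4 = Governor of Riyadh Province
| office4 = {{Infobox officeholder/office
  | termstart = 5 February 1963
  }}{{Infobox officeholder/office
  | termstart = 18 April 1955
  }}
| succession5 = Deputy Governor of Riyadh Province
| office5 = {{Infobox officeholder/office
  | termstart = 16 March 1954
  }}
}}`;

console.log(JSON.stringify(groupNestedOffices(sample)));
```

On this sample the grouping reproduces the mapping described above: one box under succession3, two under succession4 and one under succession5. Whether the real article's markup follows the same ordering would need checking against its source.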

(NB: I'm not entirely sure how this actually works on-wiki either, as it doesn't seem to follow any of the usual WP:IEmbed approaches, so I can't tell whether this is a supported way of doing it or whether it just happens to work at the moment.)

spencermountain commented 1 year ago

Yeah, I think you'll have to loop through the infoboxes with your own custom logic. doc.infoboxes().map(i => i.json()) is all the .json() method does. You're right, I don't know how the infobox-nesting idea makes sense for anyone, especially the editors. It's pretty gross. Let me know if I've misunderstood something. Cheers
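As a starting point for that custom loop, the flattened list can be split into the parent infobox and the pulled-up office boxes. A minimal sketch, assuming each box has already been reduced to a plain key → text object as in the listing earlier in this thread; splitInfoboxes is a hypothetical helper, not part of wtf_wikipedia, and its "has a succession key" and "has a termstart key" tests are heuristics fitted to this one article.

```javascript
// Hypothetical helper: separate the parent box from the nested office boxes
// and list the successionN keys whose dates were pulled out elsewhere.
function splitInfoboxes(boxes) {
  // Heuristic: the parent is the box that has a plain "succession" key.
  const parentIndex = boxes.findIndex((b) => "succession" in b);
  // Heuristic: any other box with a termstart is a pulled-up office box.
  const nestedIndexes = boxes
    .map((b, i) => i)
    .filter((i) => i !== parentIndex && "termstart" in boxes[i]);
  // successionN keys with no matching reignN sibling in the parent.
  const parent = boxes[parentIndex] || {};
  const unfilled = Object.keys(parent).filter((k) => {
    const m = k.match(/^succession(\d+)$/);
    return m !== null && !(`reign${m[1]}` in parent);
  });
  return { parentIndex, nestedIndexes, unfilled };
}

// Abridged from the Salman of Saudi Arabia output above.
const boxes = [
  { termstart: "5 November 2011", termend: "23 January 2015" },
  { termstart: "5 February 1963", termend: "5 November 2011" },
  { termstart: "18 April 1955", termend: "22 September 1960" },
  { termstart: "16 March 1954", termend: "18 April 1955" },
  {
    name: "Salman",
    succession: "King of Saudi Arabia",
    reign: "23 January 2015 – present",
    succession1: "Prime Minister of Saudi Arabia",
    reign1: "23 January 2015 – 27 September 2022",
    succession3: "Minister of Defense",
    succession4: "Governor of Riyadh Province",
    succession5: "Deputy Governor of Riyadh Province",
  },
];

console.log(JSON.stringify(splitInfoboxes(boxes)));
// → {"parentIndex":4,"nestedIndexes":[0,1,2,3],"unfilled":["succession3","succession4","succession5"]}
```

Four nested boxes against three unfilled keys shows the limit of working from the flattened list: nothing in it says that succession4 owns two of the boxes.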