plazi / wikidata


Bot development for new taxa #4

Open myrmoteras opened 2 years ago

myrmoteras commented 2 years ago

Hi Donat,

It took me a while to get a grip on the data. I ended up writing an HTML scraper to get the JSON out of Zenodo. That is not ideal, but the Wikidata bot now works. I cherry-picked a few treatments from the main Plazi website and created the following Wikidata items: https://w.wiki/5A3h

More treatments have been added than the query above shows. Some did not show up because of issues with the publication links of their associated publications. This query shows all treatments currently in Wikidata: https://w.wiki/5A3i

I can now add more treatments to Wikidata given a set of Plazi UUIDs. The bot uses both the RDF and the JSON from Zenodo. I would be able to rely on the RDF alone if the following changes were made to it:

  1. Add the DOI of the scientific publication associated with the treatment. In some cases the RDF contains intermediate Zenodo DOIs, which need to be resolved through the JSON.
  2. Add the locations to the RDF. The location coordinates are the second item the bot currently takes from the JSON.
  3. Use URIs and rdfs:label in the RDF. The taxonomic tree currently uses literals for the different clades, and the complete parent branch is repeated for each clade. Could this be simplified by changing the clades from strings to URIs? For example:

    <http://taxon-concept.plazi.org/id/Animalia/Brighstoneus_simmondsi_Lockwood_2021> a dwcFP:TaxonConcept ;
        trt:hasTaxonName <http://taxon-name.plazi.org/id/Animalia/Brighstoneus_simmondsi> ;
        dwc:genus "Brighstoneus" ;
        dwc:kingdom "Animalia" ;
        dwc:order "Ornithischia" ;
        dwc:rank "species" ;
        dwc:scientificNameAuthorship "Lockwood & Martill & Maidment, 2021" ;
        dwc:species "simmondsi" .

Would become:

    <http://taxon-concept.plazi.org/id/Animalia/Brighstoneus_simmondsi_Lockwood_2021> a dwcFP:TaxonConcept ;
        trt:hasTaxonName <http://taxon-name.plazi.org/id/Animalia/Brighstoneus_simmondsi> .
    <http://taxon-name.plazi.org/id/Animalia/Brighstoneus_simmondsi> rdfs:label "Brighstoneus simmondsi" ;
        dwc:rank wd:Q7432 ; # Q7432 = species
        trt:hasParentName <http://taxon-name.plazi.org/id/Animalia/Brighstoneus> .
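To illustrate why this would help the bot: with parent links, each clade is described once and the full lineage can be recovered by following trt:hasParentName upwards. Below is a minimal Python sketch of that walk. It is not the bot's code. The `NAMES` table is a hand-written stand-in for triples parsed from the proposed RDF, the identifiers for the genus, order and kingdom are assumed for illustration, intermediate ranks are left out, and `lineage` is a hypothetical helper.

```python
# Hand-written stand-in for the proposed RDF (an assumption, not real Plazi output).
# Keys are the path part of a taxon-name URI; values are
# (rdfs:label, key of the trt:hasParentName target or None at the root).
# Only the species and genus keys come from the example above; the
# order and kingdom keys are assumed, and intermediate ranks are omitted.
NAMES = {
    "Animalia/Brighstoneus_simmondsi": ("Brighstoneus simmondsi", "Animalia/Brighstoneus"),
    "Animalia/Brighstoneus": ("Brighstoneus", "Animalia/Ornithischia"),
    "Animalia/Ornithischia": ("Ornithischia", "Animalia"),
    "Animalia": ("Animalia", None),
}


def lineage(name_id, names):
    """Return the labels from name_id up to the root by following parent links."""
    labels = []
    seen = set()
    current = name_id
    while current is not None:
        if current in seen:
            # Guard against malformed data with circular parent links.
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        label, parent = names[current]
        labels.append(label)
        current = parent
    return labels


if __name__ == "__main__":
    print(" > ".join(reversed(lineage("Animalia/Brighstoneus_simmondsi", NAMES))))
```

With literals, the same information has to be repeated on every taxon concept. With URIs, the bot could map each name to one Wikidata item and reuse it as the parent taxon.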

The next step is to request a bot account and/or permission to do this at scale. But I propose that we first discuss the current schema on Wikidata and adapt it where needed.

Cheers,

Andra