Webscrape of the Zimmerman en Space podcast, and (re)publication on Wikimedia Commons (and Zenodo in the future). High 5 for CC0 licenses, space, astronomy and nerds!
Latest update: 17 September 2024
Episodes 1-92 are now available on Wikimedia Commons:
Output of the webscrape, post-processed so that the data is suitable as input for Wikimedia Commons, OpenRefine and the Python modules used below: https://ookgezellig.github.io/Zimmerman-en-Space-podcast/ZimmermanEnSpacePodcast_episodes1-92.xlsx
Converting from mp3 to ogg/oga:
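The conversion step can be sketched with ffmpeg, called from Python. This is an illustration under assumptions, not the command actually used for these files: the Vorbis codec, the quality setting and the `.oga` output name are choices made here. The command is built as a list so it can be inspected (and tested) without running ffmpeg.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_ogg_command(mp3_path: str, quality: int = 5) -> list[str]:
    """Build an ffmpeg command that re-encodes an MP3 file to Ogg Vorbis (.oga).

    Codec and quality are assumptions for this sketch, not the project's settings.
    """
    src = Path(mp3_path)
    dst = src.with_suffix(".oga")  # .oga = Ogg container holding audio only
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", str(src),        # input MP3
        "-vn",                 # drop embedded cover art (seen by ffmpeg as video)
        "-c:a", "libvorbis",   # encode the audio as Vorbis
        "-q:a", str(quality),  # Vorbis VBR quality: higher = better and larger
        "-f", "ogg",           # force the Ogg muxer regardless of file extension
        str(dst),
    ]


def convert(mp3_path: str) -> None:
    """Run the conversion; needs an ffmpeg on PATH that was built with libvorbis."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_ogg_command(mp3_path), check=True)
```

For example, `convert("ep001.mp3")` would write `ep001.oga` next to the input file.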
Wikimedia Commons:
Full-text audio transcriptions will be added to the Commons files bit by bit over the coming months.
Main subject (P921) statements will be added bit by bit to the structured data of each Commons file over the coming months. These episode subjects/keywords will be extracted from the titles and full-text audio transcriptions using Named Entity Recognition (NER), after which the entities found are reconciled against Wikidata. For the current status, see this issue.
For a fully worked example, see S01E01 Tsunami's op Mars (Tsunamis on Mars).
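The reconciliation step described above can be sketched in Python with Wikidata's public `wbsearchentities` API. This is a minimal illustration under assumptions, not the pipeline used for this project: the NER step is left out (entity strings are taken as given), and a candidate is accepted only when the matched label or alias equals the entity text exactly. The `USER_AGENT` value is a hypothetical placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "ZimmermanEnSpaceSketch/0.1 (hypothetical example)"


def search_url(entity: str, language: str = "nl", limit: int = 5) -> str:
    """Build a wbsearchentities request that looks up items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": entity,
        "language": language,  # language the search string is written in
        "uselang": language,   # language for the returned labels/descriptions
        "type": "item",
        "limit": str(limit),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def best_candidate(response: dict, entity: str) -> Optional[str]:
    """Return the QID of the first hit whose matched label/alias equals the
    entity text (case-insensitive), or None if nothing matches exactly."""
    for hit in response.get("search", []):
        matched = hit.get("match", {}).get("text", "")
        if matched.casefold() == entity.casefold():
            return hit["id"]
    return None


def reconcile(entity: str, language: str = "nl") -> Optional[str]:
    """Query Wikidata (network access required) and pick an exact match."""
    request = urllib.request.Request(
        search_url(entity, language), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as resp:
        return best_candidate(json.load(resp), entity)
```

Exact matching is deliberately naive: a name like "Mars" matches the planet, the Roman god and several other items, which is why reconciliation tools such as OpenRefine add scoring, type constraints and manual review on top of a plain search.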
Request info about episode 14, AI en Chat GPT in de sterrenkunde (AI and ChatGPT in astronomy)
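Such a request can be sketched against the MediaWiki Action API on Commons. This does not reproduce the exact request behind the link above; the title `File:Example.oga` in the test is a hypothetical placeholder, not the real file name of episode 14. The MediaInfo id (`M` followed by the page id) is the id under which a file's structured data is stored.

```python
import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "ZimmermanEnSpaceSketch/0.1 (hypothetical example)"


def file_info_url(file_title: str) -> str:
    """Build an Action API query for basic info about one Commons file."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",        # used for all uploaded files, audio included
        "iiprop": "url|size|mime",  # download URL, size/duration, MIME type
        "format": "json",
        "formatversion": "2",       # 'pages' is returned as a list
    }
    return COMMONS_API + "?" + urllib.parse.urlencode(params)


def summarize(response: dict) -> dict:
    """Extract the interesting fields from a formatversion=2 response."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError(f"no such file: {page.get('title')}")
    info = page["imageinfo"][0]
    return {
        "title": page["title"],
        # Structured data lives in the MediaInfo entity 'M' + page id.
        "mediainfo_id": f"M{page['pageid']}",
        "url": info.get("url"),
        "mime": info.get("mime"),
        "duration": info.get("duration"),  # seconds, for audio/video files
    }


def fetch(file_title: str) -> dict:
    """Run the query (network access required)."""
    request = urllib.request.Request(
        file_info_url(file_title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as resp:
        return summarize(json.load(resp))
```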
Structured data has been added to all files, so we can do some (basic) semantic searching via SPARQL queries.
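A minimal sketch of such a query, assuming the Wikimedia Commons Query Service (WCQS), which exposes Commons structured data using Wikidata-style `wd:`/`wdt:` prefixes. The query lists files whose main subject (P921) is a given Wikidata item, for example Q111 (Mars). It is not restricted to this podcast, and because P921 is still being added, results will be incomplete. The Python function only builds the query string; to run it, paste the output into the WCQS web interface (which, at the time of writing, asks you to log in).

```python
def main_subject_query(qid: str, limit: int = 50) -> str:
    """Build a SPARQL query for Commons files with main subject (P921) = qid."""
    # Guard against injecting arbitrary text into the query.
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?file WHERE {{
  ?file wdt:P921 wd:{qid} .
}}
LIMIT {limit}"""


print(main_subject_query("Q111"))  # files whose main subject is Mars
```

Narrowing the results to this podcast would need a statement that all episode files share; which statements the files carry is not spelled out here, so that filter is left out.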
All episodes 1-92 of the Zimmerman en Space podcast have been licensed under the Creative Commons CC0 1.0 license, as stated in the show notes of each episode.