r-three / common-pile

Repo to hold code and track issues for the collection of permissively licensed data
MIT License

publicdomainreview.org #60

Closed nkandpa2 closed 8 months ago

nkandpa2 commented 10 months ago

This is a blog containing relatively long-form essays about works that have entered the public domain. The essays themselves are under a CC BY-SA license (see here for license info).

nkandpa2 commented 10 months ago

This is not a huge source of data (a couple thousand essays), but it requires very little work to scrape and clean, and the text is high quality (it seems to be written by people who study and write about art, history, and culture professionally). I have the code for scraping and cleaning the text from their site here. The only thing left to do is to convert the text to the Dolma format.
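For anyone picking up the conversion step, here is a rough sketch of what writing the cleaned essays out as Dolma-style records might look like. It is not the code from the linked scraper. It assumes the usual Dolma document layout (one JSON object per line with `id`, `text`, `source`, `added`, and a free-form `metadata` dict, stored as gzipped JSONL). The function names, the `source` tag, and the metadata keys are all hypothetical.

```python
import gzip
import json
from datetime import datetime, timezone


def to_dolma_record(essay_id, text, url, license_name="CC BY-SA"):
    """Build one Dolma-style document dict from a cleaned essay.

    The field layout follows the common Dolma convention; the source tag
    and metadata keys below are assumptions, not project decisions.
    """
    return {
        "id": essay_id,
        "text": text,
        "source": "public-domain-review",  # hypothetical source tag
        "added": datetime.now(timezone.utc).isoformat(),
        "metadata": {"url": url, "license": license_name},
    }


def write_dolma_shard(records, path):
    """Write records as gzipped JSONL, one document per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Usage would be something like `write_dolma_shard([to_dolma_record(slug, text, url) for ...], "pdr-0000.jsonl.gz")`, where the shard file name is again just a placeholder.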

StellaAthena commented 9 months ago

Looks good! Let's include it.

craffel commented 8 months ago

Closed via #67