blester125 closed this pull request 8 months ago
The full scrape is here: https://huggingface.co/datasets/blester125/project-gutenberg-dolma
This PR includes some updates I made while processing all the Project Gutenberg data.
Changes include:
--books
which skips or adds specific special-case books