Open josephlewis42 opened 2 months ago
Indeed providing a base scraper out of what is currently a strong convention would boost up new scraper development time and reduce maintenance overall.
Needs to be considered. @benoit74 I can't find your ticket suggesting a single-repo mega scraper. Where is it?
Thank you @josephlewis42 for the proposition, definitely needs to be considered, your points are really valid.
@benoit74 I can't find your ticket suggesting a single-repo mega scraper. Where is it?
Single-repo mega scraper was more a question in the back of my mind and a topic for the hackathon, I never opened an issue on this.
I started much smaller with only merging zimit and warc2zim repos: https://github.com/openzim/zimit/issues/303
That being said, I'm not sure we need to go mono-repo for the suggestion to be implemented. We can provide MultiZimScraper
in python-scraperlib and still live with separate repos per scraper.
On this idea of having shared code, I would even like to consider sharing things like:
add_contents
to process "things" in parallel when neededSo definitely something useful to consider.
Thank you @josephlewis42 for taking time to write this issue and bringing this subject on the table.
When recently building the DevDocs scraper, I realized there are a ton of things that I was relying on @benoit74's expertise for to make the scraper sustainable for ZimFarm but weren't core to building a functional scraper. These include:
The scraper still isn't fully there, but I've already spent a lot of time implementing and testing some of these things. I've written Logstash plugins and Elastic Beats (which are more smiilar to Zim scrapers and those were dramatically easier because that common logic was abstracted away.
I'd love to see something like the following (example only!) as what I had to build while knowing most of the above would be taken care of:
I don't think all scrapers would need to use this format, but something like it would have dramatically cut down on the amount of testing and knowledge needed for me to produce a quality Zim scraper.