openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

zim of ganjoor.net/ #1

Open kelson42 opened 5 years ago

kelson42 commented 5 years ago

From @amirh123123 on September 14, 2018 10:52

ganjoor.net/

Copied from original issue: openzim/mwoffliner#376

RavanJAltaie commented 1 year ago

Requested https://farm.openzim.org/recipes/ganjoor.net

RavanJAltaie commented 1 year ago

The recipe didn't succeed

benoit74 commented 3 months ago

Reopening, ZIM is still not published.

The last run (https://farm.openzim.org/pipeline/fda86c72-527b-480c-963d-5160336068c5) was processing the website fairly successfully, but I had to stop it: after already 20 days in the pipeline it was only at 29% (338277 / 1150670; yes, more than 1 million pages).
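For a rough sense of scale, the figures quoted above can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the crawl rate stays constant and that the total page count stops growing, neither of which is guaranteed for a crawler that is still discovering URLs. The function name `estimate_crawl` is made up for illustration.

```python
# Figures taken from the run described above.
PAGES_DONE = 338_277
PAGES_TOTAL = 1_150_670
DAYS_ELAPSED = 20


def estimate_crawl(pages_done, pages_total, days_elapsed):
    """Linear extrapolation; assumes a constant rate and a fixed total."""
    rate = pages_done / days_elapsed  # pages per day so far
    days_remaining = (pages_total - pages_done) / rate
    return {
        "progress": pages_done / pages_total,
        "pages_per_day": rate,
        "days_remaining": days_remaining,
        "days_total": days_elapsed + days_remaining,
    }


estimate = estimate_crawl(PAGES_DONE, PAGES_TOTAL, DAYS_ELAPSED)
print(f"progress: {estimate['progress']:.1%}")
print(f"rate: {estimate['pages_per_day']:.0f} pages/day")
print(f"remaining: {estimate['days_remaining']:.0f} days")
print(f"total: {estimate['days_total']:.0f} days")
```

Under those assumptions the run would have needed roughly 48 more days, about 68 days in total.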

This does not seem reasonable. Someone has to identify why so many pages are present and whether it is really necessary to scrape all of them.
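One possible way to start that investigation is to group the crawled URLs by their rough "shape" and see which group dominates. The sketch below is only an illustration: it assumes a plain list of URLs is available (how to export one from the crawler is not covered here), and the helper names `url_shape` and `summarize` as well as the example.org URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_shape(url):
    """Reduce a URL to its first path segment plus sorted query parameter names."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    prefix = "/" + segments[0] if segments else "/"
    # Keep only parameter names, so ?page=2 and ?page=3 fall in the same group.
    names = sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return prefix + ("?" + ",".join(names) if names else "")


def summarize(urls, top=10):
    """Return the most common URL shapes with their counts."""
    return Counter(url_shape(u) for u in urls).most_common(top)


if __name__ == "__main__":
    # Hypothetical sample; a real run would read the crawler's URL list instead.
    sample = [
        "https://example.org/",
        "https://example.org/poets/one",
        "https://example.org/poets/two?page=2",
        "https://example.org/poets/two?page=3",
    ]
    for shape, count in summarize(sample):
        print(count, shape)
```

A group that accounts for most of the pages, for example one driven by a pagination or sorting parameter, would be a natural candidate for an exclusion rule.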

Popolechien commented 2 months ago

This million-pages thing reminds me of #650 (though that one, being Microsoft, is slightly more believable). Could it be a scraper issue where it sees pages that do not really exist?

benoit74 commented 2 months ago

All these pages really do exist. They might be a bit "virtual", in the sense that they are URLs to pages which are dynamically rendered on the real server, or even a bit fake due to a bug in the upstream server, but the crawler (barring a very rare bug) cannot invent new URLs.