openzim / zimit

Make a ZIM file from any Web site and surf offline!
GNU General Public License v3.0

Creating a ZIM from a website that hosts images on https://imageshack.com/ #334

Open kroryan opened 2 weeks ago

kroryan commented 2 weeks ago

How can I create a ZIM that includes the pictures hosted on ImageShack?

I tried this, but it doesn't fetch the pictures from ImageShack:

sudo docker run -v /media/usb/output:/output --shm-size=1gb ghcr.io/openzim/zimit zimit --url url --name url1 --workers 10 --waitUntil domcontentloaded

Is there a way to do it?

benoit74 commented 2 weeks ago

I'm pretty sure this is not easily feasible; ImageShack has plenty of reasons to want to prevent such actions. At the very least, if I'm not mistaken, you need to log in to ImageShack, so you need to pass this login information to Browsertrix crawler, which is run by kiwix.

I'll give a few hints, but this would deserve a very, very long tutorial. In Browsertrix crawler, this is done with a browser profile, see https://crawler.docs.browsertrix.com/user-guide/browser-profiles/ ; once you have a browser profile, you can pass this tar.gz to the zimit CLI with the --profile argument.

I recommend starting with only 1 worker. You can always increase it later once you have a working setup, but more workers also means a higher chance of detection by anti-bot systems.
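To make the hints above a bit more concrete, here is a rough sketch of how the original command might look with a browser profile and a single worker. It is only an illustration under several assumptions: the profile has already been created by following the Browsertrix documentation linked above, it was saved as profile.tar.gz in a local profiles directory (both names are hypothetical), and that directory is mounted into the container at /profiles (also an arbitrary choice). The zimit_with_profile helper is made up for this example; only the flags already mentioned in this thread are used.

```shell
# Assumption: same output directory as in the original command.
OUTPUT_DIR="/media/usb/output"
# Assumption: hypothetical local directory holding the browser profile.
PROFILE_DIR="$PWD/profiles"
# Assumption: hypothetical file name of the profile archive.
PROFILE_FILE="profile.tar.gz"

# Hypothetical helper: any arguments are used as a prefix for the command.
# "zimit_with_profile echo" prints the command instead of running it,
# "zimit_with_profile sudo" runs it through sudo.
zimit_with_profile() {
  "$@" docker run \
    -v "$OUTPUT_DIR:/output" \
    -v "$PROFILE_DIR:/profiles" \
    --shm-size=1gb \
    ghcr.io/openzim/zimit zimit \
    --url url --name url1 \
    --workers 1 \
    --waitUntil domcontentloaded \
    --profile "/profiles/$PROFILE_FILE"
}

# Dry run: print the full command for review without starting a crawl.
zimit_with_profile echo
```

Once the printed command looks right, calling zimit_with_profile sudo (or with no argument, if sudo is not needed for docker) starts the actual crawl. The value url is the same placeholder as in the original command and has to be replaced with the real site address.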

benoit74 commented 2 weeks ago

Oh, sorry, I probably misread your situation. You are not crawling ImageShack itself, but a website which uses ImageShack as its image provider? How is that possible? I thought ImageShack was quite restrictive about this...