mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Question about API calls with Exhentai #1022

Closed ghost closed 1 year ago

ghost commented 3 years ago

I thought it was just my imagination, but I used two different VMs and two different proxies with gallery-dl and another script and I can confirm that I'm getting banned much faster with gallery-dl. The other script was downloading at a much faster pace too.

Is gallery-dl pulling data for each downloaded image from the api? If so, what's the purpose of it? Exhentai hands out temp bans like candy if you make too many requests like that. https://github.com/mikf/gallery-dl/blob/2184ec5d78178b867b21fcb29c7987f590e9504c/gallery_dl/extractor/exhentai.py#L259

mikf commented 3 years ago

It is replicating what happens when you browse a gallery page by page in your browser. The first image page gets loaded as an HTML page, and everything after that is done with an API call (try it yourself with your browser's network monitor). There is probably a better way, like whatever "the other script" does, but this method worked for my purposes.

What is the other script you mentioned? https://github.com/ccloli/E-Hentai-Downloader?

ghost commented 3 years ago

Yeah that's the one. Wouldn't it lead to fewer temp bans if it loaded the gallery page instead of doing API calls for each image? Div class GDT is the page, and each GDT1 contains a link to the image. https://i.imgur.com/X7m45gX.png
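A minimal sketch of the idea in this comment, using only the Python standard library: collect every link found inside the GDT container of a gallery page. The markup in `SAMPLE`, the `gdt` id/class match, and the names `GalleryLinkParser` and `extract_image_page_links` are assumptions made for illustration. This is not gallery-dl's code, and the real page structure may differ.

```python
from html.parser import HTMLParser


class GalleryLinkParser(HTMLParser):
    """Collect hrefs of <a> tags nested inside the div named "gdt"."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside the gdt container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif "gdt" in (attrs.get("id"), attrs.get("class")):
                # Assumption: the container is marked with id or class "gdt".
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_image_page_links(html):
    parser = GalleryLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical markup, only loosely modelled on the screenshot above.
SAMPLE = """
<div id="gdt">
  <div class="gdtm"><a href="https://example.org/s/aaa/1-1"><img src="t1.jpg"></a></div>
  <div class="gdtm"><a href="https://example.org/s/bbb/1-2"><img src="t2.jpg"></a></div>
</div>
<a href="https://example.org/other">outside the container</a>
"""

links = extract_image_page_links(SAMPLE)
```

One gallery page would then yield all of its image page links in a single request, which is the saving the comment is after.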

mikf commented 3 years ago

I've taken a quick look at https://github.com/ccloli/E-Hentai-Downloader and it doesn't use the API at all. It goes to each HTML image page like https://exhentai.org/s/f68367b4c8/1200119-3 in parallel instead, even though that costs more bandwidth and is more "unnatural" compared to browsing the site. Guess I'll rewrite the exhentai code to do that, instead of using the API.
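To illustrate what "in parallel" means here, the following Python sketch requests several image pages at once through a bounded thread pool. It is not the code of E-Hentai-Downloader or of gallery-dl. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP request so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL using a bounded pool of worker threads.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real HTTP GET of an image page (assumption for the demo).
    return "<html>" + url + "</html>"


pages = fetch_all(["page-1", "page-2", "page-3"], fake_fetch, max_workers=2)
```

A larger `max_workers` sends more requests per second, so the value would have to be chosen with the site's rate limits in mind.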

0lm commented 3 years ago

Since this is still open, I didn't want to make a new issue. Not sure if this even belongs here, but so far exhentai seems to work. The problem is that fetching image data is extremely slow: it's been nearly an hour and it's still at image page 400 (the gallery has a bit over 1000 images). So yeah, it's still downloading each image. Is that supposed to take that long? Are there any options to speed that up, like enabling multithreading or something similar?

That's something I also miss, by the way. Some gallery downloads (doesn't matter what site, all the same) take extremely long because the galleries are big but gallery-dl downloads them only one by one with (I assume) a single thread. Is there a way to set up simultaneous downloads, so that it downloads 5 or 10 images at once? Or a way to enable multithreading?

I tried the -g option to only get the direct image links so I can download them with an external download manager. The problem is that this takes as long as downloading via gallery-dl. Is there a way to speed up URL fetching when using -g?


mikf commented 3 years ago

@0lm downloading from exhentai is really slow because the current way of fetching download URLs causes you to get temp banned rather quickly and there are wait times in place to try to prevent that. Use -o wait-min=0 -o wait-max=0 to disable them.
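As a rough sketch of how a pair of options like wait-min and wait-max can bound a pause between requests, the Python below picks a random delay in that range. That gallery-dl computes its waits exactly this way is an assumption, and `pick_wait` and `wait_between_requests` are hypothetical names, not gallery-dl functions.

```python
import random
import time


def pick_wait(wait_min, wait_max):
    """Return a random delay in seconds between wait_min and wait_max."""
    if wait_max < wait_min:
        wait_max = wait_min
    return random.uniform(wait_min, wait_max)


def wait_between_requests(wait_min, wait_max, sleep=time.sleep):
    """Sleep for a random delay; with both bounds at 0 nothing happens."""
    seconds = pick_wait(wait_min, wait_max)
    if seconds > 0:
        sleep(seconds)
    return seconds
```

With both bounds set to 0, as in the command-line options above, the chosen delay is always 0 and no sleep takes place.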

Parallel downloads also aren't a thing at the moment, but will be implemented eventually (#31).

0lm commented 3 years ago

thanks for the info!

ghost commented 3 years ago

Some gallery downloads (doesn't matter what site, all the same) take extremely long because the galleries are big but gallery-dl downloads them only one by one with (I assume) a single thread. Is there a way to set up simultaneous downloads, so that it downloads 5 or 10 images at once? Or a way to enable multithreading?

That's a good thing, trust me. I also believe it's too slow, but if you go any faster the site will temporarily ban you for making too many requests too soon.

I don't know if this is easy to implement since I'm not a programmer, but if gallery-dl could wait X seconds before downloading an image of size Y, it might reduce the risk of getting banned. For example, wait 15 seconds if the image is greater than 9 MB, wait 5 seconds if it's greater than 3 MB, and so on. This would really come in handy for artist and game CGs.
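The suggestion above can be sketched in a few lines of Python. This is only an illustration of the proposed rule, not an existing gallery-dl feature. The thresholds come from the comment; the names `DELAY_RULES` and `delay_for_size`, the zero delay for smaller files, and treating 1 MB as 1024 * 1024 bytes are assumptions.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes

# (size threshold in bytes, delay in seconds), largest threshold first.
DELAY_RULES = [
    (9 * MB, 15.0),
    (3 * MB, 5.0),
]


def delay_for_size(size_bytes, rules=DELAY_RULES, default=0.0):
    """Return the delay for the first threshold the file size exceeds."""
    for threshold, delay in rules:
        if size_bytes > threshold:
            return delay
    return default  # assumption: no extra wait for small files
```

A downloader would need to know the file size before the transfer, for example from a Content-Length header, and then sleep for the returned number of seconds.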

wankio commented 3 years ago

Exhentai is only for HCG, and HCG sets always have 500-2000 images (some full of animated GIFs), so you should use torrents instead. For artist stuff you should use nhentai.

I usually use the official Hentai@Home client, which makes your PC part of the P2P network (if you want to save power, you can use an old laptop with a large HDD or a Pi and leave it running 24/7). It has a script to automatically compress completed galleries too. It will download the galleries in your H@H list, earn points, ...