openzim / gutenberg

Scraper for downloading the entire ebooks repository of project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Giant Downloads When Only a Tiny ZIM Is Required #82

Closed nobicycle closed 5 years ago

nobicycle commented 5 years ago

Hello,

If I run gutenberg2zim with -b to download a handful of books, there is a stage where a large download occurs:

bash -c rsync -a --list-only rsync://aleph.gutenberg.org/gutenberg/ > tmp/file_on_aleph_gutenberg_org

On the first pass it downloaded a 250MB file. On the second pass it is at 30MB and counting ...

Is it possible to avoid this rsync stage?

Best wishes

kelson42 commented 5 years ago

@rgaudin Do you know more?

rgaudin commented 5 years ago

I think it's mandatory. I haven't looked at the code, but I believe this rsync listing is the basis for indexing the books (other options proved unreliable).
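
To illustrate how a listing file like tmp/file_on_aleph_gutenberg_org could serve as an index, here is a minimal Python sketch. It is not taken from the gutenberg2zim code. It assumes the usual `rsync --list-only` line layout (permissions, size, date, time, path), and the names `parse_listing_line` and `index_listing` are hypothetical helpers invented for this example.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumed layout of one `rsync --list-only` line:
#   permissions  size  YYYY/MM/DD  HH:MM:SS  path
LISTING_RE = re.compile(
    r"^(?P<perms>[-dlrwxsStT]{10})\s+(?P<size>[\d,]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<path>.+)$"
)


def parse_listing_line(line: str) -> Optional[Tuple[str, int]]:
    """Return (path, size_in_bytes) for a file line, or None otherwise."""
    match = LISTING_RE.match(line.rstrip("\n"))
    # Skip lines that do not fit the assumed layout, and skip directories.
    if match is None or match.group("perms").startswith("d"):
        return None
    # Newer rsync versions print sizes with thousands separators.
    size = int(match.group("size").replace(",", ""))
    return match.group("path"), size


def index_listing(lines: Iterable[str]) -> Dict[str, int]:
    """Build a path -> size mapping from listing lines."""
    index: Dict[str, int] = {}
    for line in lines:
        parsed = parse_listing_line(line)
        if parsed is not None:
            path, size = parsed
            index[path] = size
    return index
```

With an index like this, a tool could look up which files exist for a given book without contacting the server again, which may be why the full listing is fetched even when only a few books are requested.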

nobicycle commented 5 years ago

thanks