openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Fixed setup_urls on macOS #113

Closed: rgaudin closed this 4 years ago

rgaudin commented 4 years ago

setup_urls() does essentially three things:

  1. fetches the list of files on the mirror's server via rsync
  2. converts that rsync output into a list of relative paths
  3. imports those paths into the Url table of the database
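Step 3 can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module. It assumes a hypothetical table named url with a single url column; the project's real Url model and its database layer are not shown in this PR, so the schema, the function name import_paths and the batch_size parameter are all illustrative assumptions. The sketch inserts rows in batches inside a single transaction.

```python
import sqlite3
from itertools import islice


def import_paths(conn, paths, batch_size=1000):
    """Insert relative paths into a hypothetical `url` table, in batches.

    The table name and schema are assumptions for illustration only.
    Returns the number of rows inserted.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS url (url TEXT NOT NULL)")
    total = 0
    it = iter(paths)
    # A connection used as a context manager commits on success and rolls back on error.
    with conn:
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                break
            conn.executemany("INSERT INTO url (url) VALUES (?)", [(p,) for p in batch])
            total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(import_paths(conn, ["1/0/0/10/10.txt", "1/0/0/10/10-h/10-h.htm"]))
```

Because paths is consumed lazily through islice, it can be a generator over a large listing file without loading every path into memory first.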

The first and third steps are slow. The second step used to be done with sed, with explicit handling for macOS, but that handling was broken (at least on macOS): it turned the file into a list of \1 lines…

This PR replicates the sed step in Python, so the conversion no longer depends on the platform's sed implementation and behaves the same on every platform.
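A minimal sketch of how such a sed step can be done in Python follows. It is not the PR's actual code. It assumes each rsync --list-only line has the layout "permissions size date time path" and keeps only regular files (permissions starting with "-"). Both the regex and the file-only filter are assumptions, as are the names listing_line_to_path and convert_listing. The captured group plays the role of sed's \1, but Python's re module behaves the same on Linux, macOS and Windows.

```python
import io
import re

# Assumed layout of an `rsync --list-only` line: permissions, size, date, time,
# then the relative path (which may itself contain spaces).
# Keeping only regular files (permissions starting with "-") is an assumption.
LISTING_RE = re.compile(r"^-\S*\s+[\d,.]+\s+\S+\s+\S+\s+(.+)$")


def listing_line_to_path(line):
    """Return the relative path in one listing line, or None if it doesn't match."""
    match = LISTING_RE.match(line.rstrip("\r\n"))
    return match.group(1) if match else None


def convert_listing(fin, fout):
    """Write one relative path per line from fin (rsync output) to fout.

    Streams line by line, so a large listing is never held in memory.
    Returns the number of paths written.
    """
    count = 0
    for line in fin:
        path = listing_line_to_path(line)
        if path is not None:
            fout.write(path + "\n")
            count += 1
    return count


if __name__ == "__main__":
    sample = (
        "drwxr-xr-x          4,096 2020/05/14 09:30:00 1/0/0/10\n"
        "-rw-r--r--        123,456 2020/05/14 09:30:00 1/0/0/10/10.txt\n"
        "-rw-r--r--          2,048 2020/05/14 09:30:00 1/0/0/10/10-h/read me.htm\n"
    )
    out = io.StringIO()
    print(convert_listing(io.StringIO(sample), out))  # 2
    print(out.getvalue(), end="")
```

With real files, convert_listing can be given handles from open(src) and open(dst, "w"), which mirrors the "sed … src > dst" shape of the original step.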