openzim / gutenberg

Scraper for downloading the entire ebooks repository of project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0
126 stars 37 forks

Fixes #115 - Better temporary files handling #117

Closed satyamtg closed 4 years ago

satyamtg commented 4 years ago

Fixes #115 by introducing the following changes:

The deletion of the static folder is kept intact (it is deleted if a ZIM is being made, and kept otherwise). The newly added --tmp-dir argument takes the folder in which to store all temporary files, as well as the static folder if that is not specified by --static-folder. This lets us retain the previous behaviour (i.e. running the exploded parts of the process individually).
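The folder resolution described above can be sketched roughly as follows. This is only an illustration and not the code from this PR: the helper names (`build_parser`, `resolve_folders`, `should_delete_static`), the `--zim` flag, the `static` subfolder name and the fallback to the current directory when --tmp-dir is omitted are all assumptions made for the example. Only the --tmp-dir and --static-folder option names come from the description.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the two folder options discussed here."""
    parser = argparse.ArgumentParser(description="Sketch of the folder options")
    parser.add_argument("--tmp-dir", default=None,
                        help="folder for all temporary files")
    parser.add_argument("--static-folder", default=None,
                        help="folder for the static export")
    # Hypothetical flag standing in for "a ZIM is being made".
    parser.add_argument("--zim", action="store_true")
    return parser


def resolve_folders(tmp_dir, static_folder):
    """Return (tmp_root, static) following the rule in the description.

    Assumption: without --tmp-dir, fall back to the current directory.
    Assumption: the static folder is named "static" inside the tmp folder.
    """
    tmp_root = Path(tmp_dir) if tmp_dir else Path(".")
    static = Path(static_folder) if static_folder else tmp_root / "static"
    return tmp_root, static


def should_delete_static(making_zim: bool) -> bool:
    """Delete the static folder only when a ZIM is being made."""
    return making_zim
```

With this sketch, passing `--tmp-dir /data/tmp` alone would place the static folder at `/data/tmp/static`, while an explicit `--static-folder` takes precedence over it.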

rgaudin commented 4 years ago

Honestly I don't think we need this.

There is no output folder in the current implementation. If not using the one-lang-one-zim option, you don't specify an output folder and that just goes to CWD.

We have options for specifying some folders and resources, but that's not very helpful, as the large ZIM files will end up in the CWD anyway.

I am fixing #115 by changing the Dockerfile to work off the /output folder which is configured on the ZIM farm. This way, /output is the cwd and everything gets stored there.

We'll change all this once we refactor. I suggest we close this as it doesn't completely fix it and adds a bit more complexity to this fragile edifice :)

satyamtg commented 4 years ago

  • Why are you encoding some logs?

Oh, I didn't see that. That was a mistake (I didn't pull into the temporary folder where I was testing, and copied from there). I'll take care that this doesn't happen again. Anyway, since it's getting closed, it shouldn't matter.

rgaudin commented 4 years ago

OK, I thought there was a reason for that, as I had removed them previously to get better-looking logs. Good then.