vericast / conda-mirror

Mirror upstream conda channels
BSD 3-Clause "New" or "Revised" License

Validate that there is enough space to actually perform the mirror #19

Closed ericdill closed 7 years ago

ericdill commented 7 years ago

There has been a user report of the following stack trace:

INFO: download_url=https://anaconda.org/conda-forge/gdal/2.1.1/download/win-64/gdal-2.1.1-np111py34_4.tar.bz2
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 253, in _download
    tf.write(data)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/bin/conda-mirror", line 11, in <module>
    sys.exit(cli())
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 183, in cli
    blacklist, whitelist)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 427, in main
    _download(url, download_dir, repodata)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 253, in _download
    tf.write(data)
OSError: [Errno 28] No space left on device

I'm pretty sure that this is because there is not enough space in the /tmp directory on the host machine where this command was being run.

One way to fix this would be to compute the space required to mirror all packages in the to_mirror set, using the byte count stored in repodata[pkg_name]['size']. We would then need to check that there is enough space both in the temp directory at download_dir and in the final location for these packages, local_directory.
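A minimal sketch of that pre-flight check, assuming repodata maps each package file name to a dict with a 'size' entry in bytes, as described above. The function names here are hypothetical and not part of conda-mirror; shutil.disk_usage is from the Python standard library.

```python
import shutil


def required_bytes(repodata, to_mirror):
    """Sum the 'size' entry (in bytes) of every package selected for mirroring."""
    return sum(repodata[pkg_name]["size"] for pkg_name in to_mirror)


def has_enough_space(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


def check_space(repodata, to_mirror, download_dir, local_directory):
    """Raise OSError before any download starts if either location is too small."""
    needed = required_bytes(repodata, to_mirror)
    for path in (download_dir, local_directory):
        if not has_enough_space(path, needed):
            raise OSError(
                "Need {} bytes but {} does not have that much free".format(needed, path)
            )
    return needed
```

This checks each location on its own. If download_dir and local_directory live on the same filesystem, the check is conservative, since promoting a file there is a rename rather than a second copy.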

MWigger commented 7 years ago

Hello, to verify that low space on /tmp is at fault, I replaced line 418 with the following: with tempfile.TemporaryDirectory( dir="/media/myfilesysten......" ) as download_dir:

With this change it worked, so to solve my issue, it would be sufficient to make the temp folder configurable.

But what is the point of the tmp folder, why is the download not directly written to the target folder?

ericdill commented 7 years ago

> With this change it worked, so to solve my issue, it would be sufficient to make the temp folder configurable.

Great, I'll do that in the next week or so

> But what is the point of the tmp folder, why is the download not directly written to the target folder?

I run conda-mirror as an hourly cron job. It felt easier to download everything to a staging directory and then validate all the packages there, removing any that do not pass validation (size/md5/sha256). Only once packages have been downloaded and validated are they promoted from the staging directory to the directory they are served from. I honestly hadn't considered that /tmp might not have enough space to do a full channel mirror. You should only hit this issue the first time you mirror a channel, since that is the run that requires >10GB of space.

Additionally, if I download directly to the directory where the packages are being served from, I occasionally hit the issue where conda install <package> would grab a partially downloaded package and fail. That issue could also be avoided by downloading the file to a temp file in that directory and then moving it to its actual package name.
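The temp-file-then-move idea, combined with a size/md5 check, could look roughly like the sketch below. It is an illustration under assumptions, not conda-mirror's actual code: store_package and passes_validation are hypothetical names, and the expected size and md5 are assumed to come from the channel's repodata.

```python
import hashlib
import os
import tempfile


def passes_validation(path, expected_size, expected_md5):
    """Check a downloaded file against the size and md5 recorded for it."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large packages are not read into memory at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5


def store_package(data, final_path, expected_size, expected_md5):
    """Write to a temp file beside final_path, validate it, then rename it into place.

    Returns True if the package was promoted, False if validation failed.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Keep the temp file in the destination directory so the rename below
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tf:
            tf.write(data)
        if not passes_validation(tmp_path, expected_size, expected_md5):
            return False
        # Readers see either no file or the complete file, never a partial one.
        os.replace(tmp_path, final_path)
        return True
    finally:
        # Only still present if validation or the write failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Because the package only appears under its real name after the rename, a concurrent conda install should never pick up a half-written file. The trade-off is that the serving directory briefly contains .part files.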

In any event, the problem should be solved by adding a command line argument that will allow you to specify where the transient download directory should be.

MWigger commented 7 years ago

Thanks for the explanation, that makes sense

ericdill commented 7 years ago

@MWigger I'd also welcome a PR from you implementing this feature :smile: I'll get to it eventually, but it's not a priority for me at the moment

jneines commented 7 years ago

As a co-worker of @MWigger, I have made some changes to fix this issue; the implementation is in my current pull request. It adds a new parameter for specifying the temporary directory, with a suitable default, and passes that setting as the dir parameter of tempfile.TemporaryDirectory. Tests pass and the implementation solves our current problem.
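For readers who want to see the shape of such a change, here is a rough sketch of the approach described above. It is not the code from the pull request: the --temp-directory flag name and the make_parser and mirror functions are assumptions made for illustration. Only argparse, tempfile.gettempdir and the dir parameter of tempfile.TemporaryDirectory are standard-library facts.

```python
import argparse
import os
import tempfile


def make_parser():
    parser = argparse.ArgumentParser(
        description="sketch of a configurable staging directory"
    )
    parser.add_argument(
        "--temp-directory",  # hypothetical flag name
        default=tempfile.gettempdir(),  # falls back to the system temp location
        help="directory under which the transient download directory is created",
    )
    return parser


def mirror(temp_directory):
    """Create the staging directory under temp_directory and return its parent."""
    # dir= controls where TemporaryDirectory creates its uniquely named folder.
    with tempfile.TemporaryDirectory(dir=temp_directory) as download_dir:
        # Downloads and validation would happen here, inside download_dir.
        return os.path.dirname(download_dir)
```

With a default of tempfile.gettempdir(), existing invocations behave as before, while a host with a small /tmp can point the staging directory at a larger filesystem.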

ericdill commented 7 years ago

Closed by #21. Thanks for the report @MWigger and thanks for the implementation @jneines! A new release with these changes (0.6.0) is available on PyPI.