vericast / conda-mirror

Mirror upstream conda channels
BSD 3-Clause "New" or "Revised" License

keep old packages #56

Closed MWigger closed 7 years ago

MWigger commented 7 years ago

Hello, I'm struggling a little with conda-mirror. We use environments where we sometimes pin package versions. Unfortunately conda-mirror deleted one of them, since it was "too old". Is there a way to keep old packages or, even better, to have a blacklist of packages not to remove?

ericdill commented 7 years ago

Hi @MWigger. Thanks for the issue report. What version of conda-mirror are you running?

> Unfortunately conda-mirror deleted one of them, since it was "too old"

I suspect what happened was that conda-mirror deleted a package that you had locally because it was removed from the upstream repository. There is currently no mechanism that checks the age of a package. There are four conditions under which a package will be removed:

  1. Package was deleted from the upstream repository
  2. Package is an invalid tarball
  3. Package fails the md5 checksum
  4. Package fails the sha256 checksum
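A minimal sketch of how checks like these could be strung together, not conda-mirror's actual code. The names `file_digest` and `removal_reason` are hypothetical, the `md5` and `sha256` keys are assumed to come from the package's entry in the upstream repodata.json, and the "Failed ... validation" strings are made up for illustration.

```python
import hashlib
import tarfile


def file_digest(path, algorithm):
    """Return the hex digest of the file at `path` using `algorithm`."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large packages are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def removal_reason(path, metadata):
    """Return a reason to remove the package, or None if it passes.

    `metadata` is the package's entry from the upstream repodata.json,
    or None when upstream no longer lists the package (assumption).
    """
    # 1. Package was deleted from the upstream repository.
    if metadata is None:
        return "Package is not in the repodata index"
    # 2. Package is an invalid tarball.
    try:
        with tarfile.open(path) as t:
            t.getmembers()
    except (tarfile.TarError, EOFError, OSError):
        return "Tarfile read failure"
    # 3 and 4. Package fails the md5 or sha256 checksum.
    for algorithm in ("md5", "sha256"):
        expected = metadata.get(algorithm)
        if expected is not None and file_digest(path, algorithm) != expected:
            return "Failed %s validation" % algorithm
    return None
```

A caller would remove the local file and log the returned reason whenever the result is not `None`.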

If you were mirroring the defaults channel or the conda-forge channel, I can check my own logs at that time to see if I can help debug this issue.

Can you let me know which package was removed and the date that you noticed it? If you're not comfortable posting that information to github, please feel free to send me an email at eric.dill@maxpoint.com and I can help you debug this in a 1:1 conversation.

MWigger commented 7 years ago

Hello, the packages where we noticed it are cffi-1.7.0-py27_0.tar.bz2 and mkl-2017.0.1-0. We noticed it today.

Before this we had an error like the attached one. We then updated to 0.7.2 and started the update, which led us to the deleted packages.

ericdill commented 7 years ago

Are you logging the output of conda-mirror? Whenever it removes packages it notes why. What is the upstream channel that you are mirroring?

I see these errors in my logs for the anaconda channel:

WARNING: Removing: /opt/maxpoint/data/conda/www/upstream-anaconda/osx-64/mkl-2017.0.1-0.tar.bz2. Reason: Failed size test
...
WARNING: Removing: /opt/maxpoint/data/conda/www/upstream-anaconda/win-64/mkl-2017.0.1-0.tar.bz2. Reason: Failed size test
...

FWIW conda-mirror will attempt to redownload packages every time you run it. You should be able to run conda-mirror again and it will attempt to download again any packages that failed validation the last time you ran it.

ericdill commented 7 years ago

What I neglected to add to the last comment is that I currently have mkl-2017.0.1-0 locally. It did fail once, but has not failed in the last 3 or so weeks.

MWigger commented 7 years ago

I download the channels anaconda and conda-forge

Last time I used -vv, so I did not save the log. I get the deleted packages with -v, right?

ericdill commented 7 years ago

Yes. Any package removal is logged at warning level, which is -v.

MWigger commented 7 years ago

The process is still running, but I already see this message:

WARNING: Removing: /media/mirror-data/conda/conda-forge/win-64/cffi-1.7.0-py27_0.tar.bz2. Reason: Tarfile read failure

MWigger commented 7 years ago

Hello, my messages now look like this:

(invoke command was: conda-mirror --upstream-channel conda-forge --target-directory /media/mirror-data/conda/conda-forge/ --platform win-64 --temp-directory /media/mirror-data/tmp/conda-forge-win -v --num-threads=1)

WARNING: Removing: /media/mirror-data/conda/conda-forge/win-64/spyder-app-2.3.8-py27_0.tar.bz2. Reason: Package is not in the repodata index
WARNING: spyder-app-2.3.8-py34_0.tar.bz2 is not in the upstream index. Removing...
WARNING: Removing: /media/mirror-data/conda/conda-forge/win-64/spyder-app-2.3.8-py34_0.tar.bz2. Reason: Package is not in the repodata index
WARNING: spyder-app-2.3.8-py35_0.tar.bz2 is not in the upstream index. Removing...
WARNING: Removing: /media/mirror-data/conda/conda-forge/win-64/spyder-app-2.3.8-py35_0.tar.bz2. Reason: Package is not in the repodata index
multiprocessing.pool.RemoteTraceback:

Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/miniconda3/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 486, in _validate_or_remove_package
    size=package_metadata.get('size'))
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 303, in _validate
    t.extractfile('info/index.json').read().decode('utf-8')
  File "/opt/miniconda3/lib/python3.5/tarfile.py", line 2062, in extractfile
    tarinfo = self.getmember(member)
  File "/opt/miniconda3/lib/python3.5/tarfile.py", line 1736, in getmember
    tarinfo = self._getmember(name)
  File "/opt/miniconda3/lib/python3.5/tarfile.py", line 2317, in _getmember
    members = self.getmembers()
  File "/opt/miniconda3/lib/python3.5/tarfile.py", line 1747, in getmembers
    self._load()        # all members, we first have to
  File "/opt/miniconda3/lib/python3.5/tarfile.py", line 2340, in _load
    tarinfo = self.next()
  File "/opt/miniconda3/lib/python3.5/tarfile.py", line 2271, in next
    self.fileobj.seek(self.offset - 1)
  File "/opt/miniconda3/lib/python3.5/bz2.py", line 277, in seek
    return self._buffer.seek(offset, whence)
  File "/opt/miniconda3/lib/python3.5/_compression.py", line 143, in seek
    data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
  File "/opt/miniconda3/lib/python3.5/_compression.py", line 99, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/miniconda3/bin/conda-mirror", line 11, in <module>
    sys.exit(cli())
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 252, in cli
    main(**_parse_and_format_args())
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 615, in main
    validation_results = _validate_packages(desired_repodata, local_directory, num_threads)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 436, in _validate_packages
    val_func_arg_list)
  File "/opt/miniconda3/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/miniconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
EOFError: Compressed file ended before the end-of-stream marker was reached

ericdill commented 7 years ago

Thanks for the updated information, @MWigger.

There are two distinct pieces happening there.

The first is the expected behavior of conda-mirror removing packages locally that have been removed upstream. The intent of conda-mirror is to be a complete mimic of what is available upstream. It is unfortunate that upstream removed a package that you were depending on locally. That being said, when packages are removed from upstream there is usually a good reason for this. I have zero control over or insight into the reasons for package removal from upstream.
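To illustrate the "not in the upstream index" case, here is a rough sketch rather than the real implementation. The function name `packages_missing_upstream` is hypothetical, and it assumes `repodata` is the parsed upstream repodata.json with its usual top-level `packages` mapping keyed by filename.

```python
import os


def packages_missing_upstream(local_directory, repodata):
    """Return sorted local package filenames absent from the upstream index."""
    # Filenames that upstream currently advertises.
    upstream = set(repodata.get("packages", {}))
    # Package files that are present in the local mirror directory.
    local = {
        name for name in os.listdir(local_directory)
        if name.endswith(".tar.bz2")
    }
    # Anything local that upstream no longer lists is a removal candidate.
    return sorted(local - upstream)
```

A strict mirror deletes everything this returns, which is why a package that vanished upstream also vanishes locally.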

The EOFError is something that I need to be catching. That's a new bug. Thanks for finding that 😀
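One possible way to guard that read, sketched under assumptions and not the fix that was actually merged: treat a truncated or otherwise unreadable archive as a validation failure instead of letting the exception escape the worker. The helper name `read_index_json` is hypothetical.

```python
import json
import tarfile


def read_index_json(path):
    """Return the parsed info/index.json from a package, or None if unreadable."""
    try:
        with tarfile.open(path) as t:
            raw = t.extractfile("info/index.json").read()
        return json.loads(raw.decode("utf-8"))
    except (tarfile.TarError, EOFError, OSError, KeyError, ValueError):
        # TarError / EOFError / OSError: damaged or truncated archive.
        # KeyError: the archive has no info/index.json member.
        # ValueError: the member is not valid UTF-8 encoded JSON.
        return None
```

A caller could map a `None` result to the existing "Tarfile read failure" removal, so the package is redownloaded on the next run.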

MWigger commented 7 years ago

Well, I just saw that, for example, there is a 1.7.0 version of the cffi package on the conda-forge channel: https://anaconda.org/conda-forge/cffi/files

So it might be a different issue

ericdill commented 7 years ago

I have seen similar package removals for 'failed size test'. I dug into a couple of these occurrences and what I found was that the package, let's take cffi-1.7.0-py27_0.tar.bz2 for example, had been deleted from the repo that I was mirroring and re-uploaded with one or more pieces of metadata changed. I think one time I looked, a license file had been added to the package. Since this does not affect the run time behavior of the package, bumping the build number (to cffi-1.7.0-py27_1.tar.bz2) was likely deemed unnecessary. Indeed, only people who maintain their own local copy of the entire upstream repository would notice.

The reason why re-uploading a package with different metadata will cause conda-mirror to fail the size test is that the newly uploaded package on upstream has a different size. The repodata.json file for that upstream repo has already been updated with the package's new size, so when I compare the expected size in the upstream repodata.json with the size of the package that has been mirrored locally, they do not match. Because I treat the upstream repodata.json as the source of truth, the local package is now "incorrect" and it gets removed. The package that was just removed because of its "incorrect" size will then be redownloaded from upstream.
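The comparison itself is simple. A minimal sketch, assuming the package's repodata.json entry carries a `size` field in bytes (the traceback earlier in this thread reads `package_metadata.get('size')`); the function name `passes_size_test` is hypothetical.

```python
import os


def passes_size_test(path, metadata):
    """Compare the on-disk size with the size recorded in repodata.json."""
    expected = metadata.get("size")
    if expected is None:
        # Assumption: with no recorded size there is nothing to compare.
        return True
    return os.path.getsize(path) == expected
```

With a re-uploaded package, `expected` reflects the new upload while the local file still has the old size, so the check fails until the package is downloaded again.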

ericdill commented 7 years ago

Closed by #58