vericast / conda-mirror

Mirror upstream conda channels
BSD 3-Clause "New" or "Revised" License
71 stars 59 forks source link

Make download more robust #64

Closed gpcimino closed 5 years ago

gpcimino commented 6 years ago

Enclose download with try/catch block to avoid crashes due to network errors and use multiple download attempts cycle with backoff time and jitter. Bonus: print usage message to stdout in case cmd line param are not correct, instead of exception!

pep8speaks commented 6 years ago

Hello @gpcimino! Thanks for updating the PR.

Line 278:32: E203 whitespace before ':' Line 279:17: E203 whitespace before ':' Line 494:84: E502 the backslash is redundant between brackets Line 552:62: E502 the backslash is redundant between brackets Line 553:3: E128 continuation line under-indented for visual indent Line 553:44: E502 the backslash is redundant between brackets Line 554:3: E128 continuation line under-indented for visual indent Line 554:50: E502 the backslash is redundant between brackets Line 555:3: E128 continuation line under-indented for visual indent

Line 160:33: W291 trailing whitespace Line 180:33: W291 trailing whitespace

Comment last updated on February 06, 2018 at 13:54 Hours UTC
ericdill commented 6 years ago

Hi @gpcimino -- Thanks for the PR. I think we're definitely on the right track here. I've got a couple of comments

gpcimino commented 6 years ago

HI @ericdill,

OK, i can implement i simple back off function in the conda-mirror code base

OK, make sense!

I agree catching Exception is bad. However my idea was to make the _download(...) as much robust as possible: when i download a x GB repo i don't want to crash for any reason. Said that, if we split the requests call from the save on disk part we can surely use 2 more detailed exceptions. I will work on that.

BTW, looks like download is not multi-threading. Would be nice to user multi threading for download as well, this will really speed up the process a lot. What do you think?

GP

gpcimino commented 6 years ago

Here is the log of conda-mirror running right now using the exception handling and the backoff. The exception is ConnectionResetError, BTW i am running on Windows10 workstation.

DEBUG: Downloading to d:\tmp\conda-forge\tmpaxh5kb0m\beakerx-0.10.0-py35_0.tar.bz2
ERROR: Failure in network connection
Traceback (most recent call last):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 302, in _error_catcher
    yield
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 384, in read
    data = self._fp.read(amt)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 449, in read
    n = self.readinto(b)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 1002, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 865, in read
    return self._sslobj.read(len, buffer)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 625, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\requests\models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 436, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 401, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "C:\Program Files (x86)\Python36-32\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\conda_mirror\conda_mirror.py", line 412, in _download
    for data in ret.iter_content(chunk_size):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\requests\models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
INFO: Wait for 2.0 seconds
INFO: Try download again
INFO: Start downlaod of file https://conda.anaconda.org//conda-forge/linux-64/beakerx-0.10.0-py35_0.tar.bz2
DEBUG: Downloading to d:\tmp\conda-forge\tmpaxh5kb0m\beakerx-0.10.0-py35_0.tar.bz2
ERROR: Failure in network connection
Traceback (most recent call last):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 302, in _error_catcher
    yield
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 384, in read
    data = self._fp.read(amt)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 449, in read
    n = self.readinto(b)
  File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 1002, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 865, in read
    return self._sslobj.read(len, buffer)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 625, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\requests\models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 436, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 401, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "C:\Program Files (x86)\Python36-32\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "d:\pyvenv\condamirror-prod\lib\site-packages\urllib3\response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\conda_mirror\conda_mirror.py", line 412, in _download
    for data in ret.iter_content(chunk_size):
  File "d:\pyvenv\condamirror-prod\lib\site-packages\requests\models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
INFO: Wait for 2.967693344674893 seconds
INFO: Try download again
INFO: Start downlaod of file https://conda.anaconda.org//conda-forge/linux-64/beakerx-0.10.0-py35_0.tar.bz2
DEBUG: Downloading to d:\tmp\conda-forge\tmpaxh5kb0m\beakerx-0.10.0-py35_0.tar.bz2
ericdill commented 6 years ago

Would be nice to user multi threading for download as well, this will really speed up the process a lot. What do you think?

This is a one-time cost that you have to pay when mirroring the first time. It is possible that multiprocessing the download could make this process faster but I'm not sure where the bottleneck is.

ericdill commented 6 years ago

So I guess we'd be ok just catching ConnectionResetError in the download. If the disk is full you really don't want to catch that exception and try it again hah

gpcimino commented 6 years ago

Eric,

we had really hard time to complete the initial conda-forge download. The 2 major problems we encounter are:

I have put together a quick hack to see how difficult is to implement these features.

BTW, the script is based on reactive programming, it helps a lot to simplify the parallel execution with threads.

Let me know if you are interest to move towards this solution.

In the mean time I would pause this PR.

Thanks GP

ericdill commented 6 years ago

Hi @gpcimino --

If I understand correctly, you're proposing this workflow for conda-mirror:

  1. Figure out which packages need to be downloaded
  2. Remove any packages that we don't want and update repodata.json & repodata.json.bz2 in the target folder
  3. For each package that needs downloading, do the following: a. Download the package to a temp folder b. verify the downloaded package (and remove on failure) c. copy the downloaded package to the target folder d. Update repodata.json and repodata.json.bz2 in the target folder with the new package

I added 2d because it's an important step so that users don't get packages moved around under their feet when trying to run conda install.

Generally speaking I think that having the process of conda-mirror move towards this model is an improvement. If you agree that this is the right approach then let's see what a PR looks like that would effect these changes.

Regarding your second bullet "speed up the download using multi-threading", I am reluctant to agree to more parallelism in this project since it makes things more complicated. I think that would be something that is pretty straightforward for you to wrap around conda-mirror on your own infrastructure. I suspect the solution would reuse much of the code inside of conda-mirror but effectively be replacing main() with your site-specific needs. For what it's worth we also have a site-specific wrapper around conda_mirror that hooks in to our internal monitoring tooling here at Valassis Digital.

gpcimino commented 6 years ago

Eric,

did you check my script at

https://github.com/gpcimino/conda-repo/blob/master/conda_repo.py

It is already parallel in the md5 calculation and in the file download.

It is based on reactive extensions:

https://github.com/ReactiveX/RxPY

which make parallelism really easy!

Have a look!

parente commented 5 years ago

I'm going to close this PR due to it being stale and a lack of time to help see it through. If someone is still interested in adding retries to the code base along with tests, feel free to open for a review.

Related: There was some work done in #70 to prevent failures due to out-of-disk errors, but not network failures.