iTaybb / pySmartDL

A Smart Download Manager for Python
The Unlicense

ConnectionResetError(104, 'Connection reset by peer') causes re-download from scratch #32

Closed pete0877 closed 5 years ago

pete0877 commented 5 years ago

First of all: great lib @iTaybb -- Thank you for putting it together.

I'm dealing with large files (~17 GB) being transferred from Azure CDN to Amazon EC2. I believe the internet connection is stable, but I'm noticing an issue that's causing me a lot of pain:

When the host from which the files are being downloaded resets the connection, the SmartDL client just gives up and restarts the download from scratch. This is of course a big issue when I'm trying to get such large files. Below are the code and a sample log.

This is somewhat related to issue #19, but not entirely, since "Connection reset by peer" is technically not a lack of internet connection but rather a standard way HTTP endpoints communicate.

It seems to me SmartDL should be able to deal with a connection reset and just ask to re-download the HTTP chunk / range it was working on at that time. Thoughts?
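To make the suggestion concrete, here is a minimal sketch of the idea, not pySmartDL's actual code. The names `http_range_fetch` and `download_range` are hypothetical, and the sketch assumes the server honours HTTP `Range` requests. On a `ConnectionResetError` it keeps the bytes already written and re-requests only the remainder of the range.

```python
import urllib.request


def http_range_fetch(url, timeout=20, block_size=64 * 1024):
    """Return fetch(start, end), which streams bytes start..end (inclusive) of url.

    Hypothetical helper: assumes the server answers Range requests with
    partial content rather than the whole file.
    """
    def fetch(start, end):
        headers = {"Range": "bytes={}-{}".format(start, end)}
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            while True:
                block = resp.read(block_size)
                if not block:
                    return
                yield block
    return fetch


def download_range(fetch, start, end, out, max_attempts=10):
    """Write bytes start..end (inclusive) to out, resuming after resets.

    Returns the number of bytes written. Gives up only after
    max_attempts consecutive passes that made no progress.
    """
    offset = start  # next byte that is still missing
    attempts = 0
    while offset <= end:
        before = offset
        try:
            for block in fetch(offset, end):
                out.write(block)
                offset += len(block)
        except ConnectionResetError:
            # Keep what was written; the next pass re-requests from `offset`.
            pass
        if offset > before:
            attempts = 0
        else:
            attempts += 1
            if attempts >= max_attempts:
                raise ConnectionError(
                    "no progress after {} attempts".format(max_attempts))
    return offset - start
```

With a real URL this would be called as `download_range(http_range_fetch(url), start, end, part_file)`, once per thread and range. Passing `fetch` in as a parameter keeps the retry logic separate from the HTTP code.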

CODE:

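import glob
import os
import time

from pySmartDL import SmartDL

# `logger` is defined elsewhere; it accepts keyword arguments (structlog-style).


def download_url(self, url, location):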
    downloader = SmartDL(url, location, progress_bar=False, fix_urls=False, threads=10, logger=logger)
    downloader.timeout = 20
    downloader.attemps_limit = 10
    downloader.minChunkFile = 64 * (1024 ** 2)  # 64 MB

    try:
        downloader.start(blocking=False)

        while not downloader.isFinished():
            logger.info('Upload progress: SIZE: {} {} SPEED: {} ETA: {}'.format(
                "{0:.1f}%".format(downloader.get_progress() * 100),
                downloader.get_dl_size(human=True),
                downloader.get_speed(human=True),
                downloader.get_eta(human=True),
            ), url=url, location=location, status=downloader.get_status())
            time.sleep(5)

        if downloader.isSuccessful():
            logger.info('Upload completed',
                        url=url,
                        location=downloader.get_dest(),
                        duration=downloader.get_dl_time(human=True))
        else:
            logger.error('Upload error',
                         url=url,
                         location=downloader.get_dest(),
                         errors=downloader.get_errors())

            errors_text = ''
            for e in downloader.get_errors():
                errors_text = errors_text + "\n" + str(e)

            raise Exception(errors_text)
    except Exception as error:
        file_list = glob.glob('{}*'.format(location))
        for file_path in file_list:
            try:
                os.remove(file_path)
            except OSError:
                logger.error('Error while deleting file', file_path=file_path)
        raise

LOG:

{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 88.0% 15.06 GB SPEED: 7.1 MB/s ETA: 4 minutes, 50 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:20.608056Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 88.2% 15.09 GB SPEED: 7.1 MB/s ETA: 4 minutes, 57 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:25.626230Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 88.4% 15.12 GB SPEED: 7.8 MB/s ETA: 4 minutes, 58 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:30.666378Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 88.6% 15.15 GB SPEED: 7.5 MB/s ETA: 4 minutes, 39 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:35.681668Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 88.8% 15.19 GB SPEED: 7.4 MB/s ETA: 4 minutes, 21 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:40.713161Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 89.0% 15.22 GB SPEED: 8.2 MB/s ETA: 4 minutes, 23 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:45.718700Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 89.2% 15.25 GB SPEED: 4.8 MB/s ETA: 5 minutes, 1 second", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:50.724313Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 89.3% 15.27 GB SPEED: 4.9 MB/s ETA: 6 minutes, 23 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:08:55.729961Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 89.5% 15.30 GB SPEED: 5.5 MB/s ETA: 5 minutes, 54 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:00.735431Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 89.6% 15.32 GB SPEED: 3.6 MB/s ETA: 7 minutes, 59 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:05.740615Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 89.7% 15.34 GB SPEED: 3.9 MB/s ETA: 7 minutes, 56 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:10.774994Z"}
{"event": "ConnectionResetError(104, 'Connection reset by peer')", "level": "exception", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.287198Z"}
NoneType
{"event": "Starting a new SmartDL operation.", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.287604Z"}
{"event": "One URL is loaded.", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.287741Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.287870Z"}
{"event": "Content-Length is 18360254880 (17.10 GB).", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.615813Z"}
{"event": "Launching 10 threads (downloads 1.71 GB/thread).", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.616141Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.000'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.617045Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.001'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.617507Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.002'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.617891Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.003'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.618336Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.004'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.618734Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.005'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.619121Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.006'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.657019Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.007'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.657721Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.008'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.658263Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b.009'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.659037Z"}
{"event": "Control thread has been started.", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.676908Z"}
{"event": "Diff between downloaded files and expected filesizes is 1054262929.0kB.", "level": "warning", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.677279Z"}
{"event": "Starting a new SmartDL operation.", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.677813Z"}
{"event": "One URL is loaded.", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.677947Z"}
{"event": "Downloading 'http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014' to '/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b'...", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.678064Z"}
{"event": "Content-Length is 18360254880 (17.10 GB).", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.760694Z"}
{"event": "Launching 10 threads (downloads 1.71 GB/thread).", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:12.761018Z"}
{"event": "Control thread has been started.", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:13.558547Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 0.2% 27.2 MB SPEED: 17.0 MB/s ETA: 0 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:15.931493Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 0.6% 108.5 MB SPEED: 21.1 MB/s ETA: 14 minutes, 36 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:20.949592Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 1.1% 197.7 MB SPEED: 20.5 MB/s ETA: 14 minutes, 11 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:25.958432Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 1.6% 286.0 MB SPEED: 19.7 MB/s ETA: 14 minutes, 22 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:30.994674Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 2.1% 373.6 MB SPEED: 21.1 MB/s ETA: 14 minutes, 44 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:36.071302Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 2.7% 464.9 MB SPEED: 20.4 MB/s ETA: 14 minutes, 2 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:41.314663Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 3.1% 548.0 MB SPEED: 19.3 MB/s ETA: 15 minutes, 29 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:46.325928Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 3.6% 631.5 MB SPEED: 18.4 MB/s ETA: 15 minutes, 5 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:51.337558Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 4.1% 718.0 MB SPEED: 20.0 MB/s ETA: 14 minutes, 28 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:09:56.537931Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 4.6% 808.3 MB SPEED: 21.2 MB/s ETA: 13 minutes, 51 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:10:01.544831Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 5.1% 892.5 MB SPEED: 19.2 MB/s ETA: 14 minutes, 2 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014", "level": "info", "logger": "content.the_client", "timestamp": "2019-06-22T19:10:06.578321Z"}
{"location": "/tmp/3099cbe6-3496-4e64-9d9e-4aea8a90fe4b", "event": "Upload progress: SIZE: 6.0% 1.03 GB SPEED: 36.9 MB/s ETA: 8 minutes, 57 seconds", "status": "downloading", "url": "http://HOST-OBFUSCATED/2018/ACC/2e9046c1564bb85f46fd97d3ab2c07f6adcbb508.mp4?sv=2014-02-14&sr=b&sig=6iaaT3ZKnuLUbfgDyBi57i0p8AR6el3e%2BKhEM0nw1uI%3D&se=2019-06

iTaybb commented 5 years ago

I believe that's the same issue as #19 and #14. I've already started implementing it on a side branch (resume-stopped-downloads), but it's not working yet. I hope I'll have time in the next few days to address this feature.

pete0877 commented 5 years ago

Awesome, thank you. I checked out the resume-stopped-downloads branch but noticed that it creates really big part files. I suspect it doesn't currently calculate the point from which the download needs to resume based on the existing file size. Good luck with that, and I'll watch the repo. Thank you again!!
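The calculation mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the branch's code. The function name `resume_point` is hypothetical, and it assumes each part file holds one contiguous byte range written from the start of that range.

```python
import os


def resume_point(part_path, range_start, range_end):
    """Return (next_byte, remaining) for a part file covering
    range_start..range_end (inclusive) of the remote file.

    next_byte is the first byte still to be requested; remaining is
    how many bytes are still missing (0 means the part is complete).
    """
    expected = range_end - range_start + 1
    have = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    if have > expected:
        # An oversized part cannot be trusted; start this range over.
        have = 0
    return range_start + have, expected - have
```

A caller would open the part file with mode `"ab"` when `next_byte > range_start` and with `"wb"` otherwise, then request `bytes=next_byte-range_end`. For the 18360254880-byte file in the log split into 10 equal ranges, each part should be 1836025488 bytes, so a larger part file is a sign that the offset was not applied.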