--no-continue behavior seems different from description in documentation

sharinq commented 5 years ago

Checklist

[ ] I'm reporting a broken site support issue
[x] I've verified that I'm running youtube-dl version 2019.06.08
[ ] I've checked that all provided URLs are alive and playable in a browser
[ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
[x] I've searched the bugtracker for similar bug reports including closed ones
[x] I've read bugs section in FAQ

Verbose log

Description

This is the description of --no-continue in the documentation:

--no-continue                    Do not resume partially downloaded files
                             (restart from beginning)

When I interrupt a download, it leaves a .part file and a .ytdl file. Based on the description of --no-continue, I would expect that running the download again with this option and the same URL would start from 0% ("restart from beginning"). Instead, it resumes from the percentage it had reached when it was interrupted.

I'm not sure if this is an issue with the --no-continue functionality or with the description in the documentation.

SebiderSushi commented 5 years ago

youtube-dl clearly should restart from the beginning if --no-continue is provided, and it did for me in a short test, but only for a single-file download with no .ytdl file nearby. I was able to reproduce your observation on a fragmented download, during which youtube-dl also created a .ytdl file. In my case that file contained only the following:

{"downloader": {"current_fragment": {"index": 2}}}

If that file is present on a subsequent run of youtube-dl, the download seems to resume at the stored fragment index, and the progress indicator starts right where it left off. If the .ytdl file is removed before another run of youtube-dl --no-continue, the progress indicator starts over at 0%, and a growing $filename.mp4.part-Frag0.part file can be seen in a file manager.

It seems as if the logic behind --no-continue adapts to the fragmentation and maybe only resumes partially downloaded fragments? Although it is probably correct, some clarification on this and a mention at least somewhere in the documentation would really be helpful to avoid misunderstandings arising from different conceptions of "restart from the beginning".
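As a rough illustration of the .ytdl behaviour described above, here is a minimal Python sketch that reads the stored fragment index from such a state file and deletes the file so that a later run has nothing to resume from. The helper names and the example path are hypothetical; only the JSON layout is taken from the file content quoted above.

```python
import json
import os


def read_fragment_index(ytdl_path):
    """Return the fragment index stored in a .ytdl file, or None if unusable."""
    if not os.path.isfile(ytdl_path):
        return None
    try:
        with open(ytdl_path, 'r') as f:
            state = json.load(f)
        # Layout as quoted above: {"downloader": {"current_fragment": {"index": N}}}
        return state['downloader']['current_fragment']['index']
    except (ValueError, KeyError, TypeError):
        # Corrupt or unexpected content: treat it as "no resume state"
        return None


def forget_resume_state(ytdl_path):
    """Delete the state file; return True if a file was actually removed."""
    if os.path.isfile(ytdl_path):
        os.remove(ytdl_path)
        return True
    return False
```

For example, calling forget_resume_state('video.mp4.ytdl') (a made-up file name) before rerunning youtube-dl --no-continue corresponds to the manual workaround of removing the .ytdl file by hand.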

Verbose log

Below is the output of two youtube-dl runs. I cancelled the second run after 2-3 seconds, in which time it is definitely impossible to reach 4.5% of this download with my internet connection.

home@home:~$ youtube-dl --no-continue -v https://www.sat1.de/tv/meine-klasse-voll-das-leben/video/25-staffel-2-episode-5-abgehauen-ganze-folge
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.sat1.de/tv/meine-klasse-voll-das-leben/video/25-staffel-2-episode-5-abgehauen-ganze-folge', u'--no-continue', u'-v']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.06.21
[debug] Python version 2.7.12 (CPython) - Linux-4.15.0-52-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.15, ffprobe 2.8.15, rtmpdump 2.4
[debug] Proxy map: {}
[debug] Using fake IP 53.137.246.36 (DE) as X-Forwarded-For.
[prosiebensat1] tv/meine-klasse-voll-das-leben/video/25-staffel-2-episode-5-abgehauen-ganze-folge: Downloading webpage
[prosiebensat1] 5881823: Downloading videos JSON
[prosiebensat1] 5881823: Downloading protocols JSON
[prosiebensat1] 5881823: Downloading urls JSON
[prosiebensat1] 5881823: Downloading MPD manifest
[prosiebensat1] 5881823: Downloading m3u8 information
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on u'http://vas-v4.p7s1video.net/4.0/playlist.m3u8?x=01&y=X1bZ21kI1QjOpvVstZ9w14xoFiwILOdaFkvz4yPGsSlDJrdUfPv5Ve_5zJhZ4jhaMwTCFWitQLdA0OIFg7c-oBkCXkk8nsEcJ4RRc5_wKbR32HvuDHkRfrSLyCFsDu0N_x2m4zwFetNJ7q6IUsICW1JL5cTWPe5wsmX9l4amO9Zhpuh4Lzu5VsXOU4IP1VrzemdZeF6584H-ZfCWMwa6mI4sgz3XBuW1aUfQewXR552Ea8iQw-21Gsc8WCLLN_8VAB_RbvJqi6r09hr9tW_19bFms0n8-rgJBoD8E24o4SNNSoRyfEoEsXZpm31d9G7Zx_NRTHEjGswL_2NZ0fIvvE04u2z6XjY92ba40E8W1UeYOqSOxnfhb-S6oK-VHi-9'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 132
[download] Destination: Meine Klasse - Voll das Leben- Staffel 2 Episode 5 - Abgehauen-5881823.fhls-4516.mp4
[download]   4.3% of ~723.99MiB at 773.69KiB/s ETA 02:43^C
ERROR: Interrupted by user

home@home:~$ youtube-dl --no-continue -v https://www.sat1.de/tv/meine-klasse-voll-das-leben/video/25-staffel-2-episode-5-abgehauen-ganze-folge
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.sat1.de/tv/meine-klasse-voll-das-leben/video/25-staffel-2-episode-5-abgehauen-ganze-folge', u'--no-continue', u'-v']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.06.21
[debug] Python version 2.7.12 (CPython) - Linux-4.15.0-52-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.15, ffprobe 2.8.15, rtmpdump 2.4
[debug] Proxy map: {}
[debug] Using fake IP 53.43.128.40 (DE) as X-Forwarded-For.
[prosiebensat1] tv/meine-klasse-voll-das-leben/video/25-staffel-2-episode-5-abgehauen-ganze-folge: Downloading webpage
[prosiebensat1] 5881823: Downloading videos JSON
[prosiebensat1] 5881823: Downloading protocols JSON
[prosiebensat1] 5881823: Downloading urls JSON
[prosiebensat1] 5881823: Downloading MPD manifest
[prosiebensat1] 5881823: Downloading m3u8 information
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on u'http://vas-v4.p7s1video.net/4.0/playlist.m3u8?x=01&y=34fz5x6DGEFcDzA9HOmjHUSZunRCR0Mx_5R7Dxpqw8IeieS35zrhJYT-HzNT-1Sjk_8HHmeof5FFDXzWPqnc5NhY4ywNAJNzjUt0DE7-gDUDO9OrqAbvlzyCHqX0vmiA3GLzEVlA-ZoKF_pN7urWWQ7Ef_dTh48AkTAuhVg11va682yAs2w_31mQv3R-HOujo2N94XSBD9TJbbXdXh0CJZXYKDXqViMrARU2QgpoAJm6plfIwV9CavSu0yvSgMjwy1O0plxnITGIRZXKFWaiDarl3f6QTN03qyCzdwsxH94gKkmUq0hIHcZ-WxYtmggUnCdFaWmIFR1g_ChywTzkSq8y-D8luRKOZxbyKYapDDn3_xeGXoBvdj64F-SjguNB'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 132
[download] Destination: Meine Klasse - Voll das Leben- Staffel 2 Episode 5 - Abgehauen-5881823.fhls-4516.mp4
[download]   4.5% of ~723.99MiB at 739.43KiB/s ETA 00:33^C
ERROR: Interrupted by user

pukkandan commented 1 year ago

During the early days of yt-dlp, I too assumed this was a documentation issue (https://github.com/yt-dlp/yt-dlp/issues/46) and "fixed" yt-dlp's docs accordingly. But now, having a better understanding of ytdl's internals, I think this is actually an implementation bug. Considering the original authors of the option are no longer active, we can only go by how we think the option "should" work. What do you think, @dirkf?

dirkf commented 1 year ago

If the manual is the spec, then I can't understand why --no-continue wouldn't start from 0%.

Presumably this was overlooked when fragment downloading was added.

Do not resume partially downloaded files (restart from beginning)

No sensible interpretation of this user-facing help text would have "files" referring to temporary fragment files. Obviously, the media (etc.) file being downloaded is the target.

Probably git blame could shed more light. At any rate, for HTTP downloads the resume length is calculated only if continuedl is set, and it always is set when called from the fragment downloader. Maybe in _prepare_frag_download():

         if self.__do_ytdl_file(ctx):
-            if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
+            if (self.params.get('continuedl', True)
+                and os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename'])))):
                 self._read_ytdl_file(ctx)
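
The decision this guard is meant to make can be modelled in isolation as below. This is only a sketch with a hypothetical helper name, assuming the same params.get('continuedl', True) lookup as in the diff; it covers the choice of starting fragment and nothing else in FragmentFD.

```python
def starting_fragment_index(params, stored_index):
    """Pick the fragment index to start from.

    params: dict of downloader options; 'continuedl' defaults to True.
    stored_index: index read from the .ytdl file, or None if there is none.
    """
    continuedl = params.get('continuedl', True)
    if continuedl and stored_index is not None:
        # Resuming is allowed and state exists: carry on from the stored index
        return stored_index
    # Resuming disabled, or nothing stored: start from the first fragment
    return 0
```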

dirkf commented 1 year ago

That wasn't enough.

Additionally, the ETA is being calculated incorrectly and a "continued" fragment download is (can be?) incorrect.

This seems to be better. The handling of the resumed fragment was missing. The patch also adds a new filesize_or_none() method to FileDownloader and uses the existing calc_speed() in FragmentFD.

--- old/youtube_dl/downloader/common.py
+++ new/youtube_dl/downloader/common.py
@@ -123,6 +123,12 @@ class FileDownloader(object):
     def format_retries(retries):
         return 'inf' if retries == float('inf') else '%.0f' % retries

+    @staticmethod
+    def filesize_or_none(unencoded_filename):
+        fn = encodeFilename(unencoded_filename)
+        if os.path.isfile(fn):
+            return os.path.getsize(fn)
+
     @staticmethod
     def best_block_size(elapsed_time, bytes):
         new_min = max(bytes / 2.0, 1.0)

--- old/youtube_dl/downloader/fragment.py
+++ new/youtube_dl/downloader/fragment.py
@@ -71,7 +71,7 @@ class FragmentFD(FileDownloader):

     @staticmethod
     def __do_ytdl_file(ctx):
-        return not ctx['live'] and not ctx['tmpfilename'] == '-'
+        return ctx['live'] is not True and ctx['tmpfilename'] != '-'

     def _read_ytdl_file(self, ctx):
         assert 'ytdl_corrupt' not in ctx
@@ -101,6 +101,13 @@ class FragmentFD(FileDownloader):
             'url': frag_url,
             'http_headers': headers or info_dict.get('http_headers'),
         }
+        frag_resume_len = 0
+        if ctx['dl'].params.get('continuedl', True):
+            frag_resume_len = self.filesize_or_none(
+                self.temp_name(fragment_filename))
+        fragment_info_dict['_resume_len'] = frag_resume_len
+        ctx['prev_frag_downloaded_bytes'] = frag_resume_len or 0
+
         success = ctx['dl'].download(fragment_filename, fragment_info_dict)
         if not success:
             return False, None
@@ -124,9 +131,7 @@ class FragmentFD(FileDownloader):
             del ctx['fragment_filename_sanitized']

     def _prepare_frag_download(self, ctx):
-        if 'live' not in ctx:
-            ctx['live'] = False
-        if not ctx['live']:
+        if not ctx.setdefault('live', False):
             total_frags_str = '%d' % ctx['total_frags']
             ad_frags = ctx.get('ad_frags', 0)
             if ad_frags:
@@ -136,10 +141,11 @@ class FragmentFD(FileDownloader):
         self.to_screen(
             '[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str))
         self.report_destination(ctx['filename'])
+        continuedl = self.params.get('continuedl', True)
         dl = HttpQuietDownloader(
             self.ydl,
             {
-                'continuedl': True,
+                'continuedl': continuedl,
                 'quiet': True,
                 'noprogress': True,
                 'ratelimit': self.params.get('ratelimit'),
@@ -150,12 +156,11 @@ class FragmentFD(FileDownloader):
         )
         tmpfilename = self.temp_name(ctx['filename'])
         open_mode = 'wb'
-        resume_len = 0

         # Establish possible resume length
-        if os.path.isfile(encodeFilename(tmpfilename)):
+        resume_len = self.filesize_or_none(tmpfilename) or 0
+        if resume_len > 0:
             open_mode = 'ab'
-            resume_len = os.path.getsize(encodeFilename(tmpfilename))

         # Should be initialized before ytdl file check
         ctx.update({
@@ -164,7 +169,8 @@ class FragmentFD(FileDownloader):
         })

         if self.__do_ytdl_file(ctx):
-            if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
+            ytdl_file_exists = os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename'])))
+            if continuedl and ytdl_file_exists:
                 self._read_ytdl_file(ctx)
                 is_corrupt = ctx.get('ytdl_corrupt') is True
                 is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
@@ -178,7 +184,12 @@ class FragmentFD(FileDownloader):
                     if 'ytdl_corrupt' in ctx:
                         del ctx['ytdl_corrupt']
                     self._write_ytdl_file(ctx)
+
             else:
+                if not continuedl:
+                    if ytdl_file_exists:
+                        self._read_ytdl_file(ctx)
+                    ctx['fragment_index'] = resume_len = 0
                 self._write_ytdl_file(ctx)
                 assert ctx['fragment_index'] == 0

@@ -209,6 +220,7 @@ class FragmentFD(FileDownloader):
         start = time.time()
         ctx.update({
             'started': start,
+            'fragment_started': start,
             # Amount of fragment's bytes downloaded by the time of the previous
             # frag progress hook invocation
             'prev_frag_downloaded_bytes': 0,
@@ -218,6 +230,9 @@ class FragmentFD(FileDownloader):
             if s['status'] not in ('downloading', 'finished'):
                 return

+            if not total_frags and ctx.get('fragment_count'):
+                state['fragment_count'] = ctx['fragment_count']
+
             time_now = time.time()
             state['elapsed'] = time_now - start
             frag_total_bytes = s.get('total_bytes') or 0
@@ -232,6 +247,9 @@ class FragmentFD(FileDownloader):
                 ctx['fragment_index'] = state['fragment_index']
                 state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
                 ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
+                ctx['speed'] = state['speed'] = self.calc_speed(
+                    ctx['fragment_started'], time_now, frag_total_bytes)
+                ctx['fragment_started'] = time.time()
                 ctx['prev_frag_downloaded_bytes'] = 0
             else:
                 frag_downloaded_bytes = s['downloaded_bytes']
@@ -240,8 +258,8 @@ class FragmentFD(FileDownloader):
                     state['eta'] = self.calc_eta(
                         start, time_now, estimated_size - resume_len,
                         state['downloaded_bytes'] - resume_len)
-                state['speed'] = s.get('speed') or ctx.get('speed')
-                ctx['speed'] = state['speed']
+                ctx['speed'] = state['speed'] = self.calc_speed(
+                    ctx['fragment_started'], time_now, frag_downloaded_bytes)
                 ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
             self._hook_progress(state)

@@ -268,7 +286,7 @@ class FragmentFD(FileDownloader):
                         os.utime(ctx['filename'], (time.time(), filetime))
                     except Exception:
                         pass
-            downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
+            downloaded_bytes = self.filesize_or_none(ctx['filename']) or 0

         self._hook_progress({
             'downloaded_bytes': downloaded_bytes,

--- old/youtube_dl/downloader/http.py
+++ new/youtube_dl/downloader/http.py
@@ -58,9 +58,9 @@ class HttpFD(FileDownloader):

         if self.params.get('continuedl', True):
             # Establish possible resume length
-            if os.path.isfile(encodeFilename(ctx.tmpfilename)):
-                ctx.resume_len = os.path.getsize(
-                    encodeFilename(ctx.tmpfilename))
+            ctx.resume_len = info_dict.get('_resume_len')
+            if ctx.resume_len is None:
+                ctx.resume_len = self.filesize_or_none(ctx.tmpfilename) or 0

         ctx.is_resume = ctx.resume_len > 0
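
Outside the downloader classes, the resume-length logic in the patch above boils down to something like the following sketch. It is a simplified stand-in: encodeFilename() is left out, fragment_resume_len() is a hypothetical name, and params is a plain dict standing in for the downloader options.

```python
import os


def filesize_or_none(filename):
    """Size of filename in bytes, or None if it is not an existing regular file."""
    if os.path.isfile(filename):
        return os.path.getsize(filename)
    return None


def fragment_resume_len(params, part_filename):
    """Bytes of an existing partial fragment to keep, or 0 to start over."""
    if not params.get('continuedl', True):
        # --no-continue: ignore whatever is already on disk
        return 0
    # A missing file gives None, which is normalised to 0
    return filesize_or_none(part_filename) or 0
```

Returning None for a missing file, instead of raising as os.path.getsize() would, lets callers write "filesize_or_none(name) or 0" without a separate existence check.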
