ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
131.2k stars 9.93k forks source link

«File name too long» should not error #29912

Open ffaf1 opened 3 years ago

ffaf1 commented 3 years ago

Checklist

Description

To reproduce:

My suggestion:

This will improve UX, check bug tracker to see how common this issue is.

7765 talks about this too (with a linked pull request) but:

dirkf commented 3 years ago

The general solution to this is a (new) FAQ entry

### My download fails because youtube-dl is trying to write to a file whose name is too long for the file system.

Use the `--get-filename` option to see what filename is being proposed. Then adjust, or add, the `-o`/`--output` option in your original youtube-dl command to make the filename shorter. For instance, if the extractor for the site sets a very long `title` field and you are using the default output template `%(title)s-%(id)s.%(ext)s`. you could try `-o %(title).50s-%(id)s.%(ext)s` instead to limit the length of the filename.
ffaf1 commented 3 years ago

But my ticket is addressing a different point. Shrinking/truncating long filenames should be default. It would be an immense step forward in terms of UX to go to «it just works, emits a warning» from «it does not work, errors out; search the error / skim the FAQ».

Of course, if the user has specific needs, they should be able to turn this off and get an error. (maybe via --fixup option).

Il 06 settembre 2021 alle 19:33 dirkf ha scritto:

The general solution to this is a FAQ entry

### My download fails because youtube-dl is trying to write to a file whose name is too long for the file system.

Use the `--get-filename` option to see what filename is being proposed. Then adjust, or add, the `-o`/`--output-template` option in your original youtube-dl command to make the filename shorter. For instance, if the extractor for the site sets a very long `title` field and you are using the default output template `%(title)s-%(id)s.%(ext)s`. you could try `-o %(title).50s-%(id)s.%(ext)s` instead to limit length of the filename.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/ytdl-org/youtube-dl/issues/29912#issuecomment-913945852

dirkf commented 3 years ago

The previous discussions on this point show that the maintainers (Sergey, at least) won't accept any solution that they see as ad hoc. The project sees the id field in particular as inviolable, and won't accept a solution that might accidentally truncate an id embedded in a filename. However, if the project's maintainers should be permanently absent, the project might be adopted by maintainers with different views.

It could have been reasonable to add some sort of filename truncation functionality to the --restrict-filenames option, where it has already been accepted to modify embedded fields, but the existing definition of that option doesn't allow it.

The cross-platform Python APIs don't offer a way of telling if the proposed file name is too long. In general this is a tricky issue because the error is only found once the OS kernel has called into the filesystem driver for the device that is to be written to.

We could try modifying sanitize_open() in utils.py, s/t like below:

def sanitize_open(filename, open_mode):
    """Try to open the given filename, and slightly tweak it if this fails.

    Attempts to open the given filename. If this fails, it tries to change
    the filename slightly, step by step, until it's either able to open it
    or it fails and raises a final exception, like the standard open()
    function.

    It returns the tuple (stream, definitive_file_name).
    """
    try:
        if filename == '-':
            if sys.platform == 'win32':
                import msvcrt
                msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
            return (sys.stdout.buffer if hasattr(sys.stdout, 'buffer') else sys.stdout, filename)
        stream = open(encodeFilename(filename), open_mode)
        return (stream, filename)
    except (IOError, OSError) as err:
        if err.errno in (errno.EACCES,):
            raise
        # --- addition ---
        # would go in `compat.py`
        try:
            from textwrap import shorten as compat_textwrap_shorten
        except ImportError:
            def compat_textwrap_shorten(
                    text, width, fix_sentence_endings=False, break_long_words=True,
                    break_on_hyphens=True, placeholder=' [...]'):
                import textwrap
                try_text = textwrap.wrap(
                    text, width,
                    fix_sentence_endings=fix_sentence_endings,
                    break_long_words=break_long_words,
                    break_on_hyphens=break_on_hyphens)
                if len(try_text) == 1:
                    return try_text[0]
                return textwrap.wrap(
                    text, width - len(placeholder),
                    fix_sentence_endings=fix_sentence_endings,
                    break_long_words=break_long_words,
                    break_on_hyphens=break_on_hyphens)[0]+placeholder

        # function would go in utils.py
        def reduce_filename(fname):
            ellipsis = '[...]'
            fname = os.path.split(fname)
            fname = fname[:1] + os.path.splitext(fname[1])
            fname[1] = remove_end(fname[1], ellipsis)
            flen = len(fname[1])
            if flen < 20:
                # give up
                return None
            # might try reduction by less than 2?
            fname[1] = compat_textwrap_shorten(fname[1], 1+flen/2, placeholder=ellipsis)
            return os.path.join(fname[0], ''.join(fname[1:]))

        alt_filename = filename
        while err.errno in (errno.ENAMETOOLONG,):
            alt_filename = reduce_filename(alt_filename)
            if not alt_filename:
                break
            try:
                stream = open(encodeFilename(alt_filename), open_mode)
                return (stream, alt_filename)
            except (IOError, OSError) as err:
                pass
        else:
            # not too long, but still wrong
            filename = alt_filename
        # --- end addition ---

        # In case of error, try to remove win32 forbidden chars
        alt_filename = sanitize_path(filename)
        if alt_filename == filename:
            raise
        else:
            # An exception here should be caught in the caller
            stream = open(encodeFilename(alt_filename), open_mode)
            return (stream, alt_filename)

Concerns here:

There might also be additional guidance for developers to avoid extracting title and id fields that might exceed typical filename lengths.

galileo-pkm commented 2 years ago

This is particularly annoying when downloading videos from Twitter as it takes the whole post as file name. I usually use --id and rename the file manually.

dirkf commented 2 years ago

That's really a bug in the Twitter extractor, IMO.

But feel free to apply the patch from the PR linked above and see if it does what you want.

rvolgers commented 2 years ago

I don't see any other reasonable solution for the Twitter extractor. Maybe it could pre-truncate the post? That's not any better than just doing it later when constructing the output file name though.

I've run into this issue many times, as I usually want to use youtube-dl to quickly download a video to my downloads folder and I don't really care about the name but would prefer it give as much information as possible (so I don't want to just default to using the id).

I would really, really love it if somebody made a youtube-dl flag --just-download-it-seriously-i-dont-care-about-the-details-just-do-your-best-i-believe-in-you, bonus points if that's the default, but even just having it available so I can write a wrapper script or bash alias that uses it would be amazing.

dirkf commented 2 years ago

The defaults really do work in the vast majority of cases.

You can write a config file with your preferred options and select it with --config-location in your script if some default setting should not be satisfactory (or just make it your default config).

For instance, you could put -o %(title).50s-%(id)s.%(ext)s to avoid downloads being broken by unreasonably long titles.

If anyone has a suggestion for what (reliably concise) metadata item could be used as the title for media extracted from Twitter, we could make a PR for Twitter specifically.

palewire commented 2 years ago

I've just tonight run into this bug with the Twitter extractor. You can reproduce it with the following:

youtube-dl https://twitter.com/TulsiGabbard/status/1555878318469091330
[twitter] 1555878318469091330: Downloading guest token
[twitter] 1555878318469091330: Downloading JSON metadata
[twitter] 1555878318469091330: Downloading m3u8 information
ERROR: unable to open for writing: [Errno 36] File name too long: 'Tulsi Gabbard 🌺 - Puberty-blocking procedures promoted by the Biden_Harris Admin are child abuse. The FDA has recently confirmed these hormones_drugs have extremely dangerous side effects, like brain swelling and vision loss.-1555878318469091330.mp4.part'

Would you be open to a Twitter specific workaround patch? Perhaps the title could be changed to a combination of the uploader name and date, with the tweet text moved to the description field?

bryanpaget commented 1 year ago

I am also running into this issue.

rautamiekka commented 1 year ago

What is this? The 90s? This shouldn't be an error at all.

Depends on who/what you blame.

dirkf commented 1 year ago

@bryanpaget, you might like to reveal more details of how you "[ran] into this issue", or is it just Twitter again?

Also, feel free to review the discussion in PR #29989 to understand why this is a difficult issue to fix retrospectively (other than by following the approach in https://github.com/ytdl-org/youtube-dl/issues/29912#issuecomment-913945852), and possibly contribute to the design of a general solution.

Vangelis66 commented 1 year ago

The forked project, yt-dlp, has implemented the following option:

 --trim-filenames LENGTH         Limit the filename length (excluding
                                 extension) to the specified number of
                                 characters

... By setting a value of ca. 180, default proposed Twitter filenames should be made compatible with a max-255-char filesystem:

yt-dlp -f http-632 "https://twitter.com/TulsiGabbard/status/1555878318469091330" => 

[twitter] 1555878318469091330: Downloading guest token
[twitter] 1555878318469091330: Downloading JSON metadata
[twitter] 1555878318469091330: Downloading m3u8 information
[info] 1555878318469091330: Downloading 1 format(s): http-632
ERROR: unable to open for writing: [Errno 2] No such file or directory: 'Tulsi Gabbard ?? - Puberty-blocking procedures promoted by the Biden?Harris Admin are child abuse. The FDA has recently confirmed these hormones?drugs have extremely dangerous side effects, like brain swelling and vision loss. [1555878318469091330].mp4.part'

... whereas:

yt-dlp -f http-632 "https://twitter.com/TulsiGabbard/status/1555878318469091330" --trim-filenames 185 => 

twitter] 1555878318469091330: Downloading guest token
[twitter] 1555878318469091330: Downloading JSON metadata
[twitter] 1555878318469091330: Downloading m3u8 information
[info] 1555878318469091330: Downloading 1 format(s): http-632
[download] Destination: Tulsi Gabbard ?? - Puberty-blocking procedures promoted by the Biden?Harris Admin are child abuse. The FDA has recently confirmed these hormones?drugs have extremely dangerous side effec. [1555878318469091330].mp4
[download] 100% of 2.35MiB in    00:02 at 863.56KiB/s

In the case of Twitter, id is NOT being truncated, which is a plus...

dirkf commented 1 year ago

I don't see that --trim-filenames ... is much of an improvement on setting an explicit length for a problem field (title, I'm looking at you) in the output template.

A good solution would:

Clearly these are not completely consistent requirements, as an output template might be, say, %(id)16777216s.%(ext)s. Some prioritisation is necessary.

I still hold that the excessive filename length provoked by Twitter is an extractor bug.

pukkandan commented 1 year ago

--trim-filenames is a poorly implemented solution from youtube-dlc and I would not recommend porting it as-is. Even in yt-dlp, I recommend users to use the % formatting to control filename length instead. E.g. https://github.com/yt-dlp/yt-dlp/issues/3494

In the case of Twitter, id is NOT being truncated, which is a plus...

This is actually a bug and shows how poorly implemented the option is! See https://github.com/yt-dlp/yt-dlp/issues/2314. Do not rely on this behavior...

paboum commented 1 year ago

Geez...

ERROR: unable to open for writing: [Errno 36] File name too long: 'Behind the scenes FOOTage 🎥🦶😅 • • • #epicfail #fail #fails #failure #cheer #cheerleading #cheerup #partnerstunt #growthmindset #growth #fyp #funny #funnyvideos #funnymemes #stunt _ Zach Reyes _ M.I.A. · Paper Planes-794721471829128.f792097348668973v-1.mp4.part'

It's not Twitter-specific after all. Just trim the filename length to 255 chars unless a specific parameter is provided by the user to override this limit.

dirkf commented 1 year ago

As before, see https://github.com/ytdl-org/youtube-dl/issues/29912#issuecomment-1242957697 and https://github.com/ytdl-org/youtube-dl/issues/29912#issuecomment-1243596208 for why this doesn't happen, at least not as you suggest and not yet.

lassik commented 1 year ago

Error message seems to come from this part of the code.

Could probably catch errno.ENAMETOOLONG and trim the filename until it fits.

dirkf commented 1 year ago

Indeed, as explained by the immediately preceding comment and its linked comments.

estatistics commented 1 year ago

the first one --trim-filenames 90 didnt caught all filenames, the second one -o "%(title).90s-%(id)s.%(ext)s" workt for tiktok producing filenames long 90 characters plus ID name (id is extra characters).

dirkf commented 1 year ago

30839: "if you were actually running yt-dlp ..."

Also, 90 isn't a "constant": pick a number that works.

paboum commented 1 year ago

In C programs, a library limits.h is available on POSIX systems, including vars like PATH_MAX denoting the uniform system limit for the length of an entire file name. The same should be available in Python somehow, just use it to make it work, default behavior should be safe - https://en.wikipedia.org/wiki/Best-effort_delivery

rautamiekka commented 1 year ago

Some fields are more important than others, like the unique ID's, hashes, and file extensions can never be allowed to be truncated. As few as possible known fields could have an importance priority floating point number internally assigned in the extractor.

^ Then, generally, you can truncate all other fields equally to fit, as it'll be more important to have some knowledge of the contents in the filename than have none at all, and 125, not to mention 255, chars per segment (or even a full path) is a lotta text. If the User is hellbent on changing a sensible default into something very much non-sensible like ID + hash + keywords/tags + collection name + description, that's their fault.

Something like that.

dirkf commented 1 year ago

I think https://github.com/ytdl-org/youtube-dl/issues/29912#issuecomment-1243596208 is a reasonable statement of the problem (further suggestions welcome). Unfortunately we haven't been able to make any progress in creating an implementation to meet those requirements, or even in refining the problem to make it more tractable.

ghost commented 1 year ago

personally I think its good for a project to have goals, and as important if not more important, non goals. the current -o option is powerful enough as is, that users can solve this problem on their own, without the need for any magic code on the part of YouTube-DL. I dont think this project should be in the business is trying to figure out what people want. the dumber the better, because every line of code committed has to be maintained, and last I checked we only have one or two maintainers, for a project with 100k stars and no financial backing. but at the end of the day its not my decision, just giving my two cents.

dirkf commented 1 year ago

Plainly it's not a showstopping issue but it is one that generates repeated complaints. It's still open in case someone is eventually intrigued enough to work out a solution that meets an adequate subset of the problem constraints, after spending however long it takes locked in the basement with the necessary cold towels wrapped round the head. But we should fix Twitter/X if the extractor is ever viable and updated post-Musk.

jgowdy commented 4 weeks ago

Yeah, having the tool just fail on long file names without the user applying different parameters to mitigate the issue is very rational and an awesome user experience. I mean just defaulting to a template that causes video downloads not to fail with reasonable truncation would be ridiculous. The legions of people coming to the Issues and hopefully reading the FAQ before creating this issue repeatedly shows the wisdom of this decision. 👍

paboum commented 3 weeks ago

What is wrong with following the best-effort rule and utilizing limits.h or similar library if available on some systems?

dirkf commented 3 weeks ago

When we see your PR implementing that idea and explaining how it meets (or not) the outline requirements above, we'll be able say what, if anything, is wrong with it.

paboum commented 3 weeks ago

Perhaps you shouldn't maintain this project if you cant "see" a simple idea.

https://github.com/ytdl-org/youtube-dl/pull/32910

Obviously this algorithm is sub-optimal and poorly implemented, but you should "see" what is missing in your code now.

Just use MAX_LENGTH to either:

dirkf commented 3 weeks ago

We know how to detect a filename that is too long; the question is how to truncate it in a way that meets the linked requirements as much as possible, when the output template mechanism means that the user can't necessarily know how long the constructed filename will be, and yet almost certainly doesn't want the download to fail if the filename is too long. The proposal doesn't address the hard part of the problem, unlike the not completely successful attempt in #29989.

paboum commented 3 weeks ago

What is the current estimate of closing the ticket? What is blocking you from applying whatever simple heuristic like "truncating to 255 always" to just stop failing in most use cases? What knowledge are you missing and where do you expect to gather it, from whom, and when? What is your plan to stop more tickets being created?

PS. From the wall of text you linked I understand, that people who use --output already (my guess is 0.01% of userbase) would suffer from lower-level truncation, so an option should be provided for them to (optionally) disengage it.

dirkf commented 3 weeks ago

It works the other way. Just set maximum lengths for the fields in the output template that will meet the limits of your OS/filesystem, as well as any others to which you might copy the downloaded file. You should do this for the documented default (set in a config file), and for any specific output template that you pass in --output/-o .... Then any mutilation of template fields is entirely under your control.

The combined abilities of the many contributors to this project and yt-dlp have not so far found an acceptable solution to the general problem posed by this issue. I don't expect the issue to be closed soon or ever, but who knows?

paboum commented 3 weeks ago

Don't forget to also add a available_ram and available_disk_space fields to the config file. The user chooses to download a big video? Too bad, should have chosen a smaller file or format.