mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Reddit downloads can still fail due to path length even though it should be getting truncated #873

Closed shinji257 closed 4 years ago

shinji257 commented 4 years ago

As the subject suggests: the file is first downloaded to a temporary .part file, and the name of that temporary file is too long. So even if the final filename would have been fine, the whole download still fails.

./gallery-dl/r/araragi/reddit/araragi/gm7ud5 I have almost finished the anime and I have already decided that I want to buy all the manga, and possibly also the light novel. Here are the first 2 volumes of bakemonogatari! Ordered about 1 week ago they have already arrived, I'm very happ.jpg
[download][warning] OSError: [Errno 36] File name too long: "./gallery-dl/r/araragi/reddit/araragi/gm7ud5 I have almost finished the anime and I have already decided that I want to buy all the manga, and possibly also the light novel. Here are the first 2 volumes of bakemonogatari! Ordered about 1 week ago they have already arrived, I'm very happ.jpg.part"
[download][error] Failed to download gm7ud5 I have almost finished the anime and I have already decided that I want to buy all the manga, and possibly also the light novel. Here are the first 2 volumes of bakemonogatari! Ordered about 1 week ago they have already arrived, I'm very happ.jpg

Example posting: https://www.reddit.com/r/araragi/comments/gm7ud5/i_have_almost_finished_the_anime_and_i_have/

OS release is Ubuntu 19.10 running on Proxmox as a container.

EDIT: I worked around this by disabling the creation of part files in the downloader. However, there should be a check whether the file can be saved at all, and the download should be skipped if it can't. I also tried moving the part file to /tmp, and the filename was still too long.
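A rough sketch of the kind of pre-check suggested above, not something gallery-dl does. The helper names, the 255-byte default, and the assumption of POSIX-style paths encoded as UTF-8 are all illustrative; on Unix, os.pathconf(directory, "PC_NAME_MAX") can report the actual limit for a given directory.

```python
def longest_component_bytes(path, encoding="utf-8"):
    """Return the encoded byte length of the longest component of a POSIX-style path."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    return max((len(p.encode(encoding)) for p in parts), default=0)


def can_be_saved(path, limit=255, part_suffix=".part"):
    """Check that `path` plus its temporary suffix fits within `limit` bytes per component.

    `limit=255` is an assumed default, not a value read from the filesystem.
    """
    return longest_component_bytes(path + part_suffix) <= limit


# A 252-byte filename fits on its own, but not once ".part" is appended.
final_path = "./gallery-dl/" + "a" * 248 + ".jpg"
print(can_be_saved(final_path, part_suffix=""))  # final name only
print(can_be_saved(final_path))                  # with the ".part" suffix
```

This mirrors the failure in the log above: the final name may be short enough while the temporary name is not.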

shinji257 commented 4 years ago

This one is also from Reddit and still fails even with part files disabled. It seems youtube-dl is not being given a shortened filename for its temporary files.

Example: https://www.reddit.com/r/CatastrophicFailure/comments/g2rdwq/on_may_5_2019_27_minutes_after_takeoff_a_plane/

[downloader.ytdl][error] ERROR: unable to open for writing: [Errno 36] File name too long: '/home/shinji/gallery-dl/r/CatastrophicFailure/reddit/CatastrophicFailure/g2rdwq On May 5, 2019, 27 minutes after take-off, a plane was forced to return to Sheremetyevo airport due to technical problems on board. During the landing, the airliner suffered damage that caused a fire, which caused the plane to partially burn .fdash-VIDEO-1.mp4'
[download][error] Failed to download g2rdwq On May 5, 2019, 27 minutes after take-off, a plane was forced to return to Sheremetyevo airport due to technical problems on board. During the landing, the airliner suffered damage that caused a fire, which caused the plane to partially burn .mp4

mikf commented 4 years ago

I've reduced the {title} length in the default filename format string by quite a bit more so that at least your two examples work (https://github.com/mikf/gallery-dl/commit/94a08f0bcbfa1b0c456071630e3a97217bd1410a), but filenames will still be too long for non-ASCII titles with enough characters: {title[:220]} limits the length to 220 characters, not bytes.
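To illustrate the characters-versus-bytes point, here is a small Python sketch. The two titles are made up, and UTF-8 is assumed as the filename encoding.

```python
# Hypothetical titles: one pure ASCII, one made of a 3-byte UTF-8 character.
ascii_title = "a" * 300
cjk_title = "\u3042" * 300  # HIRAGANA LETTER A

for title in (ascii_title, cjk_title):
    cut = title[:220]  # same slice as in {title[:220]}
    # character count, then byte count after UTF-8 encoding
    print(len(cut), len(cut.encode("utf-8")))
```

Both slices are 220 characters long, but the second one takes 660 bytes once encoded, which is what a byte-based filename limit actually counts.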

There is no good general solution for the "filename length problem", which is why I haven't really tried to implement something. You can find a workable solution for filesystems using UTF-8 in #814, but that doesn't work for filesystems with a different filename encoding or filename length restriction.
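For reference, one way byte-aware truncation can look when the filesystem is known to use UTF-8. This is a minimal sketch and not necessarily the approach from #814; truncate_utf8 is a hypothetical helper name, and it does not address the other encodings or limits mentioned above.

```python
def truncate_utf8(text, max_bytes):
    """Cut `text` so that its UTF-8 encoding is at most `max_bytes` bytes long."""
    encoded = text.encode("utf-8")[:max_bytes]
    # A cut in the middle of a multi-byte character leaves an incomplete
    # sequence at the end; errors="ignore" drops it instead of raising.
    return encoded.decode("utf-8", errors="ignore")


short = truncate_utf8("\u3042" * 100, 220)
print(len(short), len(short.encode("utf-8")))
```

With a 220-byte budget and 3-byte characters, only 73 whole characters (219 bytes) fit, so the result is slightly under the limit rather than over it.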

shinji257 commented 4 years ago

Thanks for doing this, but I don't want you to have to keep doing this. I've run into a couple more with imgur for whatever reason; the filename alone was over 270 characters. Don't ask me why. Forgive me for asking, but how exactly do I override the format used by an extractor so that I can impose a per-extractor limit?

mikf commented 4 years ago

Use the filename and directory options in your config file.

A minimal config for Reddit with the current default filename/directory format strings would look something like this:

{
    "extractor": {
        "reddit": {
            "filename": "{id} {title[:220]}.{extension}",
            "directory": ["{category}", "{subreddit}"]
        }
    }
}

shinji257 commented 4 years ago

Side note on a closed ticket: I'm trying to limit imgur filenames.

Can you tell me what ?// does in the format string for it? Original format string: `{category}{id}{title:?_//}.{extension}`

To me it just seems to add a _ in front of the title, but I can't tell whether it does anything else. Could you provide some insight?

Hrxn commented 4 years ago

This is a conditional expression: if title is present and not empty (not None, for example), an _ is added in front of the title field; otherwise the whole field expands to nothing.

https://github.com/mikf/gallery-dl/blob/b62ea725332ae04df2c3ebb1b3b379297027f164/gallery_dl/util.py#L496-L503

shinji257 commented 4 years ago

OK. With that said, is there a way to include that and limit the length at the same time? I can only seem to apply one or the other.

mikf commented 3 years ago

{title[:160]:?_//} for example
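As a rough illustration of what that field does, here is a hand-written Python emulation: the slice is applied first, then the conditional prefix. This is not gallery-dl's actual implementation (see the util.py link above for that); render_title and its handling of a missing title are assumptions made for the example.

```python
def render_title(title, limit=160, start="_", end=""):
    """Emulate {title[:160]:?_//}: slice first, then wrap only if non-empty."""
    if not title:
        # Missing or empty title: the whole field becomes an empty string.
        return ""
    return f"{start}{title[:limit]}{end}"


print(render_title("x" * 300)[:5], len(render_title("x" * 300)))
print(repr(render_title(None)))
```

A 300-character title is cut to 160 characters plus the leading _, while a missing title contributes nothing, so no stray _ ends up in the filename.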