mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

File name length unchecked #814

Open adlerosn opened 4 years ago

adlerosn commented 4 years ago

The downloaded file name should never be longer than 255 bytes under Linux.

Take the terminal output below as an example:

$ gallery-dl https://www.reddit.com/r/anthro/comments/f646i7/𝙾𝙿𝙴𝙽𝙸𝙽𝙶_𝙰𝙴𝚂𝚃𝙷𝙴𝚃𝙷𝙸𝙲_𝙷𝙴𝙰𝙳𝚂𝙷𝙾𝚃_𝙲𝙾𝙼𝙼𝙸𝚂𝚂𝙸𝙾𝙽𝚂_get_a_v_a/
./gallery-dl/reddit/anthro/f646i7 𝙾𝙿𝙴𝙽𝙸𝙽𝙶 𝙰𝙴𝚂𝚃𝙷𝙴𝚃… dhazeart@gmail.com! ONLY 10 SLOTS AVAILABLE !.jpg
[download][warning] OSError: [Errno 36] File name too long: "./gallery-dl/reddit/anthro/f646i7 𝙾𝙿𝙴𝙽𝙸𝙽𝙶 𝙰𝙴𝚂𝚃𝙷𝙴𝚃𝙷𝙸𝙲 𝙷𝙴𝙰𝙳𝚂𝙷𝙾𝚃 𝙲𝙾𝙼𝙼𝙸𝚂𝚂𝙸𝙾𝙽𝚂! ⚑️ get a V A P O R W A V E version of your character for only $25! if you'd like to grab a spot, PM me here, on telegram at @dhazeartt or email me at dhazeart@gmail.com! ONLY 10 SLOTS AVAILABLE !.jpg.part"
[download][error] Failed to download f646i7 𝙾𝙿𝙴𝙽𝙸𝙽𝙶 𝙰𝙴𝚂𝚃𝙷𝙴𝚃𝙷𝙸𝙲 𝙷𝙴𝙰𝙳𝚂𝙷𝙾𝚃 𝙲𝙾𝙼𝙼𝙸𝚂𝚂𝙸𝙾𝙽𝚂! ⚑️ get a V A P O R W A V E version of your character for only $25! if you'd like to grab a spot, PM me here, on telegram at @dhazeartt or email me at dhazeart@gmail.com! ONLY 10 SLOTS AVAILABLE !.jpg

Its file name is 247 characters long, which seems acceptable, but is 359 bytes long, which is deemed too long by the kernel (on BTRFS).
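
The gap between the two counts comes from UTF-8 encoding: the limit applies to encoded bytes, and the stylized letters in this title take four bytes each. A minimal Python sketch of the difference, using only the first word of the title as sample input (written with escapes so the snippet survives copy and paste):

```python
# Characters vs. UTF-8 bytes for the first word of the title ("OPENING"
# written in Unicode "Mathematical Monospace" capitals, U+1D670 onward).
word = "\U0001D67E\U0001D67F\U0001D674\U0001D67D\U0001D678\U0001D67D\U0001D676"

char_count = len(word)                  # number of code points
byte_count = len(word.encode("utf-8"))  # number of encoded bytes

print(char_count, byte_count)  # 7 28
```

Any length check meant to satisfy the file system therefore has to measure the encoded name, not the string length.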

A quick workaround I wrote uses a PostProcessor:

import gallery_dl.postprocessor.common
import gallery_dl.util
from pathlib import Path

class FixFileNamePostProcessor(gallery_dl.postprocessor.common.PostProcessor):
    def prepare(self, pathfmt: gallery_dl.util.PathFormat):
        """Updates file path"""
        pathfmt.clean_path = FixFileNameFormatterWrapper(pathfmt.clean_path)
        pathfmt.build_path()

That uses this wrapper:

class FixFileNameFormatterWrapper:
    """Wraps file name formatter for ensuring a valid file name length"""

    def __init__(self, formatter: gallery_dl.util.Formatter):
        self.formatter = formatter

    def __call__(self, *args, **kwargs) -> str:
        path = self.formatter(*args, **kwargs)
        parts = list(map(fix_filename_length, Path(path).parts))
        return str(Path(*parts))

That uses this function:

def fix_filename_length(filename: str) -> str:
    """Ensures a segment has a valid file name length"""
    # 240 bytes stays below the 255-byte limit even with ".part" appended
    if len(filename.encode()) > 240:
        extension = Path(filename).suffix
        extension_bytes_length = len(extension.encode())
        stem_bytes = Path(filename).stem.encode()
        fixed_stem_bytes = stem_bytes[:240-extension_bytes_length]
        # errors="ignore" drops a multi-byte character cut in half by the slice
        fixed_stem = fixed_stem_bytes.decode(errors="ignore")
        return fixed_stem + extension
    return filename
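
One edge case worth noting: `Path.suffix` returns everything after the last dot, so a segment whose last dot sits early in a long title yields a "suffix" that may itself exceed the limit. The following is a sketch of a variant that guards against this, not a drop-in replacement. The name `truncate_utf8` and the 16-byte cutoff for what counts as an extension are assumptions made for illustration:

```python
def truncate_utf8(name: str, limit: int = 240) -> str:
    """Truncate `name` to at most `limit` UTF-8 bytes, keeping a short extension."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    stem, dot, ext = name.rpartition(".")
    suffix = (dot + ext).encode("utf-8")
    # Assumption: only a short tail is a real extension; otherwise cut the whole name.
    if not stem or len(suffix) > 16:
        return raw[:limit].decode("utf-8", errors="ignore")
    kept = stem.encode("utf-8")[:limit - len(suffix)]
    # errors="ignore" drops a trailing character that the slice cut in half.
    return kept.decode("utf-8", errors="ignore") + dot + ext


# 100 four-byte characters plus ".jpg" is 404 bytes before truncation.
truncated = truncate_utf8("\U0001D67E" * 100 + ".jpg")
print(len(truncated.encode("utf-8")))  # 240
```

Truncating on the byte string and decoding with `errors="ignore"` keeps the result valid UTF-8 at the cost of possibly ending one or more bytes short of the limit.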

It would be nice if MAX_PATH were also observed (4096 bytes on Linux; 260 characters on Windows, where the limit can be lifted only since Windows 10's 2016 update and only after changing a registry entry), but that's not an issue for me right now.
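
Rather than hard-coding these numbers, the limits can be queried at run time on POSIX systems. A minimal sketch using Python's `os.pathconf`; the helper name `path_limits` is hypothetical, and `os.pathconf` is only available on Unix, so Windows would need a different approach:

```python
import os


def path_limits(directory: str = ".") -> dict:
    """Return the file name and path length limits (in bytes) for a directory."""
    return {
        "name_max": os.pathconf(directory, "PC_NAME_MAX"),
        "path_max": os.pathconf(directory, "PC_PATH_MAX"),
    }


limits = path_limits()
print(limits)
```

The values depend on the file system that holds `directory`, which is why the lookup takes a path instead of returning a global constant.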

Randalix commented 6 months ago

I have the same issue. Would be great to see this implemented.

Randalix commented 6 months ago

I got it working by adding this to the config, which trims the title to 40 characters:

"filename": {
    "": "{title[:40]}_{subreddit}_{id}.{extension}"
},

reyaz006 commented 3 weeks ago

This should be a feature. The following works in youtube-dl, for example (though I couldn't find it in the docs):

{filename[:150]} works in gallery-dl so I can't imagine why something like {filename[:150B]} should not work.

There is this explanation in https://github.com/mikf/gallery-dl/issues/873#issuecomment-656366953

There is no good general solution for the "filename length problem", which is why I haven't really tried to implement something.

But we already have the [:150] character limiter, which is not a general solution either. Linux seems far from supporting longer file names, so for now the software itself should take care of it.

Hrxn commented 3 weeks ago

                 Example            Result
Slicing (Bytes)  {title_ja[b3:18]}  ロー・ワー
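
A rough Python sketch of what such a byte slice does: encode the string, slice the bytes, then decode again while dropping any character that was cut in half. The value assumed for `title_ja` ("ハロー・ワールド") and the helper `slice_bytes` are illustrative assumptions, not gallery-dl's actual implementation:

```python
def slice_bytes(text: str, start: int, stop: int, encoding: str = "utf-8") -> str:
    """Slice `text` by encoded byte offsets instead of character offsets."""
    raw = text.encode(encoding)[start:stop]
    # errors="ignore" discards the bytes of characters split by the slice.
    return raw.decode(encoding, errors="ignore")


title_ja = "ハロー・ワールド"  # assumed sample value; 3 bytes per character in UTF-8
print(slice_bytes(title_ja, 3, 18))  # ロー・ワー
```

With a byte-based upper bound such as `[b:150]`, the result can never exceed 150 bytes, whatever mix of characters the title contains.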

reyaz006 commented 3 weeks ago

Oh, thanks. Also found it here https://github.com/mikf/gallery-dl/discussions/4087#discussioncomment-5977221

It's probably safe to close this issue.