mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
10.76k stars 885 forks source link

[BUG] Kemono.party username is not path restricted when used in archive #3359

Open diamondsw opened 1 year ago

diamondsw commented 1 year ago

For instance, with the following config:

{
    "extractor":
    {
        "skip": true,
        "sleep": 0,
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",

        "kemonoparty": {
            "base-directory": "~/Desktop/",
            "metadata": true,
            "directory": ["{category}", "{service} {user} {username}", "{id} {title[:60]}"],
            "archive":   ["{category}", "{service} {user} {username}", "archive.db"],
            "filename": "{id} {title[:60]}.{extension}"
        }
    }
}

The following will create two separate folders, thanks to the "/" included in the username variable: gallery-dl -c kemono-bug.json "https://kemono.party/fantia/user/2717/post/1259389"

afterdelight commented 1 year ago

wdym by that

afterdelight commented 1 year ago

maybe you can remove / from the username with some config settings

diamondsw commented 1 year ago

wdym by that

I mean that if a slash appears in a variable then it needs to be either filtered out or changed before it is used in a filesystem path, as otherwise things break.

gallerydl should be sanitizing its paths before using them; non-filesystem-safe characters should be filtered in some way before being used (slashes are one example, but there are plenty of others that will cause issues on Linux, Windows, with Samba, etc). IIRC other parts of gallerydl will properly sanitize some characters, but the implementation is inconsistent and incomplete (honestly, this applies to path handling and variable substitution in general).

Hrxn commented 1 year ago

Yeah, because of how the "archive" setting works...(not the same type as "filename", e.g., the configuration.rst mentions this) I mean, what are you actually trying to achieve here? A per-user archive file for everything on Kemono.party? What even is the point?

diamondsw commented 1 year ago

I'm not downloading "everything on Kemono.party" - I'm not sure where you got that from.

I keep the archive next to the files downloaded so everything is self-contained. I can move everything together, whether that's reorganizing files, moving to a different user or host, Dockerizing something and using bind mounts - whatever - and then all I have to do is point something at a different base and it all Just Works. Except because it doesn't process archive paths like any other path, It Doesn't Work.

To flip this around, why on earth would you not treat a path to an archive like every other path, and why would you not sanitize anything that hits the filesystem for filesystem-safe characters?

taskhawk commented 1 year ago

Does using extractor.*.path-metadata and accessing it in archive works?