mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.39k stars 930 forks

[instagram] downloading posts with co-authors #6208

Open docholllidae opened 2 days ago

docholllidae commented 2 days ago

sometimes when downloading a user's profile it will put a post or two into another folder due to that post being coauthored with another profile

when scraping a profile all posts are downloaded to zzInsta\downloads\{owner_id}.{username}. however, when scraping letrileylive's profile and the extractor reaches this post https://www.instagram.com/letrileylive/reel/C_3dPwEPmzU/ (warning: semi-nsfw), it is downloaded to the zzInsta\downloads\6200677336.officialplayboyplus folder instead of the folder for letrileylive (note the link does redirect in the browser to https://www.instagram.com/officialplayboyplus/reel/C_3dPwEPmzU/)

how can i make sure these collab/coauthor posts are saved to the directory of the profile being scraped?

for reference here is my config:

{
    "extractor": {
        "base-directory": "X:/My Drive/",
        "archive": "%appdata%/gallery-dl/archive.sqlite3",
        "path-restrict": "^A-Za-z0-9_.~!-",
        "#skip": "abort:3",
        "keywords-default": "",

        "instagram": {
            "archive": "X:/My Drive/zzInsta/archive.instagram.sqlite3",
            "cookies": "X:/My Drive/zzInsta/cookies.instagram.1.txt",
            "include": ["avatar","posts","reels","highlights","stories"],

            "#avatar": {
                "#directory": ["zzInsta","downloads","{owner_id}.{username}","media","avatar"],
                "#archive": "",
                "#filename": "{date:%Y-%m-%d_%H-%M-%S}_avatar_{owner_id}.{username}~_~{filename}.{extension}"
            },

            "directory": ["zzInsta","downloads","{owner_id}.{username}","{subcategory}"],
            "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{post_id}-{post_shortcode}-{num}.{username}~_~{description[0:50]}.{extension}",

            "sleep": [11.7,17.4],
            "sleep-request": [11,17],

            "posts": {
                "#skip": "abort:5"
            },
            "reels": {
                "#skip": "abort:5"
            }
        }
    }
}

mikf commented 1 day ago

Use the {user[...]} values instead of {owner_id} etc. These always reference the user account from your input URLs instead of a potential co-author.
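As a rough sketch of what that suggestion might look like in the config from the original post: the fragment below swaps the top-level fields for fields on the user object. The key names `id` and `username` inside `{user[...]}` are assumptions, since the thread does not list them. Running with -j on the profile URL rather than on a single post may show which keys are actually available. The "#note" entry follows the "#"-prefixed convention the original config already uses for disabled keys.

```json
{
    "extractor": {
        "instagram": {
            "#note": "user[id] and user[username] are assumed key names; check them with -j on a profile URL",
            "directory": ["zzInsta","downloads","{user[id]}.{user[username]}","{subcategory}"]
        }
    }
}
```

If those keys exist, the same substitution could presumably be made in the "filename" format string, so that file names also carry the scraped profile's name instead of the co-author's.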

Hrxn commented 1 day ago

Shouldn't {username} be the same here?

Also

            "directory": ["zzInsta","downloads","{owner_id}.{username}","{subcategory}"],

doesn't result in instagram\<profilename> like you suggested? What are you actually doing?

docholllidae commented 6 hours ago

> Shouldn't {username} be the same here?
>
> Also
>
>             "directory": ["zzInsta","downloads","{owner_id}.{username}","{subcategory}"],
>
> doesn't result in instagram\<profilename> like you suggested? What are you actually doing?

you're right, i had a brain fart when writing up my post. it results in zzInsta\downloads\id.username, which is what i want (zz is prepended just because i want the scraped sites at the bottom of my directory listing, there are other IG-specific files in there so downloads go in a subdirectory, and I add the owner_id to the start of the user's folder because some people tend to be quite liberal with their name changes)

I edited the OP to make those corrections

> Use the {user[...]} values instead of {owner_id} etc. These always reference the user account from your input URLs instead of a potential co-author.

I'm not sure which {user[...]} values you're referring to. running with the -j option on a post, the only value i find with "user" in the name is username (https://www.instagram.com/p/C_3dPwEPmzU/ for example)

after some digging I did find a sort of workaround: I add this to the extractor's options: "parent-directory": "true"

the only downside is that the filename still uses the co-author's username, but that's a very minor detail to me in this case
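
For anyone who wants to try the same workaround, here is a minimal sketch of where that option might sit in the config from the original post. It assumes the option is placed in the "instagram" block and written as a JSON boolean rather than the quoted string above. The thread does not show the exact placement or form that was used, so treat both as guesses.

```json
{
    "extractor": {
        "instagram": {
            "#note": "placement and boolean form are assumptions, not confirmed in this thread",
            "parent-directory": true
        }
    }
}
```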