mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Organizing Instagram tagged posts #3107

Open Fukitsu opened 2 years ago

Fukitsu commented 2 years ago

I'm trying to make gallery-dl download a tagged post into the folder of the tagged user when scraping a profile. For example, the profile https://www.instagram.com/akunohako/ includes the post https://www.instagram.com/p/Cjs36PIjFWP/, which was made by a different user. I want it downloaded into the akunohako folder instead of the xenon_ne folder, with the filename built as if akunohako had created the post. I've tried

"tagged":
{
    "directory": ["{tagged_username}"]
},

"tagged":
{
    "directory": ["{tagged_users[][username]}"]
},

and

"postprocessors":
[
    {
        "name": "metadata",
        "tagged":
        {
            "directory": ["{tagged_username}"]
        }
    }
]

but can't get it to work. What's the proper way?

afterdelight commented 2 years ago

this is my tagged config:

"tagged": {
    "directory": ["{tagged_username} [{tagged_owner_id}]", "Tagged"],
    "filename": {
        "": "{date:%Y%m%d}_{username}_{post_shortcode}.{extension}",
        "count > 1": "{date:%Y%m%d}_{username}_{post_shortcode}_{num}.{extension}"
    }
}

mikf commented 2 years ago

{tagged_username}, {tagged_owner_id}, {tagged_full_name} and the "tagged" gdl subcategory are only applicable when downloading from instagram.com/USER/tagged URLs. Just plain instagram.com/USER uses the "posts" subcategory by default.

In your case you'd have to use {tagged_users[0][username]}, i.e. the first item from the list of @-ed users in a post, but that does not always work: a post can have multiple tagged users, and the one you want may not be the first.

What you'd really need is a new metadata field for "coauthors" which gallery-dl does not yet extract, although that would also be a list, or something similar to the user and author distinction that Twitter has.

      "coauthor_producers": [
        {
          "pk": "212047764",
          "username": "akunohako",
          "full_name": "Aku",
          "is_private": false,
          "profile_pic_url": "https://instagram.ftxl3-2.fna.fbcdn.net/v/t51.2885-19/274583097_482362060206481_1212815530004034262_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.ftxl3-2.fna.fbcdn.net&_nc_cat=101&_nc_ohc=U4iWwyTFtiAAX94e6fb&edm=ALQROFkBAAAA&ccb=7-5&oh=00_AT-PpOkf42f6R3iP_gbX_8eWDN5pDE5qrtVz169XqWOMFg&oe=635E1CCD&_nc_sid=30a2ef",
          "profile_pic_id": "2780925354380274334_212047764",
          "is_verified": false
        }
      ],

(unfiltered API data from /p/Cjs36PIjFWP/)
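
To illustrate why taking only the first list element is fragile, here is a minimal Python sketch that searches the whole list for the scraped profile instead. This is not something gallery-dl does itself; the function name `folder_for_post`, the `owner_username` argument, and the shape of the `post` dict are assumptions modeled on the API excerpt above.

```python
def folder_for_post(post, profile, owner_username):
    """Return the folder name a post should be filed under.

    post           -- dict shaped like the raw API data above (assumption)
    profile        -- username of the profile being scraped
    owner_username -- username of the account that created the post
    """
    coauthors = post.get("coauthor_producers") or []
    # Check every co-author rather than only index 0.
    if any(user.get("username") == profile for user in coauthors):
        return profile
    return owner_username


post = {"coauthor_producers": [{"pk": "212047764", "username": "akunohako"}]}
print(folder_for_post(post, "akunohako", "xenon_ne"))
```

With the data above, the post is filed under akunohako even if other co-authors are listed before it; posts without a matching co-author fall back to the owner's folder.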

Fukitsu commented 2 years ago

A workaround I've been using is a wrapper script inside my Instagram folder that runs gallery-dl with -o base-directory=. and filenames like this:

"filename":
            {
                "tagged_users and count > 1": "USER {date:%Y-%m-%d - %H_%M_%S} {post_shortcode}_{num}.{extension}",
                "tagged_users": "USER {date:%Y-%m-%d - %H_%M_%S} {post_shortcode}.{extension}",
                "count > 1": "{username} {date:%Y-%m-%d - %H_%M_%S} {post_shortcode}_{num}.{extension}",
                "": "{username} {date:%Y-%m-%d - %H_%M_%S} {post_shortcode}.{extension}"
            },

directory like this:

"directory":
            {
                "": ["."]
            },

Afterwards I rename the files, replacing USER with the username, and delete any that turn out to be duplicates. Does {coauthors[0][username]} always return the username I'm scraping, or does it behave like {tagged_users[0][username]}?
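
The rename step of this workaround could look roughly like the Python sketch below. It is only an assumption about how such a wrapper might work, not the script used here: the function name `replace_placeholder` is hypothetical, and a file counts as a duplicate purely because a file with the target name already exists.

```python
from pathlib import Path


def replace_placeholder(folder, username, placeholder="USER"):
    """Rename '<placeholder> ...' files to '<username> ...' inside folder.

    If the target name already exists, the placeholder file is treated as a
    duplicate and deleted. Returns the names of the files that were renamed.
    """
    renamed = []
    prefix = placeholder + " "
    # sorted() takes a snapshot, so renaming during the loop is safe.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or not path.name.startswith(prefix):
            continue
        target = path.with_name(username + " " + path.name[len(prefix):])
        if target.exists():
            path.unlink()  # same post was already saved under the username
        else:
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling `replace_placeholder(".", "akunohako")` after a run would turn every `USER ...` file in the current folder into an `akunohako ...` file.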

mikf commented 2 years ago

{coauthors[0][username]} probably has the same problem as {tagged_users[0][username]}, given that it is also a list and can therefore have multiple values.

As I said in https://github.com/mikf/gallery-dl/issues/3107#issuecomment-1291723260, I'll eventually redo the user(name) fields for Instagram and handle them the way it is currently done for Twitter, i.e. user[…] is the user the input URL points to and author[…] (or some other name) is the actual post author/creator.

duoside commented 1 year ago

Yes, this feature would make scraping much easier. Currently, I'm manually moving the tagged files to the correct folder.

Fovty commented 10 months ago

In case anyone else stumbles across the same problem, I solved it as follows (using akunohako as an example):

"directory": {
    "": ["{subcategory}"]
}

gallery-dl https://www.instagram.com/akunohako/ -d "./downloads/akunohako"

Since I run gallery-dl as a subprocess anyway, I can set the destination parameter dynamically so that it matches the profile name.
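
A minimal Python sketch of this subprocess approach, assuming gallery-dl is on the PATH and that the "directory" setting above is already in the config file. The helper names `build_command` and `download_profile` and the `./downloads` root are illustrative choices, not part of gallery-dl.

```python
import subprocess
from pathlib import Path


def build_command(profile, root="./downloads"):
    """Build the gallery-dl invocation for one Instagram profile."""
    url = f"https://www.instagram.com/{profile}/"
    # The destination is derived from the profile name, so posts by other
    # users that appear on this profile still end up in this folder.
    destination = Path(root) / profile
    return ["gallery-dl", url, "-d", str(destination)]


def download_profile(profile, root="./downloads"):
    """Run gallery-dl for the profile; raises CalledProcessError on failure."""
    subprocess.run(build_command(profile, root), check=True)
```

`download_profile("akunohako")` then runs the equivalent of the command above; looping over a list of profile names gives each profile its own destination folder.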