mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Pixiv]How to Configure Download Settings? (noob questions) #4704

Open 1223334444abc opened 10 months ago

1223334444abc commented 10 months ago

I’m planning to migrate from PUBD to gallery-dl, but I don’t know how to configure it. How can I achieve the same naming scheme in gallery-dl? (The text output could also go into an individual JSON file for each image.) I would be very grateful for any help.

This is my current file naming configuration in PUBD:

File Saving Path and Custom Mask: %{illust.user.name.replace(/[\\/\\\\]/ig, "_")}/OX163%{gif}/%{nogif}%{(illust.page_count>1)?"_p"+page:""}%{(illust.type=="ugoira")?"p"+page:""}%{gifdelay}.%{illust.extention}

gifdelay: illust.type=="ugoira" _Delay%{illust.ugoira_metadata.frames[page].delay}ms

nogif: illust.type!="ugoira" (pid-%{illust.id})%{illust.title.replace(/[\\/\\\\]/ig, "_").replace(/:/ig, ";")}

gif: illust.type=="ugoira" /(pid-%{illust.id})%{illust.title.replace(/:/ig, ";")}

Text Output Format: ↲[%{new Date(illust.create_date).getFullYear()}-%{new Date(illust.create_date).getMonth()+1}-%{new Date(illust.create_date).getDate()}](pid-%{illust.id})%{illust.title.replace(/[\\/\\\\]/ig, "_").replace(/:/ig, ";")}%{(illust.page_count>1||illust.type=="ugoira")?"_p"+page:""}%{gifdelay}.%{illust.extention}【%{illust.caption}】%{illust.tags.map(function(t)\{return t.name;\}).join(",")}

Current Naming Example:

nogif: Pixiv\111AAA\OX163\(pid-12345678)Abcdefg_p0.png
gif: \Pixiv\111AAA\OX163\(pid-123456)Abcdefg\p0_delay500ms.jpg
Text: \Pixiv\111AAA\20230701.txt

↲[2021-12-23](pid-1234566)JPG.PSD配布.jpg【DL期限は12/31までです🌳 <a href="https://11111.fanbox.cc" target="_blank">https://11111.fanbox.cc</a>】少年,男の子,評価不要
↲[2021-11-19](pid-1234565)JPG.PSD配布.jpg【DL期限は11/30までです👼 <a href="https://11111.fanbox.cc" target="_blank">https://11111.fanbox.cc</a>】男の子,少年,評価不要
↲[2021-10-10](pid-1234564)JPG.PSD配布.jpg【PSDでは隠れたキャラが見れます。<br />DL期限は10/31までです⚽ <a href="https://11111.fanbox.cc" target="_blank">https://11111.fanbox.cc</a>】男の子,少年,評価不要
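
For reference, the Text Output Format above can be approximated in Python. This is only a sketch of the PUBD mask's logic, not gallery-dl functionality. The `text_line` helper and the sample `illust` dict are hypothetical, the leading newline marker and the ugoira-only `gifdelay` part are left out, and the date is taken from the timestamp as written, whereas JavaScript's `Date` would convert it to the local time zone.

```python
import re
from datetime import datetime


def text_line(illust: dict, page: int = 0) -> str:
    """Build one text-output line following the PUBD mask (hypothetical helper)."""
    created = datetime.fromisoformat(illust["create_date"])
    # Same replacements as the mask: slashes become "_", colons become ";"
    title = re.sub(r"[/\\]", "_", illust["title"]).replace(":", ";")
    multi = illust["page_count"] > 1 or illust["type"] == "ugoira"
    suffix = f"_p{page}" if multi else ""
    tags = ",".join(tag["name"] for tag in illust["tags"])
    # No zero padding, matching getMonth()+1 and getDate() in the mask
    date = f"{created.year}-{created.month}-{created.day}"
    return (
        f"[{date}](pid-{illust['id']}){title}{suffix}"
        f".{illust['extention']}【{illust['caption']}】{tags}"
    )


# Made-up sample data shaped like the PUBD illust object
sample = {
    "id": 1234566, "title": "a/b:c", "type": "illust", "page_count": 1,
    "create_date": "2021-12-23T10:00:00+09:00", "extention": "jpg",
    "caption": "demo", "tags": [{"name": "x"}, {"name": "y"}],
}
print(text_line(sample))  # [2021-12-23](pid-1234566)a_b;c.jpg【demo】x,y
```

The key `extention` is spelled the way PUBD spells it in the mask.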

1223334444abc commented 10 months ago

And I heard that Pixiv has strict limitations on web crawlers. Will using this software be restricted, and do I need to configure anything extra?

Hrxn commented 10 months ago

Configuring gallery-dl is pretty easy, in my opinion.

You should start with your own config file (gallery-dl --config-create). For guidance, you can take a look at the two example config files here (gallery-dl*.conf). Basically, all you need for what you are asking are the "filename" and "directory" options.

I suggest you just start with that, and if you have further questions, just ask and someone will gladly help.

The only thing which might not be working (yet) is this gifdelay info you use here, but if that is metadata provided by Pixiv itself it should be easy enough. Although I am not really sure, because I haven't used Pixiv in a long time.

And I heard that Pixiv has strict limitations on web crawlers. Will using this software be restricted, and do I need to configure anything extra?

Yeah, you definitely want to use your own account and set up proper gallery-dl authentication. (i.e. gallery-dl oauth:pixiv)

1223334444abc commented 10 months ago

“gifdelay”, “nogif”, and “gif” are custom masks provided by PUBD to adopt different naming strategies when the post is a dynamic image (illust.type=="ugoira"/illust.type!="ugoira"). How can I implement this in the configuration file?

Hrxn commented 10 months ago

Not sure. If you could post the output of gallery-dl -K <Example> for a Pixiv ugoira example link here, I might be able to tell you more.

1223334444abc commented 10 months ago

These two posts are a static image and a dynamic image. I want to use a different storage layout for each.

[pixiv][info] Refreshing access token
Keywords for directory names:
-----------------------------
caption
  Very late Birthday gift for 茲塔~🔶
category
  pixiv
comment_access_control
  0
create_date
  2023-10-24T00:00:56+09:00
date
  2023-10-23 15:00:56
height
  3508
id
  112799086
illust_ai_type
  1
illust_book_style
  0
is_bookmarked
  False
is_muted
  False
num
  0
page_count
  1
rating
  R-18
restrict
  0
sanity_level
  6
series
  None
subcategory
  work
suffix

tags[N]
  0 R-18
  1 賀圖
  2 ケモショタ
title
  🔸🔶🔸
total_bookmarks
  95
total_comments
  1
total_view
  497
type
  illust
user['account']
  kevinliu5605
user['id']
  1643271
user['is_followed']
  True
user['name']
  元元
user['profile_image_urls']['medium']
  https://i.pximg.net/user-profile/img/2016/04/26/04/27/07/10851635_9f9ed93bd0114b7137a1e0c8058a682b_170.jpg
visible
  True
width
  2480
x_restrict
  1

Keywords for filenames and --filter:
------------------------------------
caption
  Very late Birthday gift for 茲塔~🔶
category
  pixiv
comment_access_control
  0
create_date
  2023-10-24T00:00:56+09:00
date
  2023-10-23 15:00:56
date_url
  2023-10-23 15:00:56
extension
  png
filename
  112799086_p0
height
  3508
id
  112799086
illust_ai_type
  1
illust_book_style
  0
is_bookmarked
  False
is_muted
  False
num
  0
page_count
  1
rating
  R-18
restrict
  0
sanity_level
  6
series
  None
subcategory
  work
suffix

tags[N]
  0 R-18
  1 賀圖
  2 ケモショタ
title
  🔸🔶🔸
total_bookmarks
  95
total_comments
  1
total_view
  497
type
  illust
user['account']
  kevinliu5605
user['id']
  1643271
user['is_followed']
  True
user['name']
  元元
user['profile_image_urls']['medium']
  https://i.pximg.net/user-profile/img/2016/04/26/04/27/07/10851635_9f9ed93bd0114b7137a1e0c8058a682b_170.jpg
visible
  True
width
  2480
x_restrict
  1

Keywords for directory names:
-----------------------------
caption

category
  pixiv
comment_access_control
  0
create_date
  2023-10-22T20:26:12+09:00
date
  2023-10-22 11:26:12
height
  1677
id
  112765475
illust_ai_type
  1
illust_book_style
  0
is_bookmarked
  False
is_muted
  False
num
  0
page_count
  1
rating
  R-18
restrict
  0
sanity_level
  6
series
  None
subcategory
  work
suffix

tags[N]
  0 R-18
  1 うごイラ
  2 オリジナル
  3 ショタ
  4 男の子
  5 shota
title
  サウナ
total_bookmarks
  573
total_comments
  7
total_view
  2650
type
  ugoira
user['account']
  user_dhwn3743
user['id']
  96843491
user['is_followed']
  True
user['name']
  こもれび
user['profile_image_urls']['medium']
  https://i.pximg.net/user-profile/img/2023/07/29/01/13/06/24735137_cf326bff35a01dca4e00e4908e9c3cdc_170.jpg
visible
  True
width
  2175
x_restrict
  1

Keywords for filenames and --filter:
------------------------------------
caption

category
  pixiv
comment_access_control
  0
create_date
  2023-10-22T20:26:12+09:00
date
  2023-10-22 11:26:12
date_url
  2023-10-22 11:26:12
extension
  zip
filename
  112765475_ugoira1920x1080
frames[N]['delay']
  150
frames[N]['file']
  000000.jpg
height
  1677
id
  112765475
illust_ai_type
  1
illust_book_style
  0
is_bookmarked
  False
is_muted
  False
num
  0
page_count
  1
rating
  R-18
restrict
  0
sanity_level
  6
series
  None
subcategory
  work
suffix

tags[N]
  0 R-18
  1 うごイラ
  2 オリジナル
  3 ショタ
  4 男の子
  5 shota
title
  サウナ
total_bookmarks
  573
total_comments
  7
total_view
  2650
type
  ugoira
user['account']
  user_dhwn3743
user['id']
  96843491
user['is_followed']
  True
user['name']
  こもれび
user['profile_image_urls']['medium']
  https://i.pximg.net/user-profile/img/2023/07/29/01/13/06/24735137_cf326bff35a01dca4e00e4908e9c3cdc_170.jpg
visible
  True
width
  2175
x_restrict
  1

1223334444abc commented 10 months ago

            "filename": "(pid-{id}){title}_p{num}.{extension}",
            "directory": ["Pixiv", "{user[name]}", "OX163"],

The static image seems to work with the settings mentioned above, but I haven’t figured out how to configure dynamic images.

\Pixiv\111AAA\OX163\(pid-123456)Abcdefg\p0_delay500ms.jpg

            "filename": "p{num}_delay{illust.ugoira_metadata.frames.delay}.{extension}",
            "directory": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],

gallery-dl doesn’t seem to retrieve frame information for dynamic images? My original software provided the following explanation:

{
    "id": 49709638,
    "title": "东娘厚郁稲 - 动态",
    "type": "ugoira",
    "image_urls": {
        "square_medium": "https://i.pximg.net/c/360x360_70/img-master/img/2015/04/07/03/32/03/49709638_square1200.jpg",
        "medium": "https://i.pximg.net/c/540x540_70/img-master/img/2015/04/07/03/32/03/49709638_master1200.jpg",
        "large": "https://i.pximg.net/c/600x1200_90/img-master/img/2015/04/07/03/32/03/49709638_master1200.jpg"
    },
    "caption": "电波洗脑,视频地址是 <a href=\"http://www.bilibili.tv/video/av936752\" target=\"_blank\">http://www.bilibili.tv/video/av936752</a>",
    "restrict": 0,
    "user": {
        "id": 3896348,
        "name": "枫谷剑仙",
        "account": "mapaler",
        "profile_image_urls": {
            "medium": "https://i2.pixiv.net/user-profile/img/2016/04/22/17/52/13/10835493_0604d937120e2b0f68dd87474d05fe71_170.png"
        },
        "is_followed": false
    },
    "tags": [
        {"name": "うごイラ"},
        {"name": "动漫东东"},
        {"name": "东东娘"},
        {"name": "鼠绘"}
    ],
    "tools": ["Fireworks"],
    "create_date": "2015-04-07T03:32:03+09:00",
    "page_count": 1,
    "width": 1024,
    "height": 768,
    "sanity_level": 2,
    "meta_single_page": {
        "original_image_url": "https://i3.pixiv.net/img-original/img/2015/04/07/03/32/03/49709638_ugoira0.png"
    },
    "meta_pages": [],
    "filename": "49709638_ugoira",
    "extention": "png",
    "total_view": 174,
    "total_bookmarks": 2,
    "is_bookmarked": false,
    "visible": true,
    "is_muted": false,
    "total_comments": 1,
    "ugoira_metadata": { // frame information added for animations; absent if fetching it is disabled
        "zip_urls": {
            "medium": "https://i3.pixiv.net/img-zip-ugoira/img/2015/04/07/03/32/03/49709638_ugoira600x600.zip"
        },
        "frames": [ // use illust.ugoira_metadata.frames.length to get the number of frames
            {
                "file": "000000.jpg",
                "delay": 60
            },
            {
                "file": "000001.jpg",
                "delay": 60
            },
            {
                "file": "000002.jpg",
                "delay": 60
            },
            {
                "file": "000003.jpg",
                "delay": 60
            },
            {
                "file": "000004.jpg",
                "delay": 60
            },
            {
                "file": "000005.jpg",
                "delay": 60
            }
        ]
    }
}

Hrxn commented 10 months ago

Ah, perfect. By dynamic image, you mean ugoira, right?

It seems that everything is already there:

frames[N]['delay']
  150

So, what you want for your "filename" setting is probably this:

"filename": {
    "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
    ""                : "(pid-{id}){title}_p{num}.{extension}"
}

This is a conditional filename setting. The first entry is the filename setting used when type == 'ugoira', and the second entry, under the empty-string key, is the "normal" filename setting used for everything else.
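
To illustrate how such a conditional setting is resolved, here is a minimal Python sketch. It is not gallery-dl's actual implementation. The `pick_format` helper is hypothetical and simply assumes that each key is a Python expression evaluated against the file's metadata, with the empty key as the fallback.

```python
def pick_format(conditional: dict, metadata: dict) -> str:
    """Return the first format string whose condition holds (hypothetical helper)."""
    for condition, fmt in conditional.items():
        if not condition:
            continue  # the empty key is the fallback, handled below
        # Evaluate the condition with the metadata fields as local names
        if eval(condition, {"__builtins__": {}}, metadata):
            return fmt
    return conditional[""]


filename = {
    "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
    "": "(pid-{id}){title}_p{num}.{extension}",
}

print(pick_format(filename, {"type": "illust"}))  # the default format string
print(pick_format(filename, {"type": "ugoira"}))  # the ugoira format string
```

The same pattern works for "directory", where each value is a list of path segments instead of a single string.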

1223334444abc commented 10 months ago

        "pixiv":
        {
            "#": "override global archive path for pixiv",
            "archive": "~/gallery-dl/archive-pixiv.sqlite3",

            "#": "set custom directory and filename format strings for all pixiv downloads",
            "filename":
            {
                "type == 'ugoira'": "p{num}_delay{frames[0]['delay']}ms.{extension}",
                ""                : "(pid-{id}){title}_p{num}.{extension}"
            },
            "directory":
            {
                "type == 'ugoira'": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
                ""                : ["Pixiv", "{user[name]}", "OX163"]
            },
            "refresh-token": "...",

            "#": "transform ugoira into lossless MKVs",
            "ugoira": true,
            "postprocessors": ["ugoira-copy"],

            "#": "use special settings for favorites and bookmarks",
            "favorite":
            {
                "directory": ["Pixiv", "Favorites", "{user[id]}"]
            },
            "bookmark":
            {
                "directory": ["Pixiv", "My Bookmarks"],
                "refresh-token": "..."
            }
        },

Why did I get a zip file in my folder instead of a bunch of image frames? I also get this warning: [postprocessor][warning] module 'ugoira-copy' not found

Pixiv\AAAAA\OX163\(pid-11111)AAA\
                                  p0_delay150ms.zip

not

Pixiv\AAAAA\OX163\(pid-11111)AAA\
                                  p0_delay150ms.jpg
                                  p1_delay150ms.jpg
                                  p2_delay150ms.jpg
                                  p3_delay150ms.jpg
                                  p4_delay150ms.jpg
                                  p5_delay150ms.jpg

Hrxn commented 10 months ago

Here with correct formatting etc:

{
    "extractor":
    {
        "pixiv":
        {

            "archive": "~/gallery-dl/archive-pixiv.sqlite3",

            "filename": {
                "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
                ""                : "(pid-{id}){title}_p{num}.{extension}"
            },
            "directory": {
                "type == 'ugoira'": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
                ""                : ["Pixiv", "{user[name]}", "OX163"]
            },
            "refresh-token": "...",

            "ugoira": true,
            "postprocessors": ["ugoira-copy"],

            "favorite":
            {
                "directory": ["Pixiv", "Favorites", "{user[id]}"]
            },
            "bookmark":
            {
                "directory": ["Pixiv", "My Bookmarks"],
                "refresh-token": "..."
            }
        }
    }
}

Tip:

You can check (and fix) this quickly online with vscode.dev. Just make sure the language mode is JSON.

Or use VS Code locally, if it is installed.
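
If no editor is at hand, the same syntax check can be sketched with Python's standard json module. The `check_json` helper below is hypothetical. It only reports whether the text is valid JSON (for example, it flags a fragment that is missing its enclosing braces) and does not validate gallery-dl option names.

```python
import json


def check_json(text: str) -> str:
    """Return "ok" or a message pointing at the first JSON syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return "ok"


good = '{"extractor": {"pixiv": {"ugoira": true}}}'
bad = '"pixiv": {"ugoira": true},'  # fragment without the enclosing braces

print(check_json(good))
print(check_json(bad))
```

To check a real file, read its contents first and pass the resulting string to `check_json`.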

1223334444abc commented 10 months ago

Thank you very much for your answer! I just compared examples and fixed the issue. I modified the previous query, and now I’m primarily facing an issue with downloading image frames.

Why did I get a zip file in my folder instead of a bunch of image frames? I also get this warning: [postprocessor][warning] module 'ugoira-copy' not found

Pixiv\AAAAA\OX163\(pid-11111)AAA\
                                  p0_delay150ms.zip

not

Pixiv\AAAAA\OX163\(pid-11111)AAA\
                                  p0_delay150ms.jpg
                                  p1_delay150ms.jpg
                                  p2_delay150ms.jpg
                                  p3_delay150ms.jpg
                                  p4_delay150ms.jpg
                                  p5_delay150ms.jpg

Hrxn commented 10 months ago

Because the config snippet you used references a post-processor called ugoira-copy, but this post-processor has not been defined yet in your config.

I've put it all together here, including some basic global options at the beginning. They are pretty important, but I did not have them in my example above:

{
    "extractor":
    {
        "base-directory": "~/your/path/here/downloads/gallery-dl",
        "archive": "~/your/path/here/gallery-dl-stuff/gallery-dl.archive.global.db",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
        "skip": true,

        "pixiv":
        {

            "archive": "~/gallery-dl/archive-pixiv.sqlite3",

            "filename": {
                "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
                ""                : "(pid-{id}){title}_p{num}.{extension}"
            },
            "directory": {
                "type == 'ugoira'": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
                ""                : ["Pixiv", "{user[name]}", "OX163"]
            },
            "refresh-token": "...",

            "ugoira": true,
            "postprocessors": ["ugoira-copy"],

            "favorite":
            {
                "directory": ["Pixiv", "Favorites", "{user[id]}"]
            },
            "bookmark":
            {
                "directory": ["Pixiv", "My Bookmarks"],
                "refresh-token": "..."
            }
        }
    },

    "postprocessor":
    {

        "ugoira-webm":
        {
            "name": "ugoira",
            "extension": "webm",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"],
            "ffmpeg-twopass": true,
            "ffmpeg-demuxer": "image2"
        },

        "ugoira-mp4":
        {
            "name": "ugoira",
            "extension": "mp4",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"],
            "ffmpeg-twopass": true,
            "libx264-prevent-odd": true
        },

        "ugoira-gif":
        {
            "name": "ugoira",
            "extension": "gif",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"]
        },

        "ugoira-copy":
        {
            "name": "ugoira",
            "extension": "mkv",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-c", "copy"],
            "libx264-prevent-odd": false,
            "repeat-last-frame": false
        }
    }
}
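
A warning like "module 'ugoira-copy' not found" can be caught before running a download with a small check. The sketch below is not part of gallery-dl and `undefined_postprocessors` is a hypothetical helper. It assumes that string entries in a "postprocessors" list refer to names defined in the top-level "postprocessor" section, as in the config above.

```python
import json


def undefined_postprocessors(config: dict) -> set:
    """Names referenced by "postprocessors" lists but missing from "postprocessor"."""
    defined = set(config.get("postprocessor", {}))
    referenced = set()

    def walk(node):
        # Recurse through the nested "extractor" section
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "postprocessors" and isinstance(value, list):
                    # Only string entries are references to named definitions
                    referenced.update(v for v in value if isinstance(v, str))
                else:
                    walk(value)

    walk(config.get("extractor", {}))
    return referenced - defined


config = json.loads("""
{
    "extractor": {"pixiv": {"ugoira": true, "postprocessors": ["ugoira-copy"]}},
    "postprocessor": {"ugoira-mp4": {"name": "ugoira", "extension": "mp4"}}
}
""")
print(undefined_postprocessors(config))  # {'ugoira-copy'}
```

An empty set means every referenced post-processor name has a definition.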

1223334444abc commented 10 months ago

The "ugoira-copy" option attempts to convert dynamic pictures into .mkv videos. I wonder if there is a save option to save it in its original form and rename it, like:

Pixiv\AAAAA\OX163\(pid-11111)AAA\
                                  p0_delay150ms.jpg
                                  p1_delay150ms.jpg
                                  p2_delay150ms.jpg
                                  p3_delay150ms.jpg
                                  p4_delay10ms.jpg
                                  p5_delay10ms.jpg

The tool I used before mentions: "Pixiv's dynamic images do not return all the original image information, so when downloading, PUBD modifies the frame number in the URL of the first original image to get the paths of the subsequent original images, and at the same time records the interval between each image." I have looked at configuration.rst and gallery-dl-example.conf, but I still don't know how to download dynamic images in this format.

(I mainly hope that the newly downloaded file will be 100% consistent with the existing local database.)

1223334444abc commented 10 months ago

Another small issue is that the tool I used before was able to convert characters that don't conform to file name rules into '_' characters.

str.replace(/[:\*\?"<>\|\r\n]/ig, "_")
str.replace(/[\/\\]/ig, "_")

I don't know how to implement this functionality in gallery-dl.

kattjevfel commented 10 months ago

For that there's path-restrict. From the looks of it, you want to replace all characters that Windows doesn't allow, which is applied automatically if you're on Windows. If you're not, you can specify -o path-restrict=windows on the command line or add it to your config (just add "path-restrict": "windows" under the pixiv extractor).
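
For comparison, this is roughly what the two PUBD replacements do, written as a Python sketch. It illustrates the character set only and is not how gallery-dl implements path-restrict. The `sanitize` helper is hypothetical.

```python
import re

# Characters from the two PUBD regexes: : * ? " < > | CR LF, plus / and \
FORBIDDEN = re.compile(r'[:*?"<>|\r\n/\\]')


def sanitize(name: str, replacement: str = "_") -> str:
    """Replace characters that are not allowed in Windows file names."""
    return FORBIDDEN.sub(replacement, name)


print(sanitize('JPG/PSD: "final"?'))  # JPG_PSD_ _final__
```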

mikf commented 10 months ago

Ugoira, or "dynamic pictures" as you call them, come in a .zip archive when downloading them from Pixiv.

gallery-dl can convert the frames/pictures in such an archive to an animated format using an ugoira post processor and FFmpeg, but just extracting and renaming these frames like PUBD does is not supported.

To only download the archive without touching it any further, simply do not use an ugoira post processor while still leaving the ugoira option enabled.
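
Since extracting and renaming is not supported by gallery-dl itself, it could be done afterwards with a separate script. The following is a minimal sketch under stated assumptions: `extract_frames` is a hypothetical helper, and the `frames` list (entries with "file" and "delay", as in the -K output above) is assumed to have been saved somewhere, for example in a metadata file. The frames are whatever the downloaded archive contains, which may differ from what PUBD fetches.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_frames(archive: Path, frames: list, dest: Path) -> list:
    """Extract each frame as p{index}_delay{delay}ms.{ext}; return the new paths."""
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for index, frame in enumerate(frames):
            ext = Path(frame["file"]).suffix  # e.g. ".jpg"
            target = dest / f"p{index}_delay{frame['delay']}ms{ext}"
            # Copy the bytes unchanged, so nothing is re-encoded
            target.write_bytes(zf.read(frame["file"]))
            written.append(target)
    return written


# Demo with a throwaway archive; a real one would come from gallery-dl
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("000000.jpg", b"frame 0")
        zf.writestr("000001.jpg", b"frame 1")
    frames = [{"file": "000000.jpg", "delay": 150},
              {"file": "000001.jpg", "delay": 10}]
    for path in extract_frames(archive, frames, Path(tmp) / "frames"):
        print(path.name)
```

The demo prints p0_delay150ms.jpg and p1_delay10ms.jpg, matching the naming scheme asked for above.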

1223334444abc commented 10 months ago

I seem to remember that Pixiv’s ugoira can have non-uniform frame delays. Is there a way to save them 100% losslessly, including the original images and the delay of each frame (output as text, or any other form is fine too)?
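
One possible approach, sketched here in Python rather than as a gallery-dl feature: keep the downloaded ZIP untouched and store the per-frame delays in a text file next to it. The filename format above only uses frames[0]['delay'], so non-uniform delays would otherwise be lost. The `delay_report` helper is hypothetical and assumes a `frames` list shaped like the one in the -K output.

```python
def delay_report(frames: list) -> str:
    """One "file<TAB>delay_ms" line per frame, plus a summary of distinct delays."""
    lines = [f"{frame['file']}\t{frame['delay']}" for frame in frames]
    distinct = sorted({frame["delay"] for frame in frames})
    kind = "uniform" if len(distinct) == 1 else "non-uniform"
    lines.append(f"# {len(frames)} frames, {kind} delays: {distinct}")
    return "\n".join(lines)


# Made-up frame list for illustration
frames = [
    {"file": "000000.jpg", "delay": 150},
    {"file": "000001.jpg", "delay": 150},
    {"file": "000002.jpg", "delay": 10},
]
print(delay_report(frames))
```

The returned string can be written to a .txt file beside the archive, so the images and their timing can be matched up again later.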