Open 1223334444abc opened 10 months ago
And I heard that Pixiv has strict limitations on web crawlers. Will using this software be restricted, and do I need to configure anything extra?
Configuring gallery-dl is pretty easy, in my opinion.
You should start with your own config file (gallery-dl --config-create). For guidance, you can take a look at the two example config files (gallery-dl*.conf). All you basically need for what you are asking here are the "filename" and "directory" options.
I suggest you just start with that, and if you have further questions, just ask and someone will gladly help.
The only thing that might not be working (yet) is the gifdelay info you use here, but if that is metadata provided by Pixiv itself, it should be easy enough. I am not really sure, though, because I haven't used Pixiv in a long time.
And I heard that Pixiv has strict limitations on web crawlers. Will using this software be restricted, and do I need to configure anything extra?
Yeah, you definitely want to use your own account and set up proper gallery-dl authentication (i.e. gallery-dl oauth:pixiv).
“gifdelay”, “nogif”, and “gif” are custom masks provided by PUBD to adopt different naming strategies when the post is a dynamic image (illust.type=="ugoira"/illust.type!="ugoira"). How can I implement this in the configuration file?
Not sure. If you could post the output for a Pixiv ugoira example link (gallery-dl -K <Example>) here, I might be able to tell you more.
These two posts are a static image and a dynamic image; I want to use a different storage method for each.
[pixiv][info] Refreshing access token
Keywords for directory names:
-----------------------------
caption
Very late Birthday gift for 茲塔~🔶
category
pixiv
comment_access_control
0
create_date
2023-10-24T00:00:56+09:00
date
2023-10-23 15:00:56
height
3508
id
112799086
illust_ai_type
1
illust_book_style
0
is_bookmarked
False
is_muted
False
num
0
page_count
1
rating
R-18
restrict
0
sanity_level
6
series
None
subcategory
work
suffix
tags[N]
0 R-18
1 賀圖
2 ケモショタ
title
🔸🔶🔸
total_bookmarks
95
total_comments
1
total_view
497
type
illust
user['account']
kevinliu5605
user['id']
1643271
user['is_followed']
True
user['name']
元元
user['profile_image_urls']['medium']
https://i.pximg.net/user-profile/img/2016/04/26/04/27/07/10851635_9f9ed93bd0114b7137a1e0c8058a682b_170.jpg
visible
True
width
2480
x_restrict
1
Keywords for filenames and --filter:
------------------------------------
caption
Very late Birthday gift for 茲塔~🔶
category
pixiv
comment_access_control
0
create_date
2023-10-24T00:00:56+09:00
date
2023-10-23 15:00:56
date_url
2023-10-23 15:00:56
extension
png
filename
112799086_p0
height
3508
id
112799086
illust_ai_type
1
illust_book_style
0
is_bookmarked
False
is_muted
False
num
0
page_count
1
rating
R-18
restrict
0
sanity_level
6
series
None
subcategory
work
suffix
tags[N]
0 R-18
1 賀圖
2 ケモショタ
title
🔸🔶🔸
total_bookmarks
95
total_comments
1
total_view
497
type
illust
user['account']
kevinliu5605
user['id']
1643271
user['is_followed']
True
user['name']
元元
user['profile_image_urls']['medium']
https://i.pximg.net/user-profile/img/2016/04/26/04/27/07/10851635_9f9ed93bd0114b7137a1e0c8058a682b_170.jpg
visible
True
width
2480
x_restrict
1
Keywords for directory names:
-----------------------------
caption
category
pixiv
comment_access_control
0
create_date
2023-10-22T20:26:12+09:00
date
2023-10-22 11:26:12
height
1677
id
112765475
illust_ai_type
1
illust_book_style
0
is_bookmarked
False
is_muted
False
num
0
page_count
1
rating
R-18
restrict
0
sanity_level
6
series
None
subcategory
work
suffix
tags[N]
0 R-18
1 うごイラ
2 オリジナル
3 ショタ
4 男の子
5 shota
title
サウナ
total_bookmarks
573
total_comments
7
total_view
2650
type
ugoira
user['account']
user_dhwn3743
user['id']
96843491
user['is_followed']
True
user['name']
こもれび
user['profile_image_urls']['medium']
https://i.pximg.net/user-profile/img/2023/07/29/01/13/06/24735137_cf326bff35a01dca4e00e4908e9c3cdc_170.jpg
visible
True
width
2175
x_restrict
1
Keywords for filenames and --filter:
------------------------------------
caption
category
pixiv
comment_access_control
0
create_date
2023-10-22T20:26:12+09:00
date
2023-10-22 11:26:12
date_url
2023-10-22 11:26:12
extension
zip
filename
112765475_ugoira1920x1080
frames[N]['delay']
150
frames[N]['file']
000000.jpg
height
1677
id
112765475
illust_ai_type
1
illust_book_style
0
is_bookmarked
False
is_muted
False
num
0
page_count
1
rating
R-18
restrict
0
sanity_level
6
series
None
subcategory
work
suffix
tags[N]
0 R-18
1 うごイラ
2 オリジナル
3 ショタ
4 男の子
5 shota
title
サウナ
total_bookmarks
573
total_comments
7
total_view
2650
type
ugoira
user['account']
user_dhwn3743
user['id']
96843491
user['is_followed']
True
user['name']
こもれび
user['profile_image_urls']['medium']
https://i.pximg.net/user-profile/img/2023/07/29/01/13/06/24735137_cf326bff35a01dca4e00e4908e9c3cdc_170.jpg
visible
True
width
2175
x_restrict
1
"filename": "(pid-{id}){title}_p{num}.{extension}",
"directory": ["Pixiv", "{user[name]}", "OX163"],
The static image seems to work with the settings mentioned above, but I haven’t figured out how to configure dynamic images.
\Pixiv\111AAA\OX163(pid-123456)Abcdefg\p0_delay500ms.jpg
"filename": "p{num}_delay{illust.ugoira_metadata.frames.delay}.{extension}",
"directory": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
gallery-dl doesn’t seem to retrieve frame information for dynamic images? My original software provided the following explanation:
{
"id": 49709638,
"title": "东娘厚郁稲 - 动态",
"type": "ugoira",
"image_urls": {
"square_medium": "https://i.pximg.net/c/360x360_70/img-master/img/2015/04/07/03/32/03/49709638_square1200.jpg",
"medium": "https://i.pximg.net/c/540x540_70/img-master/img/2015/04/07/03/32/03/49709638_master1200.jpg",
"large": "https://i.pximg.net/c/600x1200_90/img-master/img/2015/04/07/03/32/03/49709638_master1200.jpg"
},
"caption": "电波洗脑,视频地址是 <a href=\"http://www.bilibili.tv/video/av936752\" target=\"_blank\">http://www.bilibili.tv/video/av936752</a>",
"restrict": 0,
"user": {
"id": 3896348,
"name": "枫谷剑仙",
"account": "mapaler",
"profile_image_urls": {
"medium": "https://i2.pixiv.net/user-profile/img/2016/04/22/17/52/13/10835493_0604d937120e2b0f68dd87474d05fe71_170.png"
},
"is_followed": false
},
"tags": [
{"name": "うごイラ"},
{"name": "动漫东东"},
{"name": "东东娘"},
{"name": "鼠绘"}
],
"tools": ["Fireworks"],
"create_date": "2015-04-07T03:32:03+09:00",
"page_count": 1,
"width": 1024,
"height": 768,
"sanity_level": 2,
"meta_single_page": {
"original_image_url": "https://i3.pixiv.net/img-original/img/2015/04/07/03/32/03/49709638_ugoira0.png"
},
"meta_pages": [],
"filename": "49709638_ugoira",
"extention": "png",
"total_view": 174,
"total_bookmarks": 2,
"is_bookmarked": false,
"visible": true,
"is_muted": false,
"total_comments": 1,
"ugoira_metadata": { // frame information added for animations; ugoira_metadata is absent if fetching it is disabled
"zip_urls": {
"medium": "https://i3.pixiv.net/img-zip-ugoira/img/2015/04/07/03/32/03/49709638_ugoira600x600.zip"
},
"frames": [ // to get the number of frames of an animation, use illust.ugoira_metadata.frames.length
{
"file": "000000.jpg",
"delay": 60
},
{
"file": "000001.jpg",
"delay": 60
},
{
"file": "000002.jpg",
"delay": 60
},
{
"file": "000003.jpg",
"delay": 60
},
{
"file": "000004.jpg",
"delay": 60
},
{
"file": "000005.jpg",
"delay": 60
}
]
}
}
Ah, perfect. By dynamic image, you mean ugoira, right?
It seems that everything is already there:
frames[N]['delay']
150
So, what you want for your "filename" setting is probably this:
"filename": {
"type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
"" : "(pid-{id}){title}_p{num}.{extension}"
}
This is a conditional filename setting.
The first line is the filename setting for type == 'ugoira', and the second line is the "normal" filename setting (meaning everything else, basically).
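To illustrate the idea behind a conditional setting, here is a minimal Python sketch. It is not gallery-dl's actual code: select_format is a hypothetical helper, and I am assuming the conditions are checked in order with the empty key as the fallback. It only picks the format string and does not render it.

```python
def select_format(rules, metadata):
    """Return the first format string whose condition holds for `metadata`.

    `rules` maps a Python expression (as text) to a format string; the
    empty-string key is the fallback used when no condition matches.
    This is a hypothetical helper, not part of gallery-dl.
    """
    for condition, fmt in rules.items():
        # Evaluate the condition with the metadata fields as local names
        if condition and eval(condition, {"__builtins__": {}}, metadata):
            return fmt
    return rules[""]


filename_rules = {
    "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
    "": "(pid-{id}){title}_p{num}.{extension}",
}

ugoira = {"type": "ugoira", "id": 112765475}
illust = {"type": "illust", "id": 112799086}

print(select_format(filename_rules, ugoira))  # the ugoira format string
print(select_format(filename_rules, illust))  # the fallback format string
```

The same pattern applies to a conditional "directory" setting, where each value is a list of path segments instead of a single string.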
"pixiv":
{
"#": "override global archive path for pixiv",
"archive": "~/gallery-dl/archive-pixiv.sqlite3",
"#": "set custom directory and filename format strings for all pixiv downloads",
"filename":
{
"type == 'ugoira'": "p{num}_delay{frames[0]['delay']}ms.{extension}",
"" : "(pid-{id}){title}_p{num}.{extension}"
},
"directory":
{
"type == 'ugoira'": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
"" : ["Pixiv", "{user[name]}", "OX163"]
},
"refresh-token": "...",
"#": "transform ugoira into lossless MKVs",
"ugoira": true,
"postprocessors": ["ugoira-copy"],
"#": "use special settings for favorites and bookmarks",
"favorite":
{
"directory": ["Pixiv", "Favorites", "{user[id]}"]
},
"bookmark":
{
"directory": ["Pixiv", "My Bookmarks"],
"refresh-token": "..."
}
},
Why did I get a zip file in my folder instead of a bunch of image frames? And [postprocessor][warning] module 'ugoira-copy' not found
Pixiv\AAAAA\OX163\(pid-11111)AAA\
p0_delay150ms.zip
not
Pixiv\AAAAA\OX163\(pid-11111)AAA\
p0_delay150ms.jpg
p1_delay150ms.jpg
p2_delay150ms.jpg
p3_delay150ms.jpg
p4_delay150ms.jpg
p5_delay150ms.jpg
Here with correct formatting etc:
{
    "extractor":
    {
        "pixiv":
        {
            "archive": "~/gallery-dl/archive-pixiv.sqlite3",
            "filename": {
                "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
                "" : "(pid-{id}){title}_p{num}.{extension}"
            },
            "directory": {
                "type == 'ugoira'": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
                "" : ["Pixiv", "{user[name]}", "OX163"]
            },
            "refresh-token": "...",
            "ugoira": true,
            "postprocessors": ["ugoira-copy"],
            "favorite":
            {
                "directory": ["Pixiv", "Favorites", "{user[id]}"]
            },
            "bookmark":
            {
                "directory": ["Pixiv", "My Bookmarks"],
                "refresh-token": "..."
            }
        }
    }
}
Tip:
You can check (and fix) this quickly online with vscode.dev. Just make sure the language mode is JSON.
Or use VS Code locally, if it is installed.
Thank you very much for your answer! I just compared examples and fixed the issue. I modified the previous query, and now I’m primarily facing an issue with downloading image frames.
Why did I get a zip file in my folder instead of a bunch of image frames? And [postprocessor][warning] module 'ugoira-copy' not found
Pixiv\AAAAA\OX163\(pid-11111)AAA\
p0_delay150ms.zip
not
Pixiv\AAAAA\OX163\(pid-11111)AAA\
p0_delay150ms.jpg
p1_delay150ms.jpg
p2_delay150ms.jpg
p3_delay150ms.jpg
p4_delay150ms.jpg
p5_delay150ms.jpg
Because the config snippet you used tries to use a post-processor called ugoira-copy, but this post-processor has not been defined yet in your config snippet.
I've put it all together here, including some basic global options at the beginning. They are pretty important, but I did not have them in my example above:
{
    "extractor":
    {
        "base-directory": "~/your/path/here/downloads/gallery-dl",
        "archive": "~/your/path/here/gallery-dl-stuff/gallery-dl.archive.global.db",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
        "skip": true,
        "pixiv":
        {
            "archive": "~/gallery-dl/archive-pixiv.sqlite3",
            "filename": {
                "type == 'ugoira'": "{id}_p{num}_delay{frames[0]['delay']}ms.{extension}",
                "" : "(pid-{id}){title}_p{num}.{extension}"
            },
            "directory": {
                "type == 'ugoira'": ["Pixiv", "{user[name]}", "OX163", "(pid-{id}){title}"],
                "" : ["Pixiv", "{user[name]}", "OX163"]
            },
            "refresh-token": "...",
            "ugoira": true,
            "postprocessors": ["ugoira-copy"],
            "favorite":
            {
                "directory": ["Pixiv", "Favorites", "{user[id]}"]
            },
            "bookmark":
            {
                "directory": ["Pixiv", "My Bookmarks"],
                "refresh-token": "..."
            }
        }
    },
    "postprocessor":
    {
        "ugoira-webm":
        {
            "name": "ugoira",
            "extension": "webm",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"],
            "ffmpeg-twopass": true,
            "ffmpeg-demuxer": "image2"
        },
        "ugoira-mp4":
        {
            "name": "ugoira",
            "extension": "mp4",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"],
            "ffmpeg-twopass": true,
            "libx264-prevent-odd": true
        },
        "ugoira-gif":
        {
            "name": "ugoira",
            "extension": "gif",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"]
        },
        "ugoira-copy":
        {
            "name": "ugoira",
            "extension": "mkv",
            "ffmpeg-args": ["-hide_banner", "-loglevel", "error", "-c", "copy"],
            "libx264-prevent-odd": false,
            "repeat-last-frame": false
        }
    }
}
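If you want to catch this kind of mismatch before running a download, a small check along these lines might help. It is only a sketch: undefined_postprocessors is a hypothetical helper, and it assumes every string listed under "postprocessors" must have a matching key under the top-level "postprocessor" section. gallery-dl's own lookup may be more permissive than that.

```python
def undefined_postprocessors(config):
    """Return post-processor names that are referenced but never defined.

    Hypothetical helper; assumes names listed under "postprocessors"
    refer to keys of the top-level "postprocessor" section.
    """
    defined = set(config.get("postprocessor", {}))
    missing = set()

    def walk(node):
        if not isinstance(node, dict):
            return
        for key, value in node.items():
            if key == "postprocessors" and isinstance(value, list):
                # Only string entries are references by name
                missing.update(
                    name for name in value
                    if isinstance(name, str) and name not in defined
                )
            else:
                walk(value)

    walk(config.get("extractor", {}))
    return sorted(missing)


config = {
    "extractor": {"pixiv": {"ugoira": True, "postprocessors": ["ugoira-copy"]}},
    "postprocessor": {},
}
print(undefined_postprocessors(config))  # ['ugoira-copy']
```

You could load your real config with json.load and pass the resulting dict to the same function.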
The "ugoira-copy" option attempts to convert dynamic pictures into .mkv videos. I wonder if there is a save option to save it in its original form and rename it, like:
Pixiv\AAAAA\OX163\(pid-11111)AAA\
p0_delay150ms.jpg
p1_delay150ms.jpg
p2_delay150ms.jpg
p3_delay150ms.jpg
p4_delay10ms.jpg
p5_delay10ms.jpg
In the tool I used before, it was mentioned: "Pixiv's dynamic images do not return all the original image information, so when downloading, PUBD modifies the frame number in the URL of the first original image to get the original image paths of the subsequent images, and at the same time records the interval between each image." I have looked at configuration.rst and gallery-dl-example.conf, but I still don't know how to download dynamic images in this format.
(I mainly hope that the newly downloaded file will be 100% consistent with the existing local database.)
Another small issue is that the tool I used before was able to convert characters that don't conform to file name rules into '_' characters.
str.replace(/[:\*\?"<>\|\r\n]/ig, "_")
str.replace(/[\/\\]/ig, "_")
I don't know how to implement this functionality in gallery-dl.
For that there's path-restrict. From the looks of it, you want to replace all characters that Windows doesn't allow, which should be applied automatically if you're on Windows. If not, you can just specify -o path-restrict=windows or add it to your config (just add "path-restrict": "windows" under the pixiv extractor).
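For comparison, this is roughly what the two PUBD replacements quoted above do, written as a Python sketch. The sanitize name is made up for illustration, and this is not how gallery-dl implements path-restrict; it just mirrors the two regular expressions.

```python
import re


def sanitize(name):
    """Replace characters that Windows rejects in file names with '_'."""
    # Mirrors str.replace(/[:\*\?"<>\|\r\n]/ig, "_")
    name = re.sub(r'[:*?"<>|\r\n]', "_", name)
    # Mirrors str.replace(/[\/\\]/ig, "_")
    return re.sub(r"[/\\]", "_", name)


print(sanitize("a/b:c?"))  # a_b_c_
```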
Ugoira, or "dynamic pictures" as you call them, come in a .zip archive when downloading them from Pixiv.
gallery-dl can convert the frames/pictures in such an archive to an animated format using an ugoira post processor and FFmpeg, but just extracting and renaming these frames like PUBD does is not supported.
To only download the archive without touching it any further, simply do not use an ugoira post processor while still leaving the ugoira option enabled.
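If you only need the frames unpacked and renamed afterwards, you could do that outside of gallery-dl with a short script. The following is a minimal sketch, not a gallery-dl feature: extract_frames is a hypothetical helper, and it assumes you already have the frame list (file name and delay, as shown in the -K output above) and that the names inside the .zip match those file entries. It does not fetch the original-size frames the way PUBD describes.

```python
import zipfile
from pathlib import Path


def extract_frames(zip_path, frames, out_dir):
    """Extract ugoira frames and rename them to p{index}_delay{delay}ms.{ext}.

    `frames` is a list like [{"file": "000000.jpg", "delay": 150}, ...].
    Hypothetical helper; assumes each "file" exists in the archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for index, frame in enumerate(frames):
            suffix = Path(frame["file"]).suffix
            target = out_dir / f"p{index}_delay{frame['delay']}ms{suffix}"
            # Copy the bytes unchanged so each frame stays identical
            target.write_bytes(archive.read(frame["file"]))
            written.append(target.name)
    return written
```

Because every frame carries its own delay in the file name, non-uniform delays are preserved as well.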
I seem to remember that Pixiv's ugoira can have non-uniform frame delays, and I wonder if there is a way to save them 100% losslessly, including the original images and the delay of each frame (output as text, or in any other form)?
I'm planning to migrate from PUBD to gallery-dl, but I don't know how to configure this software. How can I achieve the same effect in gallery-dl? (The text output could also be placed in individual JSON files for each image.) I would be grateful for any help.
This is my current file naming configuration in PUBD:
File Saving Path and Custom Mask:
%{illust.user.name.replace(/[\\/\\\\]/ig, "_")}/OX163%{gif}/%{nogif}%{(illust.page_count>1)?"_p"+page:""}%{(illust.type=="ugoira")?"p"+page:""}%{gifdelay}.%{illust.extention}
gifdelay:
illust.type=="ugoira"
_Delay%{illust.ugoira_metadata.frames[page].delay}ms
nogif:
illust.type!="ugoira"
(pid-%{illust.id})%{illust.title.replace(/[\\/\\\\]/ig, "_").replace(/:/ig, ";")}
gif:
illust.type=="ugoira"
/(pid-%{illust.id})%{illust.title.replace(/:/ig, ";")}
Text Output Format:
↲[%{new Date(illust.create_date).getFullYear()}-%{new Date(illust.create_date).getMonth()+1}-%{new Date(illust.create_date).getDate()}](pid-%{illust.id})%{illust.title.replace(/[\\/\\\\]/ig, "_").replace(/:/ig, ";")}%{(illust.page_count>1||illust.type=="ugoira")?"_p"+page:""}%{gifdelay}.%{illust.extention}【%{illust.caption}】%{illust.tags.map(function(t)\{return t.name;\}).join(",")}
Current Naming Example:
nogif: Pixiv\111AAA\OX163\(pid-12345678)Abcdefg_p0.png
gif: \Pixiv\111AAA\OX163\(pid-123456)Abcdefg\p0_delay500ms.jpg
Text: \Pixiv\111AAA\20230701.txt