1, host_id_rawfilename - can it be changed to host_id_tags? Because i don't see an option in the config file, and filenames are already limited to 255 chars? ... 3, can it have a filename format like the software below? https://github.com/Nandaka/DanbooruDownloader
You can configure the output filename and directory with the extractor.filename and extractor.directory options. To change the filename format for sankaku to "host_id_tags", you would put something like this in your config file:
{
"extractor": {
"sankaku": {
"filename": "{category}_{id}_{tags}.{extension}"
}
}
}
2, does it have a link history to avoid duplicate downloads, like ripme?
gallery-dl skips downloads for files that already exist, and there is also the archive option (also available with the --download-archive command-line switch).
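For reference, a minimal config sketch with the archive option enabled (the file path here is just an example; any writable location works):
{
    "extractor": {
        "sankaku": {
            "archive": "./gallery-dl/archive-sankaku.sqlite3"
        }
    }
}
The same effect on the command line would be gallery-dl --download-archive ./gallery-dl/archive-sankaku.sqlite3 URL.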
oh thanks, i will try that :)
update: with this in config.json
"sankaku": {
    "username": null,
    "password": null,
    "wait-min": 2.5,
    "wait-max": 5.0,
    "filename": "{category}_{id}_{tags}.{extension}"
},
it throws Errno 22 Invalid argument
The config snippet you posted looks fine and should work.
Could you post the whole output when you run gallery-dl with the --verbose option? It would be helpful to know where exactly this exception occurs.
I:\DOWNLOADS\Command tools>gallery-dl https://chan.sankakucomplex.com/?tags=chan_co --verbose
[gallery-dl][debug] Version 1.4.2
[gallery-dl][debug] Python 3.4.4 - Windows-10-10.0.17134
[gallery-dl][debug] requests 2.19.1 - urllib3 1.23
[gallery-dl][debug] Starting DownloadJob for 'https://chan.sankakucomplex.com/?tags=chan_co'
[sankaku][debug] Using SankakuTagExtractor for 'https://chan.sankakucomplex.com/?tags=chan_co'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): chan.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://chan.sankakucomplex.com:443 "GET /?tags=chan_co&page=1 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://chan.sankakucomplex.com:443 "GET /post/show/7024858 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): cs.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://cs.sankakucomplex.com:443 "GET /data/64/bf/64bf0aa8829e737468e9a0a229ad0166.jpg?e=1531388877&m=Y6qa7KMsjcFbb6NBDTI6pQ HTTP/1.1" 200 695172
.\gallery-dl\Chan.Sankaku\chan_co\7024858_2018-07-11 03... hair, white bikini, white gloves, white swimsuit, wink.jpg
[sankaku][error] Unable to download data: [Errno 22] Invalid argument: '\\\\?\\I:\\DOWNLOADS\\Command tools\\gallery-dl\\Chan.Sankaku\\chan_co\\7024858_2018-07-11 03_45_fate (series), fate_grand order, bb (fate), chan co, simple background, 1_1 aspect ratio, 1girl, asymmetrical hair, bangs, bikini, black choker, breasts, choker, clavicle, cleavage, ;d, eyebrows visible through hair, female, front-tie bikini, front-tie top, gloves, hair ornament, hair ribbon, hand on hip, hand up, large breasts, long hair, long sleeves, looking at viewer, megane, navel, one eye closed, open mouth, pointer, ponytail, purple eyes, purple hair, red ribbon, ribbon, rimless eyewear, side ponytail, side-tie bikini, smile, solo, star, swimsuit, tied hair, very long hair, white bikini, white gloves, white swimsuit, wink.jpg.part'
My new config
"filename": "{id}_{created_at}_{tags}.{extension}",
"directory":["Chan.Sankaku","{search_tags}"],
"archive": "./gallery-dl/archive-chan.sankaku.sqlite3"
OK, that filename is way too long (670 characters) and there is currently, as also noted in #92, no way to prevent that.
I guess over-long filenames could just be cut short to fit into the 255 character limit, but a more configurable approach (like string slicing for format string replacement fields) would be nice as well. I'll think of something ...
And, by the way: Python, at least on Linux, recognizes long filenames: OSError: [Errno 36] File name too long, so I wasn't quite sure how this error came to be. But on Windows you either get [Errno 2] No such file or directory or [Errno 22] Invalid argument.
that's what i was thinking... the filename is too long because we can't limit how many tags get added to the filename... anyway, thanks :)
and can it support these formats too?
- %provider% = provider Name
- %id% = Image ID
- %tags% = Image Tags
- %rating% = Image Rating
- %md5% = MD5 Hash
- %artist% = Artist Tag
- %copyright% = Copyright Tag
- %character% = Character Tag
- %circle% = Circle Tag, yande.re extension
- %faults% = Faults Tag, yande.re extension
- %originalFilename% = Original Filename
- %searchtag% = Search tag
All of these fields are already available, but under different names:
%provider% -> {category}
%id% -> {id}
%tags% -> {tags} (or {tag_string} on danbooru)
%rating% -> {score}
%originalFilename% -> {name}.{extension}
and so on. The exact names depend on the booru board in question, as gallery-dl is just using the API responses without much modification. Take a look at the output with -K to get a complete list of replacement field names.
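For example, to inspect the fields for the URL from this thread, something like:
gallery-dl -K "https://chan.sankakucomplex.com/?tags=chan_co"
prints every replacement field name together with an example value for the first file.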
To enable {tags_artist}, {tags_character} and so on, you need to set extractor.*.tags to true.
so after you add an option to prevent long filenames, i just need to add tags: true in the sankaku extractor config to enable artist/character tags?
Can gallery-dl use a search_tags value like [tags]+date:<=yyyy.mm.dd? Because after 1000 results are downloaded you can't download any more, so you need to add +date:<=yyyy.mm.dd after the tag to download more than 1000 results. yyyy.mm.dd is created_at, i think.
compared with danbooru downloader and others, i think gallery-dl is better:
1 - low memory usage (i think because it uses a single thread instead of multi-threading)
2 - archive (skips already-downloaded ids) (ripme has it, but danbooru downloader and others don't)
3 - batch download from pastebin (ripme can rip from the clipboard, danbooru downloader doesn't have this)
so after you add an option to prevent long filenames, i just need to add tags: true in the sankaku extractor config to enable artist/character tags?
Yes, but it would be easier to enable this option for all boorus by just setting extractor.tags to true. Otherwise you would have to enable it for each site individually, i.e. extractor.sankaku.tags, extractor.gelbooru.tags, and so on.
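A sketch of the relevant config section in that case:
{
    "extractor": {
        "tags": true
    }
}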
Concerning filename lengths: you can now (since 8fe9056b16cbbb14b7e94fa92a8c8369cee654a9) slice values in format strings. {tags[:200]} would limit it to 200 characters max - everything after that will be cut off.
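Applied to a filename format, that could look like this (the limit of 200 is arbitrary):
"filename": "{id}_{tags[:200]}.{extension}"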
Can gallery-dl use a search_tags value like [tags]+date:<=yyyy.mm.dd? Because after 1000 results are downloaded you can't download any more, so you need to add +date:<=yyyy.mm.dd after the tag to download more than 1000 results. yyyy.mm.dd is created_at, i think.
It can, but that's not necessary if you want to go past 1000 results / page 50. You don't even need to provide username and password if you want to go past page 25. Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example).
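With the URL from earlier in this thread, that would be something like:
gallery-dl --range 800- "https://chan.sankakucomplex.com/?tags=chan_co"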
[danbooru][error] An unexpected error occurred: AttributeError - 'list' object has no attribute 'startswith'.
Edit: this is my first post here. am i doing it right?
You should open a new issue, post the URL in question and, if possible, the complete error output with --verbose.
ok i will test it soon :)
Yes, but it would be easier to enable this option for all boorus by just setting extractor.tags to true. Otherwise you would have to enable it for each site individually, i.e. extractor.sankaku.tags, extractor.gelbooru.tags, and so on.
Concerning filename lengths: you can now (since 8fe9056) slice values in format strings. {tags[:200]} would limit it to 200 characters max - everything after that will be cut off.
"sankaku":
{
"username": null,
"password": null,
"wait-min": 2.5,
"wait-max": 5.0,
"filename": "{tags_artist}_{tags[:200]}_{id}_{created_at}_.{extension}",
"directory":["Chan.Sankaku","{search_tags}"],
"tags": true
},
[sankaku][error] Applying filename format string failed: TypeError: string indices must be integers
Even when i don't set {tags}, gallery-dl still only sets the filename as {id}_{created_at}.{extension} instead of {tags_artist}_{id}_{created_at}.{extension}
It can, but that's not necessary if you want to go past 1000 results / page 50. You don't even need to provide username and password if you want to go past page 25. Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)
so if i input tags that have more than 1000 results, it will keep downloading until there is nothing left to download?
[sankaku][error] Applying filename format string failed: TypeError: string indices must be integers
Even when i don't set {tags}, gallery-dl still only sets the filename as {id}_{created_at}.{extension} instead of {tags_artist}_{id}_{created_at}.{extension}
You are using version 1.4.2 and not the latest git snapshot. The {tags[:200]} thing and the tags option for sankaku haven't been "officially" released yet. Do a pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.zip and try again.
so if i input tags that have more than 1000 results, it will keep downloading until there is nothing left to download?
Yes, it only stops after downloading all search results, but you can set a custom upper limit with, again, the --range option.
oh nice ty, installed the python version and it worked :)
i just tested hosting a local file and using r:link to batch download, wow it works too :)
hosting a local file and using r:link to batch download
-i, --input-file FILE Download URLs found in FILE
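A hypothetical usage example, assuming urls.txt contains one URL per line:
gallery-dl -i urls.txt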
And to quote myself from the other issue:
You can now use the L format specifier to set a replacement if the format field value is too long. For example {tags:L100/too many tags/} (https://github.com/mikf/gallery-dl/commit/e0dd8dff5f626a42678a916780b31f0193aef7ca).
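Applied to a filename format, that might look like this (the limit of 200 and the replacement text are arbitrary):
"filename": "{id}_{tags:L200/too_many_tags/}.{extension}"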
thanks, so i need to update gallery-dl again?
Only if you want to use the L format specifier feature.
oh, today it stopped working after 3 hours... no error, it just stopped downloading. The command window is still processing, but it hasn't downloaded any new link in 3 hours (checked the website, still no error)
and with the archive option in the sankaku extractor, why does it feel so slow to check already-downloaded links? wait-min/max is 2/5, but sometimes it waits 8-10 or even 20+ seconds just to check files
oh, today it stopped working after 3 hours... no error, it just stopped downloading.
Hmm, there is a slim possibility that an HTTP request "gets stuck" and the client waits forever for a reply from the remote server. Some HTTP requests sent by gallery-dl - for some reason - don't have a timeout, so it probably happened with one of those. Fixing this should be easy. In the meantime: Ctrl+C and try again.
why does it feel so slow to check already-downloaded links
Because it has to get the download URL and metadata before it can check if a file has already been downloaded (same as youtube-dl). It doesn't help that Sankaku is incredibly slow itself, so you have to wait 2-5 seconds before each HTTP request (to avoid 429 Too Many Requests errors) and then you have to wait for the request itself to finish, which might take another 5 seconds.
When downloading sankaku stuff, you should really use the --range command-line option when necessary, as it allows the extractor to quickly jump ahead. gallery-dl --range 250- URL... is going to immediately jump to image nr. 250 and start from there.
yeah... it's easy to fix with the --range you told me about
Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)
5 tags at once, you mean 5 tags combined: ?tags=dynasty_warriors brown_hair china_dress female shoes, right?
Because it has to get the download URL and metadata before it can check if a file has already been downloaded (same as youtube-dl). It doesn't help that Sankaku is incredibly slow itself, so you have to wait 2-5 seconds before each HTTP request (to avoid 429 Too Many Requests errors) and then you have to wait for the request itself to finish, which might take another 5 seconds.
sometimes it waits 15-20 seconds, is that normal?
When downloading sankaku stuff, you should really use the --range command-line option when necessary, as it allows the extractor to quickly jump ahead. gallery-dl --range 250- URL... is going to immediately jump to image nr. 250 and start from there.
so i need to count the downloaded files and compare with the tag's total result count to know exactly what range i need to put in, right?
It should have a feature to skip a tag once it reaches already-downloaded files (so it only downloads newer pictures and stops once it reaches downloaded files, if the extractor archive option is enabled)
yeah... it's easy to fix with the --range you told me about
That is not what I meant. I wanted to say "It's easy for me to add a timeout to regular HTTP requests, so it doesn't get stuck anymore" -> https://github.com/mikf/gallery-dl/commit/68d6033a5d260dd3ea8823edede5d16d50e45aae
5 tags at once, you mean 5 tags combined: ?tags=dynasty_warriors brown_hair china_dress female shoes, right?
Right.
sometimes it waits 15-20 seconds, is that normal?
Not really, no. It might be the case that the wait-min/-max default values are too low and you get 429 Too Many Requests responses from sankaku. In that case gallery-dl retries the original request after waiting for a bit, but it can take quite a bit of time until sankaku sends a normal response.
You can enable verbose output (-v) to see what goes on behind the scenes. If you encounter anything 429 related, increase wait-min/-max until this doesn't happen anymore.
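For example (these values are only a guess; keep raising them until the 429 responses disappear):
{
    "extractor": {
        "sankaku": {
            "wait-min": 5.0,
            "wait-max": 10.0
        }
    }
}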
so i need to count the downloaded files and compare with the tag's total result count to know exactly what range i need to put in, right?
Your computer can count them for you: ... and you don't need the exact range, the start index is enough. --range 200-300 will download anything from 200 to 300, but you can omit the end index (--range 200-) to download from 200 to the end, or the start index to download up to 300 (--range -300).
It should have a feature to skip a tag once it reaches already-downloaded files (so it only downloads newer pictures and stops once it reaches downloaded files, if the extractor archive option is enabled)
--abort-on-skip Abort extractor run if a file download would
normally be skipped, i.e. if a file with the same
filename already exists
or the extractor.skip option
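A config sketch using the skip option for the same effect (assuming "abort" as the value, mirroring the --abort-on-skip switch):
{
    "extractor": {
        "sankaku": {
            "skip": "abort"
        }
    }
}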
thank you
Not going to happen.
You can download the original and then down-sample it yourself, or ignore it with --filter.
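An illustrative --filter call; the expression is ordinary Python, and the available field names (width, height, ...) vary per site, so check them with -K first:
gallery-dl --filter "width <= 1920 and height <= 1080" URL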
You should also open a new issue if you want to suggest a new feature. This one here is closed for a reason.
ok, thanks 👍