Edit: Found the answer in the code. ✓ https://github.com/mikf/gallery-dl/blob/bd5d08abbcd5729c93a85d2189bef1561959b3b4/gallery_dl/exception.py#L51-L72
Can I load cookies from environment variables that I reference in my config.json? I tried this:
{
    "extractor": {
        "twitter": {
            "cookies": {
                "auth_token": "{_env[galleryDlTwitterAuthToken]}",
                "ct0": "{_env[galleryDlTwitterCt0]}"
            }
        }
    }
}
How do I use --option to inject more complex structures into the loaded configuration, like this whole block?
{
    "extractor": {
        "twitter": {
            "directory": {
                "'reply_to' in locals()": [
                    "{category}",
                    "{user[name]!l}",
                    "replies",
                    "{reply_to|'-'!l}",
                    "{tweet_id}"
                ],
                "(some dynamic query)": [
                    "{category}",
                    "{user[name]!l}",
                    "(some dynamic value)",
                    "{tweet_id}"
                ],
                "": [
                    "{category}",
                    "{user[name]!l}",
                    "tweets",
                    "{tweet_id}"
                ]
            }
        }
    }
}
Assuming the values are too dynamic to use gallery-dl’s expression language. Also assuming I can’t store a temporary, dynamically generated config.json.
@Hrxn
"path-replace" is only applied when "path-restrict" is a simple string and not an object.
"path-replace" was implemented after all the other "path-*" options in response to #755, so it might feel a bit off.
What happens if you set these options at the base level, and then use "path-replace" again at any "deeper"/more specific category level? Does it overwrite the replacement char then? Or if you use "path-restrict" again, can you update/overwrite specific replacement association options this way?
The same as with all other options, i.e. the general setting gets completely overwritten by the more specific one. It is not possible to update individual replacements; you'll have to copy everything you want to keep.
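To make the "most specific level wins, nothing is merged" rule concrete, here is a minimal Python sketch of such a lookup. The helper name lookup and the replacement maps are hypothetical; it illustrates the behaviour described above and is not gallery-dl's own code.

```python
def lookup(config, path, key, default=None):
    """Return `key` from the most specific level along `path` (hypothetical helper).

    A value found at a deeper level replaces the shallower one entirely;
    dictionaries are NOT merged.
    """
    value = default
    node = config
    for part in path:
        node = node.get(part)
        if not isinstance(node, dict):
            break  # no deeper level configured for this part
        if key in node:
            value = node[key]
    return value


config = {
    "extractor": {
        "path-restrict": {"/": "_", ":": "-"},
        "twitter": {
            # Only "/" is listed here, so the ":" rule from the base level is lost
            "path-restrict": {"/": "+"},
        },
    },
}

print(lookup(config, ["extractor", "twitter"], "path-restrict"))  # {'/': '+'}
```

To keep the ":" replacement for Twitter, it has to be repeated in the Twitter-level object.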
PS: Maybe pinning the latest "Question, Feedback and Suggestions" thread at the top of the issues would be beneficial? What do you think? Or maybe too much of a distraction?
That's a really good idea.
I wonder why this hadn't been done to the other threads before. Maybe pinning issues was not implemented back then, or maybe it just didn't occur to me for some reason.
@Jaid
Can I load cookies from environment variables that I reference in my config.json? Tried it, and either I did something wrong or this way is not supported.
That's not supported via config, but you could use -o "cookies.auth_token=${galleryDlTwitterAuthToken}" on the command line.
How do I use --option to inject more complex structures into the loaded configuration, like this whole block?
The VALUE part of --option NAME=VALUE can be any complex data structure as long as it's JSON-parsable.
Try something like --option 'extractor.twitter=<all twitter options as JSON>'
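If the values really have to be computed at runtime, one option is a small wrapper that builds the JSON with json.dumps and passes it as a single --option argument, together with cookie values taken from the environment variables mentioned above. A minimal Python sketch; build_args, condition and value are hypothetical names, and "date.year < 2020" is only a placeholder condition. Following the answer above, the JSON should contain all the Twitter options you need, not just "directory".

```python
import json
import os


def build_args(url, condition, value):
    """Build a gallery-dl argument list with a directory config generated at runtime.

    `condition` and `value` are hypothetical stand-ins for the dynamic parts.
    """
    directory = {
        "'reply_to' in locals()": [
            "{category}", "{user[name]!l}", "replies", "{reply_to|'-'!l}", "{tweet_id}",
        ],
        condition: ["{category}", "{user[name]!l}", value, "{tweet_id}"],
        # dicts keep insertion order, so the "" fallback stays last in the JSON
        "": ["{category}", "{user[name]!l}", "tweets", "{tweet_id}"],
    }
    twitter = {"directory": directory}
    args = ["gallery-dl", "--option", "extractor.twitter=" + json.dumps(twitter)]

    # Cookies from environment variables, as in the -o "cookies.auth_token=..." answer
    for name, env in (("auth_token", "galleryDlTwitterAuthToken"),
                      ("ct0", "galleryDlTwitterCt0")):
        if env in os.environ:
            args += ["-o", f"cookies.{name}={os.environ[env]}"]

    args.append(url)
    return args


if __name__ == "__main__":
    print(build_args("https://twitter.com/USER", "date.year < 2020", "old"))
```

The resulting list can be handed to subprocess.run() as-is; since every element is a separate argument, the JSON needs no shell quoting.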
How do I make gallery-dl only output warnings and errors? I tried this from the example config:
"output": {
    "log": {
        "level": "warning"
    }
}
But it doesn't work. It still prints the name of every file it downloads and fills my terminal with too much info. I run a script that scrapes a lot of accounts, and I have to scroll through tens of thousands of lines. I just want it to print something in my terminal if something goes wrong...
@brsk93 Set output.mode to "null":
"output": {
    "mode": "null",
    "log": {
        "level": "warning"
    }
}
Is it possible (I mean from a technical point of view) for you to modify gallery-dl to trigger the parent post processor if the parent extractor invokes sub-extractors? If so, should I create a new issue for this feature request?
Also, there is no way to filter the fields that get dumped into JSON, except by manually writing a JSON-like format into metadata.content-format or by removing all unnecessary fields, is there?
@dajotim937
It would theoretically be possible to trigger a parent post processor when spawning a child; the problem is that the necessary data structures aren't necessarily initialized at that point in time (or ever). You can create an issue if you want and I might be able to hack something together, but this is something more for v2.0.
To filter metadata fields, use a metadata post processor in delete mode before running another, normal one that writes data:
"postprocessor": [
    {
        "name": "metadata",
        "mode": "delete",
        "fields": ["user", "create_date", "foobar"]
    },
    {
        "name": "metadata",
        "mode": "jsonl",
        "filename": "-"
    }
]
Well, yeah, I remember that. It's just that Reddit has so many useless (in my case) JSON fields that it would be easier to just write "I need these 10 fields, delete the other 30" instead of manually listing each of the 30 fields to remove.
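Until there is something like a keep-list, the list of fields to delete can at least be generated instead of typed by hand. A minimal Python sketch, assuming you have one post's metadata as a JSON object (for example from an earlier metadata post processor run); the keep set and the sample field names are made up. Only the top-level keys present in that sample are covered, so keys that appear only in other posts would still slip through.

```python
import json


def delete_fields(sample, keep):
    """Return every top-level key of `sample` that is not in `keep`."""
    return sorted(key for key in sample if key not in keep)


def postprocessors(sample, keep):
    """Build the two-step list: delete unwanted fields, then write the rest."""
    return [
        {"name": "metadata", "mode": "delete", "fields": delete_fields(sample, keep)},
        {"name": "metadata", "mode": "jsonl", "filename": "-"},
    ]


# Hypothetical sample of one post's metadata
sample = {"id": "abc", "title": "t", "author": "u", "ups": 1, "thumbnail": "x", "preview": {}}
keep = {"id", "title", "author"}

print(json.dumps(postprocessors(sample, keep), indent=4))
```

The printed list can be pasted into the "postprocessor" setting in place of a hand-written one.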
Okay. In my workflow I managed to trigger one postprocessor only if reddit metadata exists, but I will create an issue just in case you decide to implement it.
Is there a way to blacklist galleries that have more than 100 pages on exhentai etc?
@AKL55 --filter 'int(filecount) <= 100 or abort()' works on exh specifically, but it still has to fetch the first image.
You can also use &advsearch=1&f_spt=100 when searching to have exh itself filter out large galleries.
Also, I tried --filter "lang in ('eng')","int(filecount) <= 100 or abort()", but only the page range works.
Conditional statements can be combined with "and" and "or":
--filter "lang == 'en' and int(filecount) <= 100 or abort()"
But you should really use the site-internal search. It's much more efficient.
https://exhentai.org/?f_search=language%3Aenglish%24&advsearch=1&f_spt=100
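Since --filter takes a Python expression, Python's operator precedence applies: "and" binds tighter than "or", so the combined filter groups as (lang == 'en' and int(filecount) <= 100) or abort(). A gallery that is not English therefore reaches abort() as well. The sketch below evaluates the expression with plain eval against made-up metadata values; the local abort() is only a stand-in that raises an exception. (In the earlier attempt, ('eng') is the plain string 'eng' rather than a tuple, so "in" performs a substring test.)

```python
class Aborted(Exception):
    """Raised by the stand-in abort() below."""


def abort():
    # Stand-in for gallery-dl's abort(); here it simply raises
    raise Aborted()


FILTER = "lang == 'en' and int(filecount) <= 100 or abort()"


def check(lang, filecount):
    """Evaluate FILTER for one set of metadata; return its result or 'aborted'."""
    try:
        return eval(FILTER, {"abort": abort, "int": int},
                    {"lang": lang, "filecount": filecount})
    except Aborted:
        return "aborted"


for lang, count in [("en", "42"), ("en", "250"), ("ja", "42")]:
    print(lang, count, check(lang, count))
# en 42 True
# en 250 aborted
# ja 42 aborted
```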
Hi, is it possible to set different download rates for different sites? If so, can someone show an example config file? Thank you.
@omfgntsa downloader.*.rate
Instead of *, put whichever extractor you need.
Example of config file. It only has a special setting for downloader.ytdl.module, but you can figure out how to set rates.
Hi, I'm sorry, I cannot find the line "downloader.ytdl.module" in the example config file you linked. I tried adding downloader.*.rate and replacing * with the name of the site I want to limit, but I can't get it to work.
Are there any plans to provide a build for Intel/AMD - 64 Bit?
@omfgntsa I already answered your question in #3865, but no, you cannot set different download rates for different sites at the moment.
@account00001 "Nightly" builds provide executables for "64 Bit" (x86_64). The Windows .exe files on the releases page will most likely always be 32 Bit (x86) only.
I have been wondering about this for a long time already:
How can we specify which extractor/plugin to use? I am asking for this case:
A site uses chan board software (like 4chan, 8ch, etc.), but gallery-dl doesn't support the domain/URL natively. If I remember correctly, gallery-dl does support these kinds of image boards?
@github-userx https://github.com/mikf/gallery-dl#examples
Thanks! I just noticed it as well after posting my (dumb) question :D
I tried the 4chan and 8ch extractors on that image board, but it seems they didn't work / weren't compatible. The image board looks like the usual chan boards.
You’re the best, Mike! I tried it and it turned out to be a lynxchan board! It wasn't Vichan, which is why that extractor didn't work.
This is great! I’ve been impressed by gallery-dl and all your efforts for many years! I always recommend gallery-dl to everyone and always mention your name to people when I talk about OpenSource projects and super responsive & supportive developers, Stay awesome! ;)
I was wondering, can we combine the remote source ("r:https://domain.com/ https://domain.com/……") feature with the -g parameter? And how would we handle it if we have to specify the extractor:
gallery-dl -g "lynxchan:https://imageboard-example.com/board/catalog.html"
In this one we can’t use the r: option to extract URLs, correct?
Having this in my config for Reddit:
"filename": "{filename}.{extension}",
"videos": "ytdl"
Why am I getting the filename _DASH720.mp4 instead of iiric8uvdtx91.mp4 for this video:
https://www.reddit.com/r/selfie/comments/ylh6bf/
But the correct one, p6gu760aftx91.mp4, for this other one?
@github-userx
You can combine -g and r: by doing gallery-dl -g r:<URL>. You don't need to specify an extractor in this case, since all r: does is load the page from the specified URL and return everything that looks like a link, i.e. everything that starts with https:// or http://.
A better way to get all download URLs from an image board catalog would be using -G or -gg to go one level deeper and resolve all returned threads.
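Roughly speaking, the r: behaviour described above amounts to scanning the fetched page for anything that starts with http:// or https://. A minimal Python approximation, assuming a simple regular expression; gallery-dl's actual pattern may differ, and the sample HTML is made up.

```python
import re

# Assumption: a simplified pattern, not gallery-dl's own
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_links(page):
    """Return all link-like strings in `page`, in order of appearance."""
    return URL_PATTERN.findall(page)


html = ('<a href="https://example.org/board/res/1.html">1</a> '
        '<img src="http://example.org/a.png">')
print(find_links(html))
```

This also shows why no extractor prefix is needed with r:: the page is only searched for link-shaped text.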
Thanks!
@taskhawk
I'd guess this is because the format selected by ytdl is different for these two posts, resulting in a different download URL and {filename}.
Thank you, that clued me in and I started reading more.
I installed yt-dlp as a pip module, checked the URLs with yt-dlp --dump-json to see which keys it offered, and added this to my config:
"downloader": {
    "ytdl": {
        "outtmpl": "%(id)s.%(ext)s"
    }
}
Now it works as I wanted it to.
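yt-dlp's output templates are based on Python's %-style string formatting with named fields, so "%(id)s.%(ext)s" is filled in from the same info dictionary that yt-dlp --dump-json prints. A tiny illustration with a made-up info dict (real ones have many more keys, and yt-dlp adds its own template extensions on top):

```python
# Hypothetical excerpt of what `yt-dlp --dump-json` might return for a video
info = {"id": "abc123", "ext": "mp4", "format_id": "hls-720"}

outtmpl = "%(id)s.%(ext)s"
print(outtmpl % info)  # abc123.mp4
```

Because the name now comes from "id" rather than from the selected format's URL, it no longer changes with the format ytdl picks.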
Seems a few backticks got messed up in the description of extractor.redgifs.format in the config doc page.
So I misused --write-info-json instead of --write-metadata, and I want to go back and correct this by only getting the metadata. I am not sure what combination of -skip=, my archive file (I was thinking of editing it via SQL), and command I should run to minimize my footprint (Pixiv, FYI). I also have a copy of the logs/console output to put together a list of links in the worst case. >_<
@9696neko, check issue #220.
@9696neko In particular, you should use --no-download and disable skip and archive with --no-skip --download-archive "". You can also add some delay between API requests with --sleep-request 1-2.
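Putting those options together with --write-metadata, a metadata-only re-run could be assembled like this from Python. A sketch assuming gallery-dl is on PATH and the URLs to revisit sit in a text file, one per line, passed via -i/--input-file (urls.txt is a hypothetical name). Building the command as a list makes the empty archive path a real empty argument, the equivalent of "" in a shell.

```python
import shlex


def metadata_only_command(input_file):
    """Build a gallery-dl command that writes metadata without downloading files."""
    return [
        "gallery-dl",
        "--write-metadata",
        "--no-download",           # do not download the files themselves
        "--no-skip",               # do not skip files that already exist
        "--download-archive", "",  # empty path: do not use the archive
        "--sleep-request", "1-2",  # wait 1-2 seconds between API requests
        "-i", input_file,
    ]


if __name__ == "__main__":
    # Shell-quoted preview; pass the list itself to subprocess.run() to execute it
    print(shlex.join(metadata_only_command("urls.txt")))
```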
@taskhawk, @mikf: Thanks very much. Sorry, I forgot that issues search exists ^_^
Does gallery-dl have an output Filename template that can include metadata like upload_date and username? Similar to yt-dlp:
-o '%(uploader)s_%(upload_date)s_%(title)s_%(resolution)s_[ID=%(id)s].%(ext)s'
@github-userx Use the -K option to check which metadata keys you can use in the filename.
gallery-dl.exe -K *link*
Documentation: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorfilename
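gallery-dl's format strings use Python's str.format syntax, with gallery-dl-specific extras on top (such as the !l conversion seen earlier in this thread, which plain str.format rejects). A rough equivalent of the yt-dlp template, using hypothetical metadata; real key names differ per site, so check them with -K first. The same string would go into "filename" in the config, or be passed with -f.

```python
from datetime import datetime

# Hypothetical metadata; real field names vary by extractor (see `gallery-dl -K URL`)
kwdict = {
    "user": {"name": "someone"},
    "date": datetime(2023, 5, 1, 12, 30),
    "title": "example",
    "id": 12345,
    "extension": "jpg",
}

fmt = "{user[name]}_{date:%Y%m%d}_{title}_[ID={id}].{extension}"
print(fmt.format(**kwdict))  # someone_20230501_example_[ID=12345].jpg
```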
Thanks @dajotim937!
Is it possible to add support for tilde expansion in exec.command when using a list?
Currently it has to be like this:
"command": ["/home/user/script.sh", "{_filename}"]
But I wish this could work:
"command": ["~/script.sh", "{_filename}"]
Thanks!
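For reference, the requested expansion is what Python's os.path.expanduser does. A sketch of how the program path (the first list element) could be expanded before running the command; this is not current gallery-dl behaviour, and expand_command is a hypothetical name.

```python
import os.path


def expand_command(command):
    """Expand a leading ~ in the program path of an exec command list (hypothetical)."""
    program, *args = command
    return [os.path.expanduser(program), *args]


print(expand_command(["~/script.sh", "{_filename}"]))
# e.g. ['/home/user/script.sh', '{_filename}'] when HOME is /home/user
```

Only the program path is expanded here; the remaining arguments are left untouched so placeholders like {_filename} pass through unchanged.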
Does anyone know the commands to use gallery-dl to download private albums that you have guest links for?
Which site? If having the link is the only requisite to access the content maybe you don't need anything else? Are you getting some error?
Yeah, the link is not supported. You can grab the Flickr guest pass links on this mangaka's 5-dollar Patreon tier: https://www.patreon.com/motokamurakami
Don't know the structure of guest pass links but Flickr is a supported site so try to force the extractor and see if it works:
If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor:
gallery-dl "tumblr:https://sometumblrblog.example"
Unfortunately, it says the same thing: unsupported URL.
Would it be possible for the flickr extractor to fetch aperture, shutter speed, focal length, ISO, camera model, lens, and perhaps the remaining EXIF data, and save them in the metadata?
Is it possible to scrape the text from a Patreon post and write it to a file?
Yes, check #4107.
Would it be possible to have a summary at the end of a gallery-dl run? I.e. list the totals of what types of files were downloaded. This would help determine whether the last run was correct. I currently use a combination of steps and it is quite tedious removing duplicate counts.
Sounds good.. Although it's maybe not immediately obvious what constitutes a good summary, or what would be an appropriate definition of "correct" here.
You should definitely make use of the log file, though..
The log file is not very informative either, even at debug level. I also cannot confirm renamed files. Maybe it's just me. I would like to see something like:
Total downloaded: 100
Total failed/missing: 2
PNGs: 22
WEBMs: 4
JPGs: 50
GIFs: 24
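Until gallery-dl offers something like this, a summary of that shape can be computed from a list of downloaded file paths. A minimal sketch, assuming you already collect those paths yourself (for example one per line in a text file); where the list comes from and the "failed/missing" count are left out because they depend on your setup. Duplicate entries are dropped before counting.

```python
from collections import Counter
from pathlib import Path


def summarize(paths):
    """Return summary lines: total unique files and a count per extension."""
    unique = list(dict.fromkeys(paths))  # drop duplicate entries, keep order
    counts = Counter(Path(p).suffix.lstrip(".").upper() or "(none)" for p in unique)
    lines = [f"Total downloaded: {len(unique)}"]
    lines += [f"{ext}: {n}" for ext, n in counts.most_common()]
    return lines


paths = ["a/1.png", "a/2.PNG", "b/3.jpg", "b/4.webm", "b/5.jpg", "b/6.jpg", "b/3.jpg"]
print("\n".join(summarize(paths)))
```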
Has anyone recently had issues with Instagram cookies expiring more frequently?
I don’t have first hand experience with it myself but I’ve heard it’s gotten even more strict and difficult scraping Instagram.
A couple of years ago we could scrape dozens of feeds/accounts without issue... good old times!
Continuation of the old issue as a central place for any sort of question or suggestion not deserving their own separate issue. There is also https://gitter.im/gallery-dl/main if that seems more appropriate.
Links to older issues: #11, #74