mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.51k stars 943 forks

Questions, Feedback and Suggestions #3 #146

Closed mikf closed 7 months ago

mikf commented 5 years ago

Continuation of the old issue as a central place for any sort of question or suggestion not deserving their own separate issue. There is also https://gitter.im/gallery-dl/main if that seems more appropriate.

Links to older issues: #11, #74

github-userx commented 1 year ago

@mikf will gallery-dl work with Reddit as long as they don't remove old.reddit.com, or will gallery-dl also stop working once they kick everyone off their API? I've been using it to back up media of entire subreddits.

kattjevfel commented 1 year ago

@github-userx gallery-dl uses the reddit API.

Twi-Hard commented 1 year ago

The api is actually getting less limited, not going away (except NSFW stuff) https://mods.reddithelp.com/hc/en-us/articles/16693988535309

Twi-Hard commented 1 year ago

@mikf I don't completely understand this.. does this part affect gallery-dl? I can't tell if it means we all share the same limit or not. https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki

mikf commented 1 year ago

does this part affect gallery-dl?

I'd think so. Rate limits applying only per client id probably means that every user will have to register his own application and set a custom client-id and user-agent for gallery-dl, similar to Deviantart.

anonymous721 commented 1 year ago

Is there a way to tell string handling to truncate everything after a certain character? For example, many Pixiv artists advertise by adding "@[event]" to their username, and I'd want to drop everything after the "@". I know that ":R@//" can get rid of just the "@" symbol but it doesn't affect anything after, and it's not as simple as a regex "@.*" to capture everything after it.

mikf commented 1 year ago

@anonymous721 Use a special-type format string and use partition()

"\fF {user['name'].partition('@')[0]}"

or use re.sub()

"\fF {re.sub(r'@.+$', '', user['name'])}"
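
For anyone who wants to check the two approaches outside of gallery-dl, here is a minimal plain-Python sketch of what the two expressions above evaluate. The `user` dict and its sample name are made up for illustration; only `str.partition()` and `re.sub()` are standard library calls.

```python
import re

# Hypothetical metadata; real Pixiv metadata has many more fields.
user = {"name": "artist@C102 Day1"}

# partition() splits at the first "@" and returns (before, separator, after).
by_partition = user["name"].partition("@")[0]

# re.sub() removes the "@" and everything after it up to the end of the string.
by_regex = re.sub(r"@.+$", "", user["name"])

print(by_partition)  # artist
print(by_regex)      # artist
```

Names without an "@" pass through both variants unchanged. One small difference: a name ending in a bare "@" keeps it with the regex, because `.+` needs at least one character after the "@", while `partition()` drops it.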

mikf commented 1 year ago

@lx30011 https://github.com/mikf/gallery-dl/commit/ccbc1a1d55135d598f32990dd98e58a14a77023f

edit: nvm, I meant https://github.com/mikf/gallery-dl/commit/c45a913bfd386754922549f63f0cae548199ab62

brsk93 commented 1 year ago

Anyone else having issues scraping instagram recently? I scraped without issues up until a few weeks ago, when the credentials I still had saved from a year ago stopped working. I learned that login is broken in gallery-dl now so I just started pasting cookies in my config file.

Here's where the problems started. I began getting warnings on my account and had to solve captchas. I even raised the sleep intervals to [16, 27] seconds but it didn't help. After a couple of warnings they suspended my account, and demanded I verify my phone again, and also send them a selfie of myself holding a piece of paper with my username written on it. I was able to successfully fake both verifications with some ingenuity and appear to have fooled them because now after a few days my account works again. I still would rather not repeat this process.

Does anyone have any suggestions? I find it weird they got me because I don't just scrape content, but also login and post some random comment or like in order to appear more convincing. Could it be because when logging in they issue new session cookies, but when I scrape I keep using the old ones that are saved in my config? The sleep intervals are pretty high I think, I can't imagine what kind of bot detection methods they could have come up with..

Hrxn commented 1 year ago

You should definitely try to use the same set of cookies, that's for sure..

mikf commented 1 year ago

It might help to use the same user-agent in gallery-dl as when you logged into IG.

brsk93 commented 1 year ago

I'm already using the "browser": "firefox:windows" option. I will try using the user-agent one, and try using the same cookies, and increase the sleep time between requests even more.

mikf commented 1 year ago

Setting browser to anything Firefox forces your user-agent header to be FF102 (or FF115 since 1.25.8), which is rather old and might trigger IG's bot detection.

Other sites like Danbooru blocked any API requests with a FF102 user agent as that was apparently very commonly used by bots.

brsk93 commented 1 year ago

Ah, so if I already use browser I don't need to set user-agent. Thanks for the info.

Hrxn commented 1 year ago

I think the suggestion is to use a matching user-agent instead of browser?

Infinitay commented 1 year ago

Out of curiosity, why is it that extractor.*.user-agent isn't using browser by default rather than a static user agent?

vv211 commented 1 year ago

questions re: command line arguments:

are command-line arguments processed in addition to arguments in a config file? and if so, do command line arguments override the same arguments in a config file?

Hrxn commented 1 year ago

Yes to both, options set on the command-line are used in addition to the configuration file, and if you set the same options in the configuration file and on the command-line, the command-line takes precedence.

Hrxn commented 1 year ago

@mikf

I just had a small epiphany of sorts..

https://github.com/mikf/gallery-dl/blob/1baf83a9e52c2b1d09b32d5e115aebf8e5f2d279/gallery_dl/extractor/twitter.py#L583-L587

Isn't that a potential conflict of key names? When both get used at the category level, i.e. inside of "extractor.twitter"?

bristolg21 commented 1 year ago

I've been working on a GUI for this for a while now, and think it would be so awesome if I could have a command for gallery-dl that would return the number of files that will be downloaded from a provided link, and maybe a command that would allow retrieving the progress of the download.

mikf commented 1 year ago

@Infinitay mostly because the browser option was added a lot later than user-agent and was initially only meant as a means to access Patreon.

@Hrxn You are correct, but I guess this kind of works without errors as replies is only checked as a boolean or compared to a string.

@bristolg21 getting the number of files is generally not possible. (https://github.com/mikf/gallery-dl/discussions/4209) You could try --filter "print(count) or abort()" which works for some galleries. As for download progress, you could set downloader.progress to 0 to get progress output to stderr immediately when a download starts. (-o downloader.progress=0)
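
The `print(count) or abort()` filter relies on ordinary Python evaluation: `print()` returns `None`, which is falsy, so `or` goes on to evaluate its right-hand side. A small sketch of that mechanism in plain Python, where `fake_abort` and `Aborted` are hypothetical stand-ins and not gallery-dl's own `abort()`:

```python
class Aborted(Exception):
    """Hypothetical stand-in for whatever the real abort() raises."""


def fake_abort():
    raise Aborted()


def run_filter(count):
    # print() returns None (falsy), so the right side of `or` is always evaluated.
    return print(count) or fake_abort()


try:
    run_filter(12)
except Aborted:
    print("stopped after printing the first value")
```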

514guy commented 1 year ago

May I ask why gallery-dl polls all local connected storage before processing anything? I run it from an administrative command prompt (w11), and before starting anything, when run after a restart, local USB and network storage can be heard powering up sequentially, when a gallery-dl command is run, which should not have anything to do with the command given (gallery-dl.exe -o lang=en --chapter-filter , etc...).

mikf commented 1 year ago

@514guy There is no code in gallery-dl that would even remotely do something like this. By default it tries to read its config files and open its cache, but that's it. You could disable these file accesses with --config-ignore -o cache.file= and see if that helps.

Hrxn commented 1 year ago

@mikf Short question, with the "path-restrict" option, as described there, using the "normal" option (a string value with the characters to be replaced) works in the opposite way of the ascii special value (and now also ascii+, I think), because here it's the characters not to be replaced..

I think the description is entirely correct, don't get me wrong, as this difference is hinted at by the regex character set negation (^...) I assume. Question, is there a difference internally on how these special values are handled, or can you simply use any string of characters with this negation?

I've taken a look at path.py. and it does not seem to make any difference, although I might be overlooking something, I don't really trust my Python reading skills..

mikf commented 1 year ago

@Hrxn The value of path-restrict is used in a regex character set [...], so you can invert it by using ^ as first character, which is done for ascii and ascii+. This is also why the docs advise you to (double) backslash escape any of []^-\ as they could have special meaning without (although [ should be fine as is, but better safe than sorry I guess).

As for any difference with ascii etc, there isn't any. When the value of path-restrict is equal to "ascii", it replaces it with "^0-9A-Za-z_." as the actual value, but you could have also specified "^0-9A-Za-z_." as path-restrict to begin with.

https://github.com/mikf/gallery-dl/blob/df5c7ee03ef05ab73a034075102d6674d516236c/gallery_dl/path.py#L99-L103

Both of these settings have exactly the same effect and the same is true for all special values:

{
    "path-restrict": "ascii",
    "path-restrict": "^0-9A-Za-z_."
}

The relevant code that does the actual replacement is in _build_cleanfunc() followed by calls to clean_segment().

I guess there should be a "Use an unescaped ^ as first character to negate this set of characters" or "This is implemented as Regex character set" in the docs.
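
To make the character-set behaviour concrete, here is a rough standalone sketch of the idea. It is not the actual code in path.py: it only assumes that the setting is wrapped in `[...]` and every match is substituted. The helper name `build_clean` is hypothetical.

```python
import re


def build_clean(chars, repl):
    # Wrap the path-restrict value in a regex character set, as described above.
    pattern = re.compile("[" + chars + "]")
    return lambda segment: pattern.sub(repl, segment)


# Plain set: the listed characters are replaced.
clean_listed = build_clean("/:", "_")
print(clean_listed("Foo/Bar: baz.jpg"))  # Foo_Bar_ baz.jpg

# Leading ^ negates the set: everything NOT listed is replaced.
clean_ascii = build_clean("^0-9A-Za-z_.", "_")
print(clean_ascii("Foo/Bar: baz.jpg"))   # Foo_Bar__baz.jpg
```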

Hrxn commented 1 year ago

@Hrxn The value of path-restrict is used in a regex character set [...], so you can invert it by using ^ as first character, which is done for ascii and ascii+. This is also why the docs advise you to (double) backslash escape any of []^-\ as they could have special meaning without (although [ should be fine as is, but better safe than sorry I guess).

[..]

Good to know, thanks!

The relevant code that does the actual replacement is in _build_cleanfunc() followed by calls to clean_segment().

I see, thank you for pointing me in the right direction, appreciate it.

I guess there should be a "Use an unescaped ^ as first character to negate this set of characters" or "This is implemented as Regex character set" in the docs.

Yeah, to be honest, this was my thought here as well. I mean, it's obviously the implication here, but again, not sure what the percentage of actual end users would be that recognize the use of regex here.

Hrxn commented 1 year ago

@mikf I've been toying with some different values for "path-restrict" and I think I've discovered a potential pitfall for what I was trying to achieve.

First of all, based on the ascii+ preset ("^0-9@-[\\]-{ #-)+-.;=!}~") I've removed the whitespace in the set and then changed this range part (+-.) to +-- (shifting the range one index position to the left, basically. I've done this locally here in the Python interpreter with chr() and ord(), and the regex expression itself is working as expected, I think).

But I then realized that this matches and therefore removes all . (i.e. dot/period) of all path segments gallery-dl tests this against, and would consequently also remove the . in any file.ext in general, right?

Is there a potential way around this? I'm guessing probably not, due to how path-restrict is currently implemented. Not sure if such a change would be even feasible, considering it might break something. I mean, I realize that there's one workaround here, I could simply use the R format specifier with R./''/ in every potential field in "filename" and "directory", right?

Maybe that would be the safest way. I would have to run a big search & replace against my config file, but hey, wouldn't be the first time 😆

What do you think?

mikf commented 1 year ago

But I then realized that this matches and therefore removes all . (i.e. dot/period) of all path segments gallery-dl tests this against, and would consequently also remove the . in any file.ext in general, right?

Yeah, that's one of the many shortcomings of how path-restrict works: it replaces characters in each whole path segment and not just for each metadata field. This also has its upsides, but you cannot use it to replace all . except the last one for example.

Is there a potential way around this?

There is. It does not use path-restrict and is a bit involved, but nothing too bad (in my opinion).

You can use the new python post processor to modify metadata fields in place before they get used to build a path. To do that, create a new .py file and define a function that does whatever modifications you want/need.

For example in ~\gdl-utils.py

def replace_dots(kwdict):
    for key, value in kwdict.items():
        if isinstance(value, str):
            kwdict[key] = value.rstrip(".").replace(".", "_")
        elif isinstance(value, dict):
            replace_dots(value)

This goes through all metadata fields and removes . at the end / replaces them with _.

To use it, add the following post processor to your config and enable it with -P replace-dots or by adding it to a postprocessors list.

    "postprocessor": {
        "replace-dots": {
            "name": "python",
            "event": "prepare",
            "function": "~/gdl-utils.py:replace_dots"
        }
    }

If you want it to do anything more specific, just ask and I can write the code.

Hrxn commented 1 year ago

Ah, interesting, haven't thought of that yet..

You mean by adding this example "replace-dots" postprocessor above into the "postprocessors: [ ... ] " list directly inside of "extractor: {}" - without setting any "whitelist" or "blacklist" in this postprocessor, of course - it gets applied automatically to all metadata fields?

Might be a good alternative, although I'd want to test that a bit first.

To be honest, my concern about changing "filename" and "directory" values in my config again was vastly overblown. It was a lot easier and faster than I initially expected, contrary to my estimation because of the many changes I've made to my personal file naming preference etc. so far. I was never quite happy with it and changed it often in the past, instead of making a decision and sticking to it instead..

What I'm using is basically this now, for all fields that are a title, or name or something and not an ID:

"'_reddit_' in locals()": "{_reddit_[title][:180]!t:R./''/}.{_reddit_[id]}{num:?.//>04}.{extension}"

BTW, would R.// work as well? Or do I have to use the '' string literal?

Come to think of it, instead of me asking all these silly questions, it would maybe be easier if one could test the results of all those string formatting exercises directly in the Python interpreter. Not sure what would be necessary to facilitate that?

So far, I've found that after importing gallery-dl I can use the method you mentioned earlier

>>> gallery_dl.path.PathFormat._build_cleanfunc()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: PathFormat._build_cleanfunc() missing 2 required positional arguments: 'chars' and 'repl'
>>>

Not sure if that alone helps, though...

Speaking of testing, is it normal that I don't get anything returned if I try to call the replace_dots example function from above?

Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>>
>>> def replace_dots(kwdict):
...     for key, value in kwdict.items():
...         if isinstance(value, str):
...             kwdict[key] = value.rstrip(".").replace(".", "_")
...         elif isinstance(value, dict):
...             replace_dots(value)
...
>>>
>>> replace_dots
<function replace_dots at 0x000001881D32C4A0>
>>> replace_dots()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: replace_dots() missing 1 required positional argument: 'kwdict'
>>> replace_dots({"key": "value"})
>>> replace_dots({"key": "value.test"})
>>> replace_dots({"key.test": "value.test"})

greychickens commented 1 year ago

Is there a way to achieve the following filename format for Twitter?

[{YY-MM-DD}] {tweet_id}_p{num}.{extension}, but {num} starting from 0.

Using a special type format string seems to be the only way to achieve the index number from 0, but doing so does not allow me to parse {date} with the methods I've tried. Here is the code that I currently have.

"filename": "\fE '[{}] {}_p{}.{}'.format(date:O/%y-%m-%d, tweet_id, num-1, extension)"

Hrxn commented 1 year ago

Well there's also {count}, but it could be that it starts from 0 as well..

Otherwise, using the f-string variant should also work instead, you don't need the \fE Python expression string, AFAIK.

"filename": "\fF {date} {tweet_id}_p{num-1:>02}.{extension}"

greychickens commented 1 year ago

Thanks for the input! {count} does indeed start from 1 as well, but \fF expression at least allows me to parse the date, which is a huge improvement. Unfortunately it still doesn't take UTC offset and just prints out the string.

"filename": "\fF [{date:O/%y-%m-%d}] {tweet_id}_p{num-1}.{extension}"

Output: [O_YY-MM-DD] Tweet ID_p0.jpg

I'll see if there is any other workaround, but if anyone has a solution please share it with me. Thanks!

mikf commented 1 year ago

@greychickens O only works for regular gdl format strings. For anything else you'd have to apply this offset yourself by adding a timedelta to date, although there is no convenient way to automatically get your local UTC offset.

"filename": "\fF [{date+timedelta(0, 3600):%y-%m-%d}] {tweet_id}_p{num-1}.{extension}"
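
As a plain-Python illustration of the offset arithmetic, assuming `date` is a naive `datetime` in UTC (which the need for a manual offset suggests) and using a made-up timestamp:

```python
from datetime import datetime, timedelta

# Made-up timestamp, treated as UTC for this example.
date = datetime(2023, 8, 26, 23, 30, 0)

# timedelta(0, 3600) means 0 days and 3600 seconds, i.e. UTC+1.
shifted = date + timedelta(0, 3600)

print(f"[{date:%y-%m-%d}]")     # [23-08-26]
print(f"[{shifted:%y-%m-%d}]")  # [23-08-27]

# Outside of a format string, the current local offset can be looked up like this.
local_offset = datetime.now().astimezone().utcoffset()
print(local_offset)
```

The last two lines only show how the offset could be found once in a Python session; the value would still have to be written into the format string by hand.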

@Hrxn

it gets applied automatically to all metadata fields?

It does with how I've written this function, but you could easily change that.

BTW, would R.// work as well? Or do I have to use the '' string literal?

Using '' would replace all . with two '' as those values get taken as is. To replace them with an empty string, you'd have to use R.//.

Not sure what would be necessary to facilitate that?

I usually disable any "output" like archive, download, etc and run gallery-dl with some "test" URL like imgur.com/asdqwe.

So far, I've found that after importing gallery-dl I can use the method you mentioned earlier ...

It needs two arguments: the characters to be replaced, i.e. path-restrict, and the replacement character(s).

>>> func = gallery_dl.path.PathFormat._build_cleanfunc("^0-9@-[\\]-{ #-)+-.;=!}~", "_")
>>> func("Foo/Bar")
'Foo_Bar'

Speaking of testing, is it normal that I don't get anything returned if I try to call the replace_dots example function from above?

The dict gets modified in-place.

>>> data = {"key": "value.test"}
>>> replace_dots(data)
>>> data
{'key': 'value_test'}

Hrxn commented 1 year ago

@Hrxn

it gets applied automatically to all metadata fields?

It does with how I've written this function, but you could easily change that.

You mean by the usual post-processor options in your config? Or by actually making changes to the function itself?

BTW, would R.// work as well? Or do I have to use the '' string literal?

Using '' would replace all . with two '' as those values get taken as is. To replace them with an empty string, you'd have to use R.//.

D'oh, obviously. I was writing this comment out of memory, I should've taken a look into my own config file. I was actually using R.//. Explains why this was working while downloading recently.. 😄

Not sure what would be necessary to facilitate that?

I usually disable any "output" like archive, download, etc and run gallery-dl with some "test" URL like imgur.com/asdqwe.

Yeah, I mean that's what I was doing as well. It's just a bit of a pain in the ass, as I was scrolling through reddit, looking for postings with Unicode symbols and periods in their title. Turns out this can take quite some time.. 😆

But it's all good now, knowing what works!

So far, I've found that after importing gallery-dl I can use the method you mentioned earlier ...

It needs two arguments: the characters to be replaced, i.e. path-restrict, and the replacement character(s).

>>> func = gallery_dl.path.PathFormat._build_cleanfunc("^0-9@-[\\]-{ #-)+-.;=!}~", "_")
>>> func("Foo/Bar")
'Foo_Bar'

Nice, good to know. Example is working the same here for me, good news. But is this normal?

Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import gallery_dl
>>> func = gallery_dl.path.PathFormat._build_cleanfunc("^0-9@-[\\]-{ #-)+-.;=!}~", "_")
>>> func("Foo/Bar")
'Foo_Bar'
>>> func = gallery_dl.path.PathFormat._build_cleanfunc("ascii+", "_")
>>> func("Foo/Bar")
'Foo/B_r'
>>>



Speaking of testing, is it normal that I don't get anything returned if I try to call the replace_dots example function from above?

The dict gets modified in-place.

>>> data = {"key": "value.test"}
>>> replace_dots(data)
>>> data
{'key': 'value_test'}

Roger roger

>>> replace_dots(data)
>>> data
{'key.ext': 'value_ext'}
>>> data = {"key.ext": "test.value.ext"}
>>> replace_dots(data)
>>> data
{'key.ext': 'test_value_ext'}
>>>

Seems I was mistaken in the assumption that the Python interpreter works as a kind of REPL where every expression immediately returns output.. I mean it does for normal basic arithmetic operators apparently, but not for functions. Is that the distinction? So you always have to explicitly return output in Python, like this?

>>> def myadder (a, b):
...   return a+b
...
>>> myadder
<function myadder at 0x000001D24E37C540>
>>> myadder(3,4)
7



Somewhat unrelated question:

Special format specifiers can be chained together (like the example mentioned in formatting.md: {foo:?//RF/B/Ro/e/> 10}), but how would this work for conversion specifiers? Like, if I would want to combine trimming and capitalizing each word? i.e. {title!t} and {title!C} together?

fireattack commented 1 year ago

How do I make -o skip=abort work only with records in download archive database, but not existing file (which should also skip but not abort)?

Hrxn commented 1 year ago

You mean normal downloads archive, not metadata archive or something?

mikf commented 1 year ago

@fireattack It is currently not possible to distinguish between "skipped due to archive" and "skipped because file exists". Both are handled the same way and both count towards -o skip=abort.

@Hrxn

You mean by the usual post-processor options in your config? Or by actually making changes to the function itself?

To the function itself.

But is this normal?

It is. The "ascii+" -> "^0-9@-[\\]-{ #-)+-.;=!}~" replacement happens outside of _build_cleanfunc()

Seems I was mistaken in the assumption that the Python interpreter works as a kind of REPL where every expression immediately returns output.. So you always have to explicitly return output in Python, like this?

Python functions implicitly return None when there's no return statement and the interactive prompt shows no output for None.
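
A short illustration of that rule in a script rather than at the prompt; both function names are made up:

```python
def modify_in_place(data):
    # No return statement: the caller gets None, but `data` itself is changed.
    data["key"] = data["key"].replace(".", "_")


def add(a, b):
    return a + b


data = {"key": "value.test"}
result = modify_in_place(data)

print(result)     # None
print(data)       # {'key': 'value_test'}
print(add(3, 4))  # 7
```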

but how would this work for conversion specifiers?

It doesn't. You are limited to only one conversion per replacement field. _string.formatter_parser(), the C function provided by the standard library to parse regular format strings, complains when there's more than one.
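
The limit can be reproduced with the standard library alone. `string.Formatter().parse()` is the public wrapper around the same parser; the field name `title` and the conversion letters are only sample input here:

```python
import string

formatter = string.Formatter()

# A single conversion parses fine; the parser does not care which letter it is.
fields = list(formatter.parse("{title!t}"))
print(fields[0][1], fields[0][3])  # title t

# Two conversions in one field are rejected by the parser.
try:
    list(formatter.parse("{title!t!C}"))
except ValueError as exc:
    print("rejected:", exc)
```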

hunter-gatherer8 commented 1 year ago

I was looking to implement a bunch of new extractors for gallery-dl and expected to be mostly re-using low-level functionality I usually have in a grabber, but was surprised to find out that gallery-dl currently doesn't seem to have any HTML parser. If I'm not missing anything, if an extractor doesn't have the luxury of dealing with a JSON API, it ends up getting things done by using the text.extr util, i.e. simply by searching for a substring. And while you can probably achieve your goal just with that much, it is very cumbersome and honestly isn't something I ever did when writing a grabber. Usually I would parse the HTML and use CSS selectors or such to make sense of the page's contents.

So I wanted to ask what's the maintainer's overall position on that? Would it be ok to add BeautifulSoup or lxml to requirements.txt to be able to do that?

Hrxn commented 1 year ago

@mikf

Okay, I've made the "involved (but not too bad)" changes in using the new python post-processor you've suggested. 😉

The function so far:

def sanitize_metadata(kwdict):
    for key, value in kwdict.items():
        if isinstance(value, str):
            kwdict[key] = value.strip().rstrip('.').replace('.', '_').replace(' ', '_')
        elif isinstance(value, dict):
            sanitize_metadata(value)

~Edit: For anyone stumbling upon this: Don't use this example here~

(removed)

Let me know what you think (and it really wasn't too bad...) 😄

Also, what is the reason you've used .rstrip('.') in your example? Is there a reason behind that, do you want to keep leading dots for example? 🤔

greychickens commented 1 year ago

During the creation of ugoira webm using ffmpeg, is there a way to preserve resolution information to output metadata? Right now the only field being recorded is video length, and an inaccurate 0kbps for bit transfer rate field.

"ffmpeg-args": ["-hide_banner", "-c:v", "libvpx-vp9", "-lossless", "1", "-pix_fmt", "yuv420p", "-fflags", "+bitexact"]

mikf commented 1 year ago

@hunter-gatherer8 You aren't missing anything. For sites without any form of "API", there really are only a bunch of functions that extract substrings, but 99% of the time that is more than enough and it uses a lot less resources than parsing an entire HTML page and applying CSS selectors would.

It probably takes a bit of time to get used to these functions if you haven't been using them for years like I have, but it is certainly doable. Take a look at the PRs from enduser420 as a good example.

My position on adding an extra requirement like BeautifulSoup to make it easier for other developers: No, not interested. Maybe as an optional requirement and maybe in 2.0, but definitely not as a hard requirement. youtube-dl and yt-dlp also function just fine without such a library.


@Hrxn Looks good. You might want to consider using str.translate() and str.maketrans() when getting to 5+ replace() calls.

Also, what is the reason you've used .rstrip('.') in your example? Is there a reason behind that, do you want to keep leading dots for example? 🤔

I wanted to completely remove any trailing dots, so that something like {title}.{ext} doesn't result in sentence..jpg when title ends with a .
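
For reference, the difference between the two calls in plain Python, with a made-up title; whether leading dots should be kept is a separate preference:

```python
title = "...sentence."

# rstrip() only touches the end, so leading dots survive.
print(title.rstrip(".") + ".jpg")  # ...sentence.jpg

# strip() removes dots from both ends.
print(title.strip(".") + ".jpg")   # sentence.jpg

# Without either, the trailing dot doubles up with the extension separator.
print(title + ".jpg")              # ...sentence..jpg
```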


@greychickens ffprobe shows all the metadata you mentioned. If you mean Windows-specific media metadata fields, then no, there is no built-in support for that.

Hrxn commented 1 year ago

@mikf There were actually two things I've noticed back then when I tested this, but I did not write a reply in this thread here yet because I didn't want to "spam" it more than necessary, and also wanted to wait until you have responded..

That said..

  1. The post-processor setup in the config file does not really work like in the example I gave (I've edited my comment above), the error message returned was not really that specific, but my first hunch that it probably was the name of the Python file was correct in this case. Apparently, you cannot use a filename like gallery-dl.util.functions.py. While gallery-dl-util-functions.py worked as expected. Is this a general thing for Python, or is this Windows specific?

  2. My first tests have all been with normal reddit URLs, and they all worked just as before, but then I've encountered the first real issue here with a reddit gallery (a single reddit post submission with a https://www.reddit.com/gallery/XXXXX URL) Ironically, the first image of the gallery still worked, only the subsequent ones failed with network errors.

    [2023-08-26T12:19:38][warning] HTTPSConnectionPool(host='preview_redd_it', port=443): Max retries exceeded with url: /8ubsspxpd8jb1_jpg (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x00000166A8DA42D0>: Failed to resolve 'preview_redd_it' ([Errno 11001] getaddrinfo failed)")) (4/16) 

Luckily, with the hostname being preview_redd_it here, it was immediately obvious who the culprit was. 😄

I guess this here is the case now:

it gets applied automatically to all metadata fields?

It does with how I've written this function, but you could easily change that.

You mean by the usual post-processor options in your config? Or by actually making changes to the function itself?

To the function itself.

I'm curious what that would look like, actually.
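
One possible shape for such a change, as a sketch only: restrict the function to an explicit set of field names so that URLs and hostnames are left alone. The function name `replace_dots_selected`, the field names in `FIELDS`, and the sample data are hypothetical and would need to match the metadata of the extractor in use.

```python
# Hypothetical list of fields to sanitize; adjust to the extractor's metadata.
FIELDS = {"title", "author"}


def replace_dots_selected(kwdict):
    for key, value in kwdict.items():
        if isinstance(value, dict):
            # Recurse into nested metadata.
            replace_dots_selected(value)
        elif key in FIELDS and isinstance(value, str):
            kwdict[key] = value.rstrip(".").replace(".", "_")


data = {
    "title": "Some. title.",
    "url": "https://preview.redd.it/example.jpg",
    "nested": {"title": "a.b", "id": "x.y"},
}
replace_dots_selected(data)
print(data["title"])   # Some_ title
print(data["url"])     # https://preview.redd.it/example.jpg
print(data["nested"])  # {'title': 'a_b', 'id': 'x.y'}
```

An allow-list like this is more to maintain than the catch-all version, but it cannot rewrite a field that is later used to build a download URL.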


I wanted to completely remove any trailing dots, so that something like {title}.{ext} doesn't result in sentence..jpg when title ends with a .

I've understood this so far, but you used rstrip instead of strip in your example because you actually want to keep leading dots? I just wonder what the reasoning is behind this?

@Hrxn Looks good. You might want to consider using str.translate() and str.maketrans() when getting to 5+ replace() calls.

Interesting stuff, but not sure how to use it. This is a thing with the Python docs, in my opinion, on one hand, they are very detailed, technically accurate, using all the correct computer science terminology, but on the other hand, they seem to miss some practical examples on how actual usage would look like. Dunno, you're the expert here, it's probably not really a difference as long as we're only at two replace() calls..

fireattack commented 1 year ago

value.strip().rstrip('.').translate(str.maketrans(". ", "__"))

I do agree it's not much better than just two replace.
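
Since practical examples were asked for, here is a small sketch of the three common ways to build a translation table with `str.maketrans()`. The sample strings are made up.

```python
# Map each character in the first string to the one at the same position in the second.
table = str.maketrans(". ", "__")
print("some title. v2".translate(table))  # some_title__v2

# A third argument lists characters to delete outright.
table_delete = str.maketrans(" ", "_", ".")
print("some title. v2".translate(table_delete))  # some_title_v2

# A dict also works and allows multi-character replacements.
table_dict = str.maketrans({".": "", " ": "_", "&": "and"})
print("a & b.c".translate(table_dict))  # a_and_bc
```

The table can be built once at module level and reused for every value, which is where it starts to pay off compared to a long chain of `replace()` calls.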

Noonereallycomeon commented 11 months ago

Does Instagram decide the resolution of the video it serves? When I downloaded the same reel twice, the earlier download was 1080p and the newer one 720p. Can something be done about it? Does the data saver setting in the app affect it? This is one example reel: https://www.instagram.com/reel/Cs9HX-OA1te/ Could someone check whether you can get the 1080p version?

ghbook commented 11 months ago

Hi @mikf Is it possible to set "filesize-max": "10M" per extractor differently in config, instead of globally in downloader?

JSouthGB commented 11 months ago

Hi @mikf Is it possible to set "filesize-max": "10M" per extractor differently in config, instead of globally in downloader?

Seems to be a downloader only option for now.

omfgntsa commented 11 months ago

Hi, I would like to know if it is possible to have patreon extractor use {creator[vanity]} first and if not available to use {creator[full_name]} instead.

I have my config set up so that each Patreon creator gets their own folder, "directory": ["{category}", "{creator[id]}-[{creator[vanity]}]"], so the folder will be named ID-[NAME]. Some creators do not have a vanity name set up yet, and in those cases it'll create a folder ID-[None]. In those cases I would rather use the full name instead of just None.

If so can you show me how I should rewrite the directory part of the extractor in config.json?

Thank you.

Twi-Hard commented 11 months ago

{creator[vanity]|creator[full_name]}

"directory": ["{category}", "{creator[id]}-[{creator[vanity]|creator[full_name]}]"]
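
Read as plain Python, the `|` fallback behaves roughly like picking the first usable value. This sketch is an approximation and not gallery-dl's implementation; the helper name `first_available` and the sample metadata are made up, and it assumes that `None` or an empty value triggers the fallback.

```python
def first_available(metadata, *keys):
    # Return the first value that is present and truthy, else None.
    for key in keys:
        value = metadata.get(key)
        if value:
            return value
    return None


with_vanity = {"id": "123", "vanity": "artist", "full_name": "Some Artist"}
without_vanity = {"id": "456", "vanity": None, "full_name": "Other Artist"}

for creator in (with_vanity, without_vanity):
    name = first_available(creator, "vanity", "full_name")
    print(f"{creator['id']}-[{name}]")
# 123-[artist]
# 456-[Other Artist]
```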

anonymous721 commented 10 months ago

When grabbing a Nijie post, is there a way to check whether there are multiple images, analogous to Pixiv's page_count, so that I can put only multi-image posts in a subdirectory? When I checked a URL with -K I didn't see anything. There's a num keyword for individual images, but an approach based on that would miss the first image of each post.

diamondsw commented 10 months ago

For Mangadex, it would be nice if gallery-dl had an option to download cover art as well, i.e. https://mangadex.org/title/efb4278c-a761-406b-9d69-19603c5e4c8b/the-100-girlfriends-who-really-really-really-really-really-love-you?tab=art . This would probably require a flag to enable as well as a separate destination directory, as the default contains variables that no longer make sense (volume, chapter, etc).