mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

How do I use gallery-dl programmatically? #1375

Closed FelixKainz closed 3 years ago

FelixKainz commented 3 years ago

Hello dear Open Sourcers!

I am currently creating an app with Python that uses gallery-dl in order to download ugoiras (animations from pixiv.com). So far I used the CLI interface of gallery-dl via subprocess.run() which works but poses issues with cross-platform compatibility and executable generation using PyInstaller.

Anyway, since I am kind of inexperienced with bigger Python programs such as gallery-dl, I am having a hard time figuring out how I am supposed to interface with gallery-dl programmatically.

What I am currently doing via CLI is

  1. guide the user to enter the token for the OAuth authentication process, which is done via the CLI with gallery-dl oauth:pixiv.
  2. get metadata such as framerate and artist of the ugoira based on its URL.
  3. download the actual zip containing the frames of the ugoira.

I think for step 2 (and 3?) I need to use a PixivWorkExtractor from gallery_dl.extractor.pixiv.py. I think I know how to use that one, but first I need to understand step 1. I see that there is a gallery_dl.postprocessor.oauth.py module. The OAuth1Client sounds interesting, but its constructor requires some keys which I do not have. Or do I?

def __init__(self, consumer_key, consumer_secret, token=None, token_secret=None):

So, how do I go through the process of authentication via OAuth programmatically? Thank you so much for your help!

FelixKainz commented 3 years ago

I focussed on downloading the metadata and zip for now and set the token manually via CLI.

I managed to download the zip of the ugoira, but not in the best available quality, by

import re

from gallery_dl.extractor.pixiv import PixivWorkExtractor

URL = 'https://www.pixiv.net/en/artworks/65107599'

# build the extractor from a regex match against the extractor's URL pattern
work_extr = PixivWorkExtractor(re.match(PixivWorkExtractor.pattern, URL))

illust_meta = work_extr.works()  # gets metadata about ugoira
ugoira_meta = work_extr.api.ugoira_metadata(illust_meta[0]['id'])  # contains ugoira-specific metadata, e.g. zip URLs

res = work_extr.request(ugoira_meta['zip_urls']['medium'])  # contains zip in content field
with open('frames.zip', 'wb') as f:
    f.write(res.content)

However, as I said, the zip is only 5 MB in size, whereas the full one should be 12 MB. ugoira_meta = work_extr.api.ugoira_metadata(illust_meta[0]['id']) only gives the 'medium' option, but if I simply replace the 600x600 in the medium link with 1920x1080, it works. But does every ugoira have 1920x1080 in its filename if it is the highest-resolution version?

How do I get the best link for a given ugoira?

mikf commented 3 years ago

How do I get the best link for a given ugoira?

By manually replacing 600x600 with 1920x1080

https://github.com/mikf/gallery-dl/blob/10c279f2856139504ee7aa5c0298a7b5497c32de/gallery_dl/extractor/pixiv.py#L60-L64
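
A minimal sketch of that replacement, assuming the medium zip URL contains the substring 600x600 as in the example above. The helper name best_zip_url and the sample URL are hypothetical and only serve as illustration:

```python
def best_zip_url(medium_url: str) -> str:
    """Derive the higher-resolution zip URL from the 'medium' one.

    Assumption: the medium URL contains '600x600'; if it does not,
    raise instead of silently returning the unchanged URL.
    """
    if "600x600" not in medium_url:
        raise ValueError("unexpected zip URL: " + medium_url)
    return medium_url.replace("600x600", "1920x1080")


# Hypothetical URL, shaped like a value of ugoira_meta['zip_urls']['medium']
sample = "https://example.invalid/img-zip-ugoira/65107599_ugoira600x600.zip"
full_url = best_zip_url(sample)
print(full_url)
```

In the snippet from the earlier comment, the result would be passed to work_extr.request() in place of ugoira_meta['zip_urls']['medium'].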

And you can simplify

work_extr = PixivWorkExtractor(re.match(PixivWorkExtractor.pattern, URL))

to

work_extr = PixivWorkExtractor.from_url(URL)

I see that there is a gallery_dl.postprocessor.oauth.py module. The OAuth1Client sounds interesting. But it requires some keys which I do not have. Or do I?

Pixiv uses OAuth 2.0, not 1.0a, but there is some special code necessary compared to other sites which also use 2.0 since we can't set a custom redirect URI. The code for the Pixiv OAuth process can be found at https://github.com/mikf/gallery-dl/blob/10c279f2856139504ee7aa5c0298a7b5497c32de/gallery_dl/extractor/oauth.py#L361

If you want to run that in a Python script, do something like

from gallery_dl.job import DownloadJob
DownloadJob("oauth:pixiv").run()

# or

from gallery_dl.extractor.oauth import OAuthPixiv
for _ in OAuthPixiv.from_url("oauth:pixiv"):
    pass

FelixKainz commented 3 years ago

Thank you so much! I just spent the day porting my gallery-dl calls to be native ones rather than using subprocess. And so far I got everything working, but I am still not sure how to handle the OAuth process.

If you want to run that in a Python script, do something like

from gallery_dl.job import DownloadJob
DownloadJob("oauth:pixiv").run()

# or

from gallery_dl.extractor.oauth import OAuthPixiv
for _ in OAuthPixiv.from_url("oauth:pixiv"):
    pass

So, I see that both of these methods block while they wait for input through the console. That wouldn't be a problem in a console application, but my application is not one. Is there a non-blocking way to go through the OAuth authentication process? I would like to initiate it, which opens the browser tab to log in, have the call return, and then resume once the user has entered the token. That's how I did it so far via the CLI: I used a subprocess.Popen instance to call gallery-dl oauth:pixiv. While gallery-dl waits for input, the Popen() constructor returns, so the user can enter the token, and I then pass it to gallery-dl by calling subprocess.Popen.communicate(token) on that instance. Is there a way to do it like this with native calls to gallery-dl?
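
For reference, the subprocess-based pattern described here can be sketched as follows. To keep the sketch self-contained, the child process is a small stand-in script (CHILD_CODE, a made-up name) that just waits on input(), started through sys.executable; in the setup described above the command would be gallery-dl oauth:pixiv instead, and the token would come from the app's UI:

```python
import subprocess
import sys

# Stand-in for `gallery-dl oauth:pixiv`: a child that blocks on input().
CHILD_CODE = "code = input(); print('received ' + code)"

# Popen() returns immediately, even though the child is waiting for input.
proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# ... the application stays responsive here while the user logs in ...

# Once the user has supplied the code, send it to the child's stdin.
# communicate() also closes stdin and waits for the child to exit.
stdout, _ = proc.communicate("example-code\n", timeout=30)
print(stdout.strip())
```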

An alternative would be to spawn a console that simply runs the above and blocks my app for as long as it runs. But I am not sure how I would do this without subprocess.run(("gallery-dl", "oauth:pixiv")). Is there a way to spawn a new Python interpreter without having to locate an interpreter that may not exist on the user's system? That could happen, since I want to deliver my app as an executable that does not require a Python runtime preinstalled on the user's system.

mikf commented 3 years ago

Is there a non-blocking way to go through the OAuth authentication process?

Not natively. The blocking call is in OAuthPixiv._input(), so maybe you monkey-patch that and replace it with a non-blocking version, or you assign a custom object to sys.stdin, which input() reads from, and emulate non-blocking user input that way.
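
A rough sketch of the sys.stdin idea, under a few assumptions: the blocking call ends up in the built-in input(), it is run in a worker thread so the rest of the app stays responsive, and the class QueueStdin and the helper start_in_background are made-up names for this illustration. The lambda below is only a stand-in for the blocking gallery-dl call:

```python
import queue
import sys
import threading


class QueueStdin:
    """Minimal stand-in for sys.stdin: readline() blocks until a line is fed."""

    def __init__(self):
        self._lines = queue.Queue()

    def feed(self, text):
        # input() expects a line terminated by a newline
        self._lines.put(text if text.endswith("\n") else text + "\n")

    def readline(self, size=-1):
        return self._lines.get()  # blocks the calling (worker) thread only


def start_in_background(blocking_func):
    """Run blocking_func in a daemon thread; its return value lands in results."""
    results = {}

    def worker():
        results["value"] = blocking_func()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results


fake_stdin = QueueStdin()
original_stdin, sys.stdin = sys.stdin, fake_stdin
try:
    # Stand-in for the blocking call; input() falls back to sys.stdin.readline()
    thread, results = start_in_background(lambda: input("code: "))
    # ... the application keeps running here until the user submits the code ...
    fake_stdin.feed("example-code")
    thread.join(timeout=5)
finally:
    sys.stdin = original_stdin  # always restore the real stdin

print(results["value"])
```

Note that replacing sys.stdin affects the whole process, so anything else that reads from stdin during that window sees the fake object as well.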

You could also copy the entire code from OAuthPixiv.items() and modify it to fit your purposes. It would allow you to better control the call to self.open(), among other things.

Is there a way to spawn a new Python interpreter without having to locate an interpreter that may not exist on the user's system?

I don't know how. Never had to do that, but it should theoretically be possible.

FelixKainz commented 3 years ago

or you assign a custom object to sys.stdin

I tried exactly that yesterday, but failed. So I sat down today again, and got it to work! Now my whole program uses native calls to gallery-dl. No more fiddling around with subprocesses!

Thank you again for your help! <3