mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[flickr] Complete extractor support #16

Closed Hrxn closed 7 years ago

Hrxn commented 7 years ago

Good news from the commit log, flickr album extractor (https://github.com/mikf/gallery-dl/commit/93e5d8cba3ef39455cea87d92cdf4c2f01fbe0ca), and image extractor (https://github.com/mikf/gallery-dl/commit/659c65dbb06e71bac6c0fa5f5f8b1b7a53d84720) are already there, apparently.

So I thought that adding complete support for Flickr now might be a good idea. 😄

Since support for the Flickr API is already part of the implementation (at least that's what I gathered from reading the source code a bit), it should be relatively straightforward now, I would assume.

All Flickr variants, I think, are:

So what do you think? Unless I'm wrong about the Flickr API, you already made the foundation, so it should not be too much work, I'd assume.

mikf commented 7 years ago

There are even more things to add to the list:

They might not be as important as the two missing variants you listed, but still necessary to call it "complete".

Supporting these shouldn't be much of a problem, as there is an API method for all of them (API Method List). The biggest issue might be support for NSFW and private images, which sometimes requires User Authentication. The best example for this might be the people.getPhotos method:

This method must be authenticated (Please note: Un-authed calls can only see Safe content.)
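For illustration, here is a minimal sketch of what an application-level (un-authed) call to that method could look like. The endpoint and the `format`/`nojsoncallback` parameters follow Flickr's public REST documentation as far as I know; `build_api_url`, the API key and the user ID are placeholders made up for this example, not anything taken from gallery-dl.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint (assumed here from the public API docs)
API_URL = "https://api.flickr.com/services/rest/"


def build_api_url(method, api_key, **params):
    """Build the URL for an API call that carries only an API key.

    Hypothetical helper: without user authentication, such a call
    would only ever see content rated "Safe".
    """
    query = {
        "method": method,
        "api_key": api_key,
        "format": "json",
        "nojsoncallback": "1",
    }
    query.update(params)
    # Sort the parameters so the resulting URL is deterministic
    return API_URL + "?" + urlencode(sorted(query.items()))


# Placeholder key and user ID, for illustration only
url = build_api_url(
    "flickr.people.getPhotos", "YOUR_API_KEY", user_id="12345678@N00")
print(url)
```

Fetching NSFW or private images would additionally need the request to be signed on behalf of a user, which is where the extra complexity comes from.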

Hrxn commented 7 years ago

Interesting, I wasn't aware of this difference between Album and Gallery. Group support is also a good idea. I'm not so sure about the general search, might be nice to have, probably depends on how accurate you actually find what you're looking for, so you can avoid results that are too generic. Well, except if that's what you want, a generic result for a specific keyword/tag, then yes. I don't see the potential use case here right now, but okay, that's just me, your mileage may vary.

But yeah, support for user authentication is definitely important, I'd say. Or is the Flickr API here somehow different compared to the normal web usage? On the user side, or rather on gallery-dl's side, we already have user authentication for sites like pixiv, exhentai (BTW, this was a question from the other thread, I've tried my new test account in the meantime, everything seemed to work as expected, thanks again 🎉 ) etc., so I guess for Flickr it's best handled in exactly the same way.

This is what the "safety" options look like on the site, btw (screenshot: 2017-06-05_100453)

I guess it is not really well known, but Flickr has had pornographic content on the site since its inception.

Oh, and there are two specific oddities, which I maybe should mention here.

  1. Flickr was bought by Yahoo a long time ago, and they made the very wise decision to enforce the use of a Yahoo mail account. Well, just checked their sign-up again, and they apparently made some changes. Since January of this year, according to some Android news site, you can now create an account with any mail address again. Or, to be exact, getting a Flickr account still gets you the Yahoo ID as well, but you can now use other mail accounts for this. Thank god. They finally realized they were slowly killing the site, I guess.
  2. I'm absolutely certain that Flickr had a geofence for their "safety" settings, meaning that you couldn't change the filter settings for your account to all normally possible options if you signed up for Flickr from some specific countries, for example Malaysia, Germany and some (few) others I no longer remember. But since they changed their mail "requirements", this could also have changed in the intervening time.
mikf commented 7 years ago

Or is the Flickr API here somehow different compared to the normal web usage?

Yes, it is. Like many APIs, Flickr uses OAuth for authentication purposes. Until now, it was usually enough to just authenticate the application itself and send requests on the application's behalf (e.g. DeviantArt). In this case, the user has to grant the application permission to send requests on their behalf, which gets a bit more complicated. To compare the two:

Application Auth:

User Auth:

-> the whole thing in one image
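To make the difference a bit more concrete, here is a rough sketch of OAuth 1.0a request signing (HMAC-SHA1, as described in RFC 5849), written with the standard library only. This is not gallery-dl's code; the function names and all keys and secrets are placeholders. The signing key consists of the application's secret plus a token secret. The token secret is empty until the user has granted access and is filled in for every request sent on the user's behalf afterwards.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(value):
    # OAuth 1.0 percent-encoding: escape everything but unreserved characters
    return quote(str(value), safe="~")


def signature_base_string(method, url, params):
    """Join HTTP method, URL and sorted parameters into the string to sign."""
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join((method.upper(), _enc(url), _enc(normalized)))


def sign(method, url, params, consumer_secret, token_secret=""):
    """Return the base64-encoded HMAC-SHA1 signature for a request."""
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}".encode()
    base = signature_base_string(method, url, params).encode()
    digest = hmac.new(key, base, hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder values, for illustration only
params = {"oauth_consumer_key": "APP_KEY", "oauth_nonce": "abc", "x": "1 2"}
before_grant = sign("GET", "https://example.org/api", params, "APP_SECRET")
after_grant = sign("GET", "https://example.org/api", params,
                   "APP_SECRET", token_secret="USER_TOKEN_SECRET")
```

The two signatures differ because the user's token secret becomes part of the key, which is how the server can tell that a request is made for one specific user.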

Hrxn commented 7 years ago

Okay, got that. Looks a bit over the top, with unnecessary complexity, but they are probably doing it for "security" reasons.

Silly me, thinking that just providing your credentials, just the way you would in a browser, should be enough.

Hrxn commented 7 years ago

[This was an offtopic question]

mikf commented 7 years ago

User authentication has (finally) been implemented. It is on the surface the same as for reddit:

Please test this to see if everything works on your end as well.

To answer your question: The underlying software for Danbooru and Gelbooru is quite different, and so are their search capabilities. Danbooru and all the other ...booru sites with a JSON API interface allow you to search for pool:<pool-id>. Gelbooru and all the others with an XML API interface don't seem to have that functionality or at least I haven't found out how.

Hrxn commented 7 years ago

Glad to hear, I will test Flickr Auth when I'm home again. I'm also searching some old archived bookmarks of mine for some Flickr profile links, to put together a list for a run I'll let work overnight..

[flickr] add favorites extractor (#16) b81d068
[flickr] add user extractor (#16) 4e80e0c

User extractor is this "Photostream", right? Together with the other commit, this should mean that both Album and Gallery in Flickr parlance are implemented now, if I'm not mistaken.

I updated the checklist in the initial post. This means Flickr support is now almost complete, right? With the exception of search results..

Oh, and that other question:

My bad, this is the wrong issue, I'm sorry. This was supposed to be in the General Questions etc., #11 I will continue with the answer there, to keep this on topic.

mikf commented 7 years ago

User extractor is this "Photostream", right?

Yes, it is. I've just named this one "user" to be consistent with all the other extractors which do the same general thing - getting images from the user's own list.

What might not work and could even break the extraction process at the moment are videos. I haven't found an example, so who knows what will happen if you stumble over one.

edit: found some videos. The publicly available ones seem to work, the r18 ones give a 404 error when trying to download them, which is the same behavior as when using a browser. You seem to only be allowed to download r18 videos when being logged in, but there aren't any cookies or HTTP headers when using OAuth, so I don't really know how to handle this with gallery-dl.

Hrxn commented 7 years ago

Okay, was a bit short on time unfortunately, but here are some results:

gallery-dl oauth:flickr works as expected, but noticed a minor potential issue with webbrowser.open, I think. Immediately opened the Flickr/Yahoo login page with the default OS browser, which I usually never bother to change. If I close it directly, gallery-dl keeps waiting for a response, but Ctrl+c to cancel didn't work for me here (tested in CMD and Powershell). Personally, I don't mind selecting text from the terminal, I don't think this is too bad on Windows 😉 CMD: Ctrl+M, select with cursor, hit Enter. PS: Directly select with cursor, hit Enter. (The usual keyboard shortcuts (Select with Ctrl, etc.) work as well).

But okay, that's not the point, and it works as is right now. Clicked Accept in the Browser, got the key data, saved it into gallery-dl.conf, worked fine!

Downloaded some images, just as expected. My plan to make a bigger test run had to be postponed, because hot damn your average Flickr profile with a couple MP per picture is easily a couple of GB. Need to make some space first 😄

This could be a good idea to make some optional changes, eventually. I didn't read the source yet, and I'm definitely not proficient in Python, but I assume the default setting is to get the 'original' size of the pictures, right?

Normally I'm a staunch advocate of always (and only) getting the highest resolution possible, or the "original" resolution, but Flickr might be overkill, at least to some users, who don't really want ~ 5000*4000 px per image (for some profiles only, though). So this could eventually be an option to set..

What I don't understand right now, is that 'r18' videos don't work apparently, but 'r18' pictures do work?

Seems strange to me that the API handles this stuff so differently. Not sure, using headers is usually not that much of a problem with other CLI programs (curl, wget, ...). I think something along those lines should be possible with Python or the Requests library, don't know.

Maybe like this: http://docs.python-requests.org/en/master/user/advanced/#request-and-response-objects

Edit:

Or this.. https://stackoverflow.com/questions/6260457/using-headers-with-the-python-requests-librarys-get-method

mikf commented 7 years ago

Thank you for your detailed feedback.

I was quite surprised to learn that Ctrl+c doesn't work in that particular instance, because it works just fine in all other situations on Windows. It seems that Python on Windows ignores all external input during blocking socket operations. I built some sort of workaround by adding a timeout, which seems to work (d60781de7b9fc7169f735f8c49efb63e416c7254).

There is also no way for gallery-dl to know about what your browser is doing, which would make this a lot easier. It can only sit there and wait until you click "Accept" and your browser gets redirected to localhost:6414. gallery-dl is just going to wait indefinitely if this never happens, which is why the "Ctrl+c to cancel" notice is there.
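The general idea behind the timeout workaround can be sketched roughly like this. It is a simplified illustration and not the code from the commit mentioned above; `parse_callback`, `wait_for_callback` and their parameters are names made up for this example. The listening socket gets a short timeout, so `accept()` returns regularly and the interpreter gets a chance to react to Ctrl+c between attempts.

```python
import socket
from urllib.parse import parse_qs, urlsplit


def parse_callback(request_line):
    """Extract the query parameters from an HTTP request line such as
    'GET /?oauth_token=...&oauth_verifier=... HTTP/1.1'."""
    parts = request_line.split(" ")
    if len(parts) < 2:
        return {}
    query = urlsplit(parts[1]).query
    return {key: values[0] for key, values in parse_qs(query).items()}


def wait_for_callback(server, poll=1.0, max_polls=None):
    """Wait for the browser redirect on an already listening socket.

    accept() is retried every 'poll' seconds instead of blocking forever.
    Returns the callback parameters, or None after 'max_polls' attempts.
    """
    server.settimeout(poll)
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            conn, _ = server.accept()
        except socket.timeout:
            continue  # nothing yet; loop again so Ctrl+c can be handled
        with conn:
            data = conn.recv(4096).decode("utf-8", "replace")
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK")
        return parse_callback(data.split("\r\n", 1)[0])
    return None
```

With a socket bound to localhost:6414 and `max_polls=None`, this waits until the redirect arrives, but never sits in a single blocking call for longer than `poll` seconds.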

Personally, I don't mind selecting text from the terminal

The one time I tried this, I minded it quite a bit, especially compared to how intuitive this is on a terminal emulator on Linux (at least for me): click+drag to select, ctrl+shift+c to copy. I expect the average user to be much more comfortable with a browser-tab opening on its own, given that webbrowser.open does the right thing, instead of having to manually select and copy+paste a URL. I guess I'm going to add an option to disable the use of webbrowser.open, but it should stay the default.
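Such an option could look something like the sketch below. The function name `present_url` and its `use_browser` and `opener` parameters are hypothetical and only serve to illustrate the behavior: the browser is used by default, and the URL is printed whenever the browser is disabled or cannot be opened.

```python
import webbrowser


def present_url(url, use_browser=True, opener=webbrowser.open):
    """Show the authorization URL to the user.

    Tries to open a browser tab by default and falls back to printing
    the URL. Returns True if a browser was opened, False otherwise.
    """
    if use_browser:
        try:
            if opener(url):
                return True
        except webbrowser.Error:
            pass  # no usable browser; fall through to printing
    print("Please open this URL in your browser:")
    print(url)
    return False
```

The `opener` argument only exists to make the function testable without actually launching a browser.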

I'm a staunch advocate of always (and only) getting the highest resolution possible, or the "original" resolution, but Flickr might be overkill

Same here, and I agree that there should be an option to select different formats than the original, at least for flickr. It would be nice to have a format-selection-option like youtube-dl does, but this might be a bit too much, given that almost all supported sites provide their images in only one size and format.

What I don't understand right now, is that 'r18' videos don't work apparently, but 'r18' pictures do work?

The issue with r18 videos is actually downloading them. Getting metadata and a download URL works the same for images and videos, r18 or not, but sending an HTTP GET request to download any of these results in a 404 status code for r18 videos if you are not logged in.

R18 rated images don't require any authentication once you've got their URL: flickr page --- direct link
That is not the case for r18 videos: flickr page --- direct link

The last link is what the API provides as download URL for that video, but accessing it without being logged into Flickr just gives you a "This is not the page you're looking for." message. The thing is that gallery-dl doesn't need to log into Flickr to access its API as it is using OAuth to authenticate itself and the user. The problem here is authenticating gallery-dl and its user when trying to download an r18 video. A browser does that by sending the session cookies it got when logging into Flickr; gallery-dl doesn't have any cookies to send.

edit: I should mention that you cannot use any of the OAuth methods to authenticate an r18 video download. I tried.

Hrxn commented 7 years ago

Okay, I see. These videos require a session cookie for the requests, but gallery-dl doesn't have it because OAuth doesn't deal with cookies..

Not sure if there is any elegant solution for this, I don't see it right now. The only workaround presumably is to add support for reading cookies in text form to gallery-dl from the command-line. This is not specific to Flickr, so this might be useful again in some other cases. I think something along the lines of how curl handles this, sending cookies with -b "c=1; d=2", or better -b FILE, which is a "cookie jar" file, not sure if that is really standardized, but curl can create these as well, it should be plain text anyway. The missing step is now to export cookies from the browser. This can be done via Dev Tools in modern browsers, I think, otherwise there are dozens of "Cookie Manager" extensions for Firefox and Chrome which can save cookies to file etc. I don't know which extension is really the best here, but this should be possible to find out.
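Just to sketch what the curl-like variant could mean in Python terms (standard library only, helper names made up for this example, and the URL is a placeholder): a string like "c=1; d=2" gets parsed into name/value pairs and is then sent as a Cookie header with the download request.

```python
from http.cookies import SimpleCookie
from urllib.request import Request


def parse_cookie_string(text):
    """Turn a string like 'c=1; d=2' into a dict of cookie names and values."""
    jar = SimpleCookie()
    jar.load(text)
    return {name: morsel.value for name, morsel in jar.items()}


def request_with_cookies(url, cookies):
    """Build a request object that sends the given cookies along."""
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": header})


cookies = parse_cookie_string("c=1; d=2")
# Placeholder URL; nothing is sent until the request is actually opened
req = request_with_cookies("https://example.org/video.mp4", cookies)
```

Passing `req` to `urllib.request.urlopen` would then send the cookies to the server, much like a browser with an active session does.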

On the other hand, I'd just say, Videos on Flickr? Meh.. But "complete" is "complete", I suppose. Sigh.

I agree about the issue with Ctrl+c, which normally works in Windows exactly as on Unixoids. So the issue is with the Python runtime for Windows, apparently. I don't know, maybe there already is an open ticket for this, hard to imagine that this hasn't been noticed before.

Hrxn commented 7 years ago

Is --width-max PIXELS specified as an Int? Or using the identifiers defined by Flickr?

mikf commented 7 years ago

Currently as Integer/Pixels. I'm not quite sure what kind of "format selection" would be appropriate, so I just went with the simplest one for now. Do you have any suggestions in that regard? Are the Flickr identifiers (o, k, h, l, ...) something that is generally known?
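As a rough illustration of limiting by pixel width (not the actual gallery-dl implementation; `pick_size` and the sample data below are made up): given the list of available sizes for a photo, take the largest one that still fits within the limit.

```python
def pick_size(sizes, width_max=None):
    """Pick the largest size whose width does not exceed 'width_max'.

    'sizes' is a list of dicts with 'label', 'width' and 'source' keys
    (a hypothetical structure for this sketch). Without a limit the
    largest size is returned; if nothing fits, the smallest one is used.
    """
    ordered = sorted(sizes, key=lambda size: size["width"])
    if width_max is None:
        return ordered[-1]
    fitting = [size for size in ordered if size["width"] <= width_max]
    return fitting[-1] if fitting else ordered[0]


# Made-up sample data
sizes = [
    {"label": "Original", "width": 5000, "source": "https://example.org/o.jpg"},
    {"label": "Medium", "width": 500, "source": "https://example.org/m.jpg"},
    {"label": "Large", "width": 1024, "source": "https://example.org/l.jpg"},
]
print(pick_size(sizes, width_max=2000)["label"])
```

A plain number like this doesn't require anyone to know which letter stands for which size.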

Hrxn commented 7 years ago

Uh, I don't really know. But pretty sure that this is not general knowledge in any way.. No, I think the current variant with limiting width as pixels is the best possible solution.

I've made a habit of actually reading the commits here, but I wasn't absolutely sure, to be honest. Some Python constructs still seem weird to me 😉

mikf commented 7 years ago

I've finally managed to implement a way to supply your own cookies (--cookies COOKIEJAR or the cookies option), which solves the R18-video issue.

Flickr only requires the cookie_accid and cookie_epass cookies for the videos to be downloadable. These are valid for two years and shouldn't change unless you re-login.

The cookie jar files have the same restrictions as youtube-dl has for their --cookies option (# HTTP Cookie File in the first line, values separated by tabs \t), but in this case it is probably easier to just put them directly in your config file:

"flickr": {
    "cookies": {
        "cookie_accid": ...,
        "cookie_epass": ...
    }
}
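For reference, a cookie jar file in that format can be read with Python's standard library as sketched below. The cookie values and the expiry timestamp are placeholders, and `load_cookies` is a name made up for this example; it is only meant to show the file layout, not how gallery-dl handles it internally.

```python
import http.cookiejar
import os
import tempfile

# Fields per line, separated by tabs: domain, include-subdomains flag,
# path, secure flag, expiry (Unix time), name, value.
COOKIE_FILE = "\n".join([
    "# HTTP Cookie File",
    "\t".join([".flickr.com", "TRUE", "/", "FALSE", "2000000000",
               "cookie_accid", "PLACEHOLDER_ACCID"]),
    "\t".join([".flickr.com", "TRUE", "/", "FALSE", "2000000000",
               "cookie_epass", "PLACEHOLDER_EPASS"]),
    "",
])


def load_cookies(path):
    """Read a cookie jar file and return its cookies as a name/value dict."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.value for cookie in jar}


# Write the example to a temporary file and read it back
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(COOKIE_FILE)
try:
    cookies = load_cookies(fh.name)
finally:
    os.unlink(fh.name)
print(sorted(cookies))
```

A file that is missing the header line or uses spaces instead of tabs fails to load, which is the restriction mentioned above.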
Hrxn commented 7 years ago

Great to hear, thanks a lot!

Can't test it for myself right now, cause I'm out and about, can do it when I'm home again, oh well..

I think that's about it for this extractor, right?

Maybe time to close and move on 😄