mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[deviantart] Favorites extraction issue, Literature and Journals #14

Closed Hrxn closed 7 years ago

Hrxn commented 7 years ago

I just noticed something while downloading some DeviantArt profiles as well as their favorites.

First the good news, given these URLs

[1] http://roperookie.deviantart.com/
[2] http://michaelpe.deviantart.com/
[3] http://rick35mm.deviantart.com/
[4] http://ultradevious.deviantart.com/
[5] http://sanfrancysco.deviantart.com/
[6] http://hart-worx.deviantart.com/

gallery-dl downloaded each and every deviation/submission from the profiles' galleries. 100/100 Points.

But Favorites are entirely another story:

Profile 1: Profile web page states: 1,316 Favourites. In 'favorite\roperookie - Featured': 348

Profile 2: Profile web page states: 92 Favourites. In 'favorite\michaelpe - Featured': 3

Profile 3: Profile web page states: 7 Favourites. In 'favorite\rick35mm - Featured': 4

Profile 4: Profile web page states: 382 Favourites. In 'favorite\ultradevious - Featured': 284

Profile 5: Profile web page states: 48 Favourites. In 'favorite\sanfrancysco - Featured': 34

Profile 6: Profile web page states: 547 Favourites. In 'favorite\hart-worx - Featured': 333

That's some pretty wild variation there. I think that two things need to be taken into account:

Some of these profiles are pretty old, so the numbers of favorites mentioned on the profile pages have to be taken with a grain of salt, maybe.

My first assumption was that this might be caused by the at least two different favorite listings that each profile has. Featured and All.

The name of the target directory indicates that: "{profile-name} - Featured"

Also, the corresponding URLs:

Favorites tab in the horizontal category menu bar: http://sanfrancysco.deviantart.com/favourites/
Featured view at the left, under the profile badge: http://sanfrancysco.deviantart.com/favourites/
All favorites view link below: http://sanfrancysco.deviantart.com/favourites/?catpath=/

This does not affect the result, gallery-dl always returns the same files for these.

Logic indicates that Featured is a subset of All, which is sometimes true (http://sanfrancysco.deviantart.com/), sometimes not (http://rick35mm.deviantart.com/).

The last one is also an example for the Favorites chaos, the number listed in the profile being totally off.

Not sure what to make of it. Would be interesting to know what the API returns.

mikf commented 7 years ago

The issue concerning the favorite-inconsistencies is actually quite simple: I didn't turn the mature-content filter off. This wasn't necessary for the /gallery/all API endpoint, which is used to get a user's submitted deviations, but it seems to be needed to get all of a user's favorites via the /collections/{folderid} API endpoint.

> Would be interesting to know what the API returns.

I first use /collections/folders to get a list of all favorite-folders, which includes the "Featured" listing, but not the one for "All". This result is then used with /collections/{folderid} to get a list of all deviation-objects. You can even try this out yourself. There doesn't seem to be any way to get the "All" listing via API. For most users this is probably just the total of all the collection-listings, but http://rick35mm.deviantart.com/favourites/?catpath=/ only has 4 items, whereas there are 44 in the "Featured" listing.
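The two-step lookup described above can be sketched roughly as follows. This is only an illustration, not gallery-dl's actual code: the endpoint paths come from this thread, while the `fetch` callable, the parameter names (`username`, `offset`, `mature_content`) and the response fields (`results`, `has_more`, `next_offset`, `folderid`, `name`) are assumptions about how the API responds.

```python
from typing import Callable, Iterator

# Hypothetical transport: takes an endpoint path and query parameters and
# returns the decoded JSON response as a dict (authentication not shown).
Fetch = Callable[[str, dict], dict]


def paginate(fetch: Fetch, endpoint: str, params: dict) -> Iterator[dict]:
    """Yield every item of a paginated listing.

    Assumes each response looks like
    {"results": [...], "has_more": bool, "next_offset": int or None}.
    """
    offset = 0
    while True:
        page = fetch(endpoint, {**params, "offset": offset})
        yield from page.get("results", [])
        if not page.get("has_more"):
            break
        offset = page["next_offset"]


def favorites(fetch: Fetch, username: str, mature: bool = True) -> dict:
    """Map each favorites folder name to the list of its deviations."""
    result = {}
    folders = paginate(fetch, "/collections/folders", {"username": username})
    for folder in folders:
        # Assumed flag: without it, mature deviations are silently omitted.
        params = {"username": username, "mature_content": str(mature).lower()}
        endpoint = "/collections/" + folder["folderid"]
        result[folder["name"]] = list(paginate(fetch, endpoint, params))
    return result
```

Passing `mature=False` would reproduce the smaller counts from the first report, since the folder contents are then filtered on the server side.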

Another note: ../favourites/ and ../favourites/?catpath=/ are both currently treated as one and the same and result in the "Featured" listing being downloaded.

mikf commented 7 years ago

To add some numbers (now with mature content turned on):

    #  | before | now | expected
   [1] |    348 | 901 |     1316
   [2] |      3 |  81 |       92
   [3] |      4 |  44 |        7
   [4] |    284 | 368 |      382
   [5] |     34 |  42 |       48
   [6] |    333 | 466 |      547

These are only from the "Featured" listing and don't include journal entries.

I've manually counted all the entries shown for [2] and there are only 81 (+ 1 journal); the same number as the API provides.
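To put the table in perspective, a few lines of Python turn the counts into coverage percentages. The numbers are copied from the table; the `coverage` helper is just an illustrative name. Since the "expected" figure comes from the profile page, which may itself be wrong, the percentages are only a rough gauge.

```python
# (before, now, expected) per profile, copied from the table above
counts = {
    1: (348, 901, 1316),
    2: (3, 81, 92),
    3: (4, 44, 7),
    4: (284, 368, 382),
    5: (34, 42, 48),
    6: (333, 466, 547),
}


def coverage(found: int, expected: int) -> float:
    """Share of the profile's stated favorites that was found, in percent."""
    return round(100 * found / expected, 1)


for profile, (before, now, expected) in counts.items():
    print(f"[{profile}] before {coverage(before, expected):6.1f}%  "
          f"now {coverage(now, expected):6.1f}%")
```

Profile [3] ends up far above 100%, which is another hint that the count shown on the profile page cannot be trusted.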

Hrxn commented 7 years ago

Yup, some of those numbers for favorites on the profile can't be right...

It probably is caused by old deviations that don't exist anymore, while the count doesn't get updated. I mean that phenomenon is something you will notice on a lot of different sites.

Also interesting that one API endpoint has a filter setting for 'mature' content, while another endpoint, same site, same content, doesn't have such an option. So much for consistency 😄.

I'll do a rerun as well and report back if any new issues should pop up.

I'm not sure what to do about journal entries. I admit that I personally did not care about them so far, so this would not be an issue for me. But it seems they might be relevant for other users, so they probably would appreciate this. Don't know, for poetry and stuff like that. There even is a Literature section in the category menu. Oh, and those literature entries don't necessarily have to be in the journal, they can be listed in the "normal" gallery as well. Example: http://greystream.deviantart.com/gallery/

So, I would assume that they should be available via the API just like the other stuff. If that is the case, I'd suggest adding this feature as well, including Journal entries, if these can also be reached from the API in a straightforward way.

Anyway, thank you for looking into that!

Hrxn commented 7 years ago

I tried another run, and basically had the same numbers for favorites you posted earlier. A bit more in some cases; I guess these accounts were not completely inactive and added some favorites in the meantime.

Gallery with Literature works as well.

Journal entries also, one HTML document for an entry. Some Journals have embedded pictures, img elements referring to content hosted on DeviantArt elsewhere. So, that probably won't work without an Internet connection 😄

Although the HTML contains the correct URL, obviously, so that is what the Single Deviation extractor would use anyway.

But okay, as I said, not a priority. Someone else should come here and complain eventually, doesn't have to be me 😉

So I will close this issue for now. It was only about favorites in the first place, and journals are technically another issue, so everything is done here.

Really, thank you for your help!

mikf commented 7 years ago

> A bit more in some cases

Favorites can include all sorts of media-types and up to that point gallery-dl only looked at images. With e5f79ae83988c6fbc9089e5d14de41ae2c8b4771 it also takes video, flash and journals into consideration, which might up the numbers as well. There are also cases where a deviation contains the same resource in multiple formats. http://dummy88.deviantart.com/art/Kakuzus-sweet-obsession-151848118 for example has a video and a flash animation and both are downloaded.

> So, that probably won't work without an Internet connection 😄

Yeah, that is a bit of a downside to the current approach, but the journal HTML files need several CSS documents from the deviantart servers to look the part and that is obviously not possible without an internet connection either. I don't think that I'm allowed to distribute these, so you either have a working internet connection or the journals are missing a bit of content.

> Although the HTML contains the correct URL, obviously, so that is what the Single Deviation extractor would use anyway.

If you weren't aware of it: there is actually a convenient way to apply gallery-dl to URLs found in local (or remote) files: $ gallery-dl r:file://<JournalFile> (r: is the short form of recursive:)
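For anyone curious what "URLs found in a file" amounts to for a journal's HTML, here is a minimal standalone sketch using only the Python standard library. It is not how gallery-dl implements recursive:, just an illustration of the idea; `LinkCollector` and `urls_in_html` are made-up names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from href and src attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and \
                    value.startswith(("http://", "https://")):
                self.urls.append(value)


def urls_in_html(text: str) -> list:
    """Return the URLs found in an HTML string, without duplicates."""
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(parser.urls))
```

Each URL collected this way could then be passed to gallery-dl individually, which is roughly the manual equivalent of the one-line command above.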