mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[error?][furaffinity] Artist data not achieved #1630

Closed: rod-giovany closed this issue 3 years ago

rod-giovany commented 3 years ago

Note: I use Google Translate. Some image posts do not show certain information about the artist, neither with the -K option nor with -j: the missing fields are {artist}, {artist_url}, and sometimes {user}. This happens specifically with files that have no name or {title}. I do not know whether they were originally uploaded this way or whether it is a furaffinity error, but "filename" does show some of this information. Example (NSFW): https://www.furaffinity.net/view/23612402

This affects my "directory" configuration: files get saved in directories where they do not belong.

My configuration is:

C:\Users\Jonathan>gallery-dl --verbose https://www.furaffinity.net/view/23612402 -E
[gallery-dl][debug] Version 1.17.5
[gallery-dl][debug] Python 3.9.5 - Windows-10-10.0.19043-SP0
[gallery-dl][debug] requests 2.25.1 - urllib3 1.26.4
[gallery-dl][debug] Starting InfoJob for 'https://www.furaffinity.net/view/23612402'
[furaffinity][debug] Using FuraffinityPostExtractor for 'https://www.furaffinity.net/view/23612402'
Category / Subcategory
  "furaffinity" / "post"
Filename format (custom):
  "{title} - {id}.{extension}"
Filename format (default):
  "{id} {title}.{extension}"
Directory format (custom):
  ["{category}", "{artist}", "{subcategory}"]
Directory format (default):
  ["{category}", "{user!l}"]
Archive format (custom):
  " {filename} {id}"
Archive format (default):
  "{id}"

With other posts, this gives me, for example: "$HOME\gallery\furaffinity\Burgerkiss\gallery\a_-_35357600.jpg"

but with the example file I get "$HOME\gallery\furaffinity\gallery\_-_23612402.jpg" or "$HOME\gallery\furaffinity\post\_-_23612402.jpg", with no artist directory.
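The missing directory level can be reproduced with a small model of the "directory" setting. This is a hypothetical sketch, not gallery-dl's implementation: it assumes each format segment is rendered and that segments rendering to an empty string are skipped, which is what the two paths above suggest.

```python
# Hypothetical model of how the "directory" format list turns into a path.
# Assumption (inferred from the paths reported above): segments that render
# to an empty string are dropped. This is NOT gallery-dl's actual code.
from pathlib import PureWindowsPath


def build_directory(base: str, segments: list, kwdict: dict) -> PureWindowsPath:
    parts = []
    for segment in segments:
        rendered = segment.format_map(kwdict).strip()
        if rendered:  # an empty {artist} leaves no directory level at all
            parts.append(rendered)
    return PureWindowsPath(base, *parts)


fmt = ["{category}", "{artist}", "{subcategory}"]
ok = {"category": "furaffinity", "artist": "Burgerkiss", "subcategory": "gallery"}
broken = {"category": "furaffinity", "artist": "", "subcategory": "post"}

print(build_directory(r"$HOME\gallery", fmt, ok))      # $HOME\gallery\furaffinity\Burgerkiss\gallery
print(build_directory(r"$HOME\gallery", fmt, broken))  # $HOME\gallery\furaffinity\post
```

With an empty {artist}, the file lands directly under the {subcategory} folder, which is why both "gallery" and "post" variants appear in the report.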

In the terminal, the -K option shows:

Keywords for filenames and --filter:
------------------------------------
artist

artist_url

category
  furaffinity

and the metadata file (with "mode": "json") and the -j option show:

{
    "artist": "",
    "artist_url": "",
    "category": "furaffinity"
}
Skyofflad commented 3 years ago

This is clearly FA's fault. If you look closely at this page (https://www.furaffinity.net/view/23612402), you will see that it has no title at all. This fucks up the gallery extractor. How this happened, I have no clue. I have extracted hundreds of thousands of images, and this is the first time I have seen an abnormality like this. Just download this glitchy image manually and forget about it.

rod-giovany commented 3 years ago

There are two other examples (https://www.furaffinity.net/view/23612419 and https://www.furaffinity.net/view/23612468). But I don't understand why the {artist} and {artist_url} data do not appear.

mikf commented 3 years ago

but I don't understand why the {artist} or {artist_url} data does not appear

Those two fields come from a post's <title> and are empty if that title is missing. This should be fixable by getting the information from somewhere else on the post page.
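A minimal sketch of why a missing <title> yields an empty {artist}. It assumes, without confirmation from this thread, that a post page's title reads "<submission title> by <artist> -- Fur Affinity [dot] net"; the function name and the example title are hypothetical, and this is not the extractor's actual code.

```python
# Hypothetical sketch: pull the artist out of a post page's <title> text.
# Assumed title shape: "<submission title> by <artist> -- Fur Affinity [dot] net".
SUFFIX = " -- Fur Affinity [dot] net"


def artist_from_title(page_title: str) -> str:
    """Return the artist named in the page title, or "" if it is absent."""
    if page_title.endswith(SUFFIX):
        page_title = page_title[: -len(SUFFIX)]
    # Split on the *last* " by " so submission titles containing " by " work.
    _, sep, artist = page_title.rpartition(" by ")
    return artist if sep else ""


print(repr(artist_from_title("Example by Burgerkiss -- Fur Affinity [dot] net")))  # 'Burgerkiss'
print(repr(artist_from_title("")))  # '' -> empty {artist} and {artist_url}
```

Any title-based approach has this failure mode: once the title is gone, there is nothing left to split, so the fix has to read the artist from another part of the page.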

rautamiekka commented 3 years ago

Try imagining the horror FA's system staff would feel if the title were a factor in how a submission is stored in the DB, instead of the integer ID they use.

^ I'm sure they store uploads as files rather than as binary blobs in the DB, but a UTF-8 Linux system wouldn't care either way, so this is a weird but interesting problem.

rod-giovany commented 3 years ago

Specifically with files without a name or {title}; I do not know if this is how they were originally uploaded or is it a furaffinity error

So I guess it's both.

Wiiplay123 commented 3 years ago

Why not get the artist info from the element inside the HTML element with class="classic-submission-title information"? It contains the artist every time, even when the <title> doesn't.
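The suggested fallback could look roughly like this, using only the standard library's html.parser. It is a sketch under assumptions: the markup in SAMPLE is a guess at the classic layout (an <a> holding the artist name inside the element with class="classic-submission-title information"), and ArtistFinder/find_artist are hypothetical names, not part of gallery-dl.

```python
# Hypothetical fallback: read the artist from the first <a> inside the element
# whose class list contains "classic-submission-title" and "information".
# The SAMPLE markup below is an assumption about FA's classic page layout.
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}  # tags without end tags
TARGET = {"classic-submission-title", "information"}


class ArtistFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.depth = 0        # > 0 while inside the target element
        self.in_link = False  # True while inside the first <a> there
        self.artist = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # void elements would otherwise unbalance the depth count
        if self.depth:
            self.depth += 1
            if tag == "a" and self.artist is None:
                self.in_link, self.artist = True, ""
        elif TARGET <= set((dict(attrs).get("class") or "").split()):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1
            if tag == "a":
                self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.artist += data


def find_artist(html: str) -> str:
    finder = ArtistFinder()
    finder.feed(html)
    finder.close()
    return (finder.artist or "").strip()


SAMPLE = ('<html><head><title></title></head><body>'
          '<div class="classic-submission-title information">'
          '<h2></h2>by <a href="/user/burgerkiss/">Burgerkiss</a>'
          '</div></body></html>')
print(find_artist(SAMPLE))  # Burgerkiss
```

Note that the empty <title> in SAMPLE does not matter here: the artist is read from the page body instead.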

mikf commented 3 years ago

@rod-giovany You might want to use "{title:?/ - /}{id}.{extension}" as the filename format string to account for empty titles. With your current settings, they would come out as " - 12345.jpg"; with {title:?/ - /}, they would just be "12345.jpg".
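To illustrate what the "?" specifier does, here is a small emulation. It reimplements the behavior as described for gallery-dl's "?<before>/<after>/" format spec (wrap the value in <before> and <after> only if it is non-empty, else produce nothing). The helper names and the title "Example" are hypothetical, and this is not gallery-dl's formatter.

```python
# Hypothetical emulation of the "?<before>/<after>/" format spec:
# a truthy value is wrapped in <before> and <after>; an empty value
# makes the whole replacement field an empty string.


def conditional(value: str, spec: str) -> str:
    before, after, _ = spec[1:].split("/", 2)  # "?/ - /" -> "", " - ", ""
    return f"{before}{value}{after}" if value else ""


def filename(title: str, post_id: int, extension: str) -> str:
    # Mirrors "{title:?/ - /}{id}.{extension}"
    return f"{conditional(title, '?/ - /')}{post_id}.{extension}"


# Current setting "{title} - {id}.{extension}" with an empty title:
print(repr("{title} - {id}.{extension}".format(title="", id=12345, extension="jpg")))  # ' - 12345.jpg'
print(filename("Example", 12345, "jpg"))  # Example - 12345.jpg
print(filename("", 12345, "jpg"))         # 12345.jpg
```

This only fixes the filename; the empty {artist} directory level still needs the extractor-side fix discussed above.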