Open fireattack opened 2 months ago
`image_key` and `extension` will save that picture as `SvCSMjtI_o.jpg`.
That's just an example; the point is to get the same filename that wget, a browser, etc. would get. There is no guarantee it would always be `{image_key}_o.{extension}`.
I agree with the idea. But first, we'll have to agree on an approach here.
For example, emulating curl with the `curl -J -O` options.
It uses the name from `Content-Disposition` if and only if there's a valid `Content-Disposition` header; otherwise it takes the filename from the last segment of the URL.
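A rough Python sketch of that rule, for illustration only. It is not curl's or gallery-dl's actual code: `remote_name` and its two helpers are hypothetical names, and the header is parsed with the standard library's `email.message.Message.get_filename()`.

```python
from email.message import Message
from urllib.parse import unquote, urlsplit


def name_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() also understands the RFC 2231 "filename*=" form.
    return msg.get_filename() or None


def name_from_url(url):
    """Return the percent-decoded last path segment of a URL."""
    return unquote(urlsplit(url).path.rsplit("/", 1)[-1])


def remote_name(url, content_disposition=None):
    """Prefer the header filename; otherwise use the last URL segment.

    The returned name is NOT sanitized: it is whatever the server sent.
    """
    return name_from_content_disposition(content_disposition) or name_from_url(url)
```

With no header, `remote_name("https://images2.imgbox.com/5d/90/SvCSMjtI_o.jpg")` falls back to the URL segment `SvCSMjtI_o.jpg`.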
But none of this is foolproof. `Content-Disposition` may contain path information etc., and it may result in a filename that already exists in the current directory, resulting in an undesired skip or, even worse, an overwrite. This should be handled, for example by always stripping anything that could be considered a path component. The header is completely controlled by the server, and there might be encoding/character-set issues etc., so you could end up with percent-encoded filenames that you may not want.
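One possible way to handle these concerns, sketched in Python. The function names, the `"download"` fallback, and the "append a counter" collision policy are all assumptions made for this example; whether a collision should skip, rename, or overwrite is a separate design decision.

```python
import os
from urllib.parse import unquote


def sanitize_filename(name, fallback="download"):
    """Reduce a server-supplied name to a bare filename."""
    # Decode first, so that "%2F" cannot smuggle a path separator through.
    name = unquote(name)
    # Treat both separators as path delimiters, regardless of the local OS.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Drop control characters and other non-printable characters.
    name = "".join(ch for ch in name if ch.isprintable()).strip()
    if name in ("", ".", ".."):
        return fallback
    return name


def unique_path(directory, name):
    """Return a path inside `directory` that does not exist yet."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem} ({counter}){ext}")
        counter += 1
    return candidate
```

For instance, `sanitize_filename("../../etc/passwd")` keeps only `passwd`, and `sanitize_filename("a%20b.jpg")` yields `a b.jpg`.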
How to handle redirects?
If you use `curl -O`, it always takes the filename from the respective segment of the input URL, regardless of redirects.
Ideally, I would prefer that we use (in this order) the filename provided by `Content-Disposition` if any, the filename in the redirected URL, and then the filename in the original URL. (There could also be a filename extension auto-fix based on the `Content-Type` header, like browsers' "Save as" usually does.)
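The proposed order could look roughly like this in Python. `pick_filename` and `fix_extension` are hypothetical helpers, and the candidates are assumed to have been extracted already. The extension step only appends an extension when the name has none, which is a simplification of what browsers actually do.

```python
import mimetypes
import os


def pick_filename(header_name, final_url_name, original_url_name):
    """Return the first non-empty candidate, in the proposed priority order."""
    for candidate in (header_name, final_url_name, original_url_name):
        if candidate:
            return candidate
    return None


def fix_extension(name, content_type):
    """Append an extension guessed from Content-Type if the name has none."""
    if os.path.splitext(name)[1] or not content_type:
        return name
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    return name + (mimetypes.guess_extension(mime) or "")
```

When no header name is available, `pick_filename(None, "photo_o.jpg", "photo")` falls through to the redirected URL's name, `photo_o.jpg`.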
But I assume our current dupe/skip check kicks in entirely before the actual HTTP request is made?
If so, yeah, that will indeed make things more complicated. In that case, I think a static filename based purely on the original input URL (something as simple as `unquote(url.split('?')[0].split('#')[0].split('/')[-1])`) is more than enough.
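Wrapped in a function, that one-liner might look like the sketch below. `static_filename` is a hypothetical name, and the logic is exactly the expression quoted above, so it needs no HTTP request.

```python
from urllib.parse import unquote


def static_filename(url):
    """Derive a filename from the input URL alone, without any HTTP request."""
    # Cut off the query string and the fragment, then keep the last segment.
    path = url.split("?")[0].split("#")[0]
    return unquote(path.split("/")[-1])
```

For the direct image URL from this issue, `static_filename("https://images2.imgbox.com/5d/90/SvCSMjtI_o.jpg")` gives `SvCSMjtI_o.jpg`. Note that a URL ending in `/` produces an empty string, so a caller would still need some fallback for that case.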
Currently, `filename` works for most sites, but not always.
For example, imgbox: https://imgbox.com/SvCSMjtI
I want to name the downloaded file `SvCSMjtI_o.jpg` (from the direct image URL, https://images2.imgbox.com/5d/90/SvCSMjtI_o.jpg), but the `filename` keyword is actually a site-specific one: `yuka_s1113-1830971240661819757-20240903220856-03` (the original filename when the uploader uploaded the file).
URLs with headers like `Content-Disposition: attachment; filename="a_different_name.jpg"` are trickier, as both names can reasonably be considered the "real" filename, but that does not apply to the case above since it has no such header.
Also, a suggestion for the docs:
Provide a list of commonly used keywords in the README.
I understand that you can use `-K` to obtain the full list of keywords, and that many keywords are site-specific, but there are also many common ones (like `extension`, `title`, etc.). I think a list of the most commonly used ones would be useful for new users. At least, I personally was very confused when I used gallery-dl for the first time and tried to find the keyword/field conventions for naming my files.
Furthermore, I think the link in "and powerful filenaming capabilities" should go to a document with such a keyword list, together with the formatting info we currently have (basically, similar to yt-dlp's approach: https://github.com/yt-dlp/yt-dlp?tab=readme-ov-file#output-template).