mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[[Cohost] but also every other website] Stop putting post text in file names, or at least properly escape it #6262

Open Scripter17 opened 3 weeks ago

Scripter17 commented 3 weeks ago

While archiving cohost I found that some of the posts try to save files with, for example, newlines in the name.

This doesn't work as they're not escaped by the default extractor.cohost.filename config value

I think it's just a bad idea to put post text in the file name at all. It's just begging for weird bugs involving path lengths (one of the files was a bunch of :eyes: and it was too long to save) and invalid file names

Fixing this is a breaking change but it really should be fixed

Though to prevent redownloading entire accounts, https://github.com/mikf/gallery-dl/issues/1673 would need to be implemented for people like me who still don't use an archive file for some reason. It's still a breaking change, but that would make it better.

mikf commented 3 weeks ago

> newlines in the name
> This doesn't work as they're not escaped

But they are: https://gdl-org.github.io/docs/configuration.html#extractor-path-remove

ASCII control characters, including newlines, do get removed by default.
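As a rough illustration of what that removal amounts to, here is a minimal Python sketch. It is not gallery-dl's actual code: the helper name `strip_control_chars` is hypothetical, and the character set (U+0000 to U+001F plus U+007F) is an assumption about what "ASCII control characters" covers here.

```python
import re

# Assumption: "ASCII control characters" means U+0000-U+001F and U+007F.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_control_chars(name: str) -> str:
    """Hypothetical helper: drop ASCII control characters from a file name."""
    return CONTROL_CHARS.sub("", name)


headline = "first line\nsecond line\ttabbed"
print(strip_control_chars(headline))
```

Note that the characters are removed outright rather than replaced, so words on either side of a newline end up joined.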

> It's just begging for weird bugs involving path lengths

The length is restricted to 100 characters, but I guess it should have been bytes instead. https://github.com/mikf/gallery-dl/blob/f3f27496d6b084bb4defd94a2df34ec30fc9434c/gallery_dl/extractor/cohost.py#L22
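To show why characters versus bytes matters for something like a run of :eyes: emoji, here is a small Python sketch. The helper `truncate_utf8` is hypothetical and is not how gallery-dl implements its limit; the 255-byte figure is an assumption based on the file name limit of many common filesystems.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Hypothetical helper: cut text so its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence
    # left behind by slicing in the middle of a character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# U+1F440 (the "eyes" emoji) takes 4 bytes in UTF-8.
headline = "\U0001F440" * 100

print(len(headline))                  # length in characters
print(len(headline.encode("utf-8")))  # length in bytes

# Assumption: 255 bytes is the file name limit being targeted.
safe = truncate_utf8(headline, 255)
print(len(safe), len(safe.encode("utf-8")))
```

A headline of 100 such characters passes a 100-character limit but encodes to 400 bytes, which is why a byte-based limit is the safer check.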

mikf commented 3 weeks ago

Also, --rename might be useful here:

  --rename FORMAT             Rename previously downloaded files from FORMAT
                              to the current filename format