Feature request: filename specifications

clorgie commented 5 months ago

Thanks for this great tool! One thing that would be great: the ability to specify the filename in some way. In particular, I like to add the author(s) to the resulting filename. Not sure if that is possible technically given the nature of the extension, but figured I would propose the idea :)

jpd236 commented 5 months ago

Authors are already included if they can be inferred from the puzzle's metadata. The filename rules (https://github.com/jpd236/CrosswordScraper/blob/main/src/jsMain/kotlin/com/jeffpdavidson/crosswordscraper/CrosswordScraper.kt#L336C9-L337C82) are:

// In descending priority, author-title, title, author, scraping source.
// Each word is capitalized, and non-alphanumeric characters are removed.
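
To make those two rules concrete, here is a minimal sketch of how they might translate into code. It is not the actual CrosswordScraper implementation: the function names `sanitize` and `baseFilename` are hypothetical, and the choice to join words without spaces is an assumption based on non-alphanumeric characters being removed.

```kotlin
// Hypothetical sketch of the naming rules quoted above; not the real CrosswordScraper code.

/**
 * Capitalizes the first character of each whitespace-separated word and drops
 * every character that is not a letter or digit. Words are joined without a
 * separator (an assumption, since spaces are themselves non-alphanumeric).
 */
fun sanitize(text: String): String =
    text.trim()
        .split(Regex("\\s+"))
        .map { word -> word.filter { it.isLetterOrDigit() } }
        .filter { it.isNotEmpty() }
        .joinToString("") { word -> word.replaceFirstChar { it.uppercaseChar() } }

/**
 * Picks the base filename in descending priority:
 * author-title, then title, then author, then the scraping source.
 */
fun baseFilename(title: String?, author: String?, source: String): String {
    val cleanTitle = sanitize(title.orEmpty())
    val cleanAuthor = sanitize(author.orEmpty())
    return when {
        cleanAuthor.isNotEmpty() && cleanTitle.isNotEmpty() -> "$cleanAuthor-$cleanTitle"
        cleanTitle.isNotEmpty() -> cleanTitle
        cleanAuthor.isNotEmpty() -> cleanAuthor
        else -> sanitize(source)
    }
}
```

Under these assumptions, `baseFilename("Themeless #12", "Jane Doe", "example")` would give `JaneDoe-Themeless12`, and a puzzle with neither title nor author would fall back to the sanitized source name.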

I'm guessing this is on an indie blog or some other source where the author is known through some other context that's not present in the puzzle data itself. (But if you have an example, that would help confirm).

So the only way to do this would be to provide a way to change the filename for each download. In that regard, it's arguably more of a browser setting question. Chrome, for example, has a setting at chrome://settings/downloads (Settings -> Downloads), "Ask where to save each file before downloading". If you enable that setting, then every time you click a CrosswordScraper link - or any other download link - you'll get a save dialog allowing you to adjust the filename. It feels to me like it would be a bit kludgy to build something custom just for CrosswordScraper here.

Would that setting meet your needs? Or do you think we're actually missing the author from some inferable metadata?

clorgie commented 5 months ago

Ah, I see. I was thinking about a) adding the date and b) changing the order of some of the elements.

It looks like date isn't in the metadata anyway (which makes sense in retrospect!), so (a) is immaterial.

(b) is because I end up editing the filenames to put the source first in the filename.

jpd236 commented 5 months ago

Yeah, the problem is that neither of those pieces of information is consistently available, and when they are, they're typically shoved into the title/author/copyright field and would need to be parsed out. Something like the source might be available for certain puzzles like the NYT, but for AmuseLabs puzzles, for example, there's nothing inherent in the data itself. We could do something like track a number of popular sites, but I don't love that from a maintenance perspective. I think your best bet here is probably just to change that setting, but maybe we could think about at least offering other options like Title-Author.
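
As a rough illustration of what an ordering option like Title-Author could look like, here is a small sketch. Everything in it is hypothetical: the `Part` enum, the `orderedFilename` function, and the idea of passing in already-cleaned strings are assumptions for discussion, not anything that exists in CrosswordScraper today.

```kotlin
// Hypothetical sketch of a configurable filename order; not part of CrosswordScraper.

/** The pieces of metadata a filename could be built from. */
enum class Part { AUTHOR, TITLE, SOURCE }

/**
 * Joins the available parts in the requested order, skipping any part that is
 * missing or blank, so a puzzle without a known source still gets a usable name.
 */
fun orderedFilename(
    order: List<Part>,
    parts: Map<Part, String?>,
    separator: String = "-",
): String =
    order
        .mapNotNull { part -> parts[part]?.takeIf { it.isNotBlank() } }
        .joinToString(separator)
```

With `order = listOf(Part.TITLE, Part.AUTHOR)` this would produce a Title-Author name, and a source-first order would simply drop the source segment for puzzles where no source can be determined.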

jpd236 / CrosswordScraper

Feature request: filename specifications #42