jpd236 / CrosswordScraper

Browser extension which downloads crosswords from crossword applets for offline solving.
Apache License 2.0

Date convention seems to be off #9

Closed arelkin closed 2 years ago

arelkin commented 2 years ago

For https://www.newyorker.com/puzzles-and-games-dept/crossword/2022/01/24 the file name comes back as: PatrickBerry-TheCrosswordMondayJanuary242022. Why 242022? The date should be 220124.

For https://www.theepochtimes.com/tuesday-january-25-2022-epoch-crossword_4232464.html the file name comes back as: TomHouston-Jan252022. The date should be 220125.

This appears to be happening on all formats: puz, jpz and pdf.

Not sure whether this only happens on Amuselabs or in other places too. That's just where I noticed it.

jpd236 commented 2 years ago

We don't actually (reliably) know the date of any of these puzzles. All we do is take the author and title of the puzzle and remove the spaces. So "242022" is just "24 2022" (as in "January 24 2022") without spaces. These outlets include the date in their title, so it ends up looking like this.
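To make the naming behavior above concrete, here is a minimal sketch in Python. It is not the extension's actual code: the function name `base_file_name` is hypothetical, and it assumes that punctuation is dropped along with spaces, since the reported file names contain neither. The title string passed in the example is likewise a guess at what the outlet supplies.

```python
import re


def base_file_name(author: str, title: str) -> str:
    """Join author and title with a hyphen, keeping only letters and digits.

    Assumption: everything that is not an ASCII letter or digit is removed,
    which is one way to end up with the file names quoted in this issue.
    """
    def strip(text: str) -> str:
        return re.sub(r"[^A-Za-z0-9]", "", text)

    return f"{strip(author)}-{strip(title)}"


# "January 24, 2022" loses its space and comma, leaving "January242022".
print(base_file_name("Patrick Berry", "The Crossword: Monday, January 24, 2022"))
```

Under these assumptions the trailing "242022" is simply the day and year from the title run together, not a date that was computed and formatted.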

Doesn't feel worth trying to get fancier about detecting particular elements of titles, and IMHO it looks worse to have a separator (and I think it's preferable to avoid spaces in file names). So I think I will leave this as is for now, but if you feel strongly about another approach, let me know.
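For reference, "detecting particular elements of titles" could look roughly like the sketch below. It is only an illustration of the alternative being discussed, not something the extension does. The function name `yymmdd_from_title` and the regular expression are hypothetical, and it assumes English month names in a "Month day, year" layout under an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: dates appear as "<month name> <day>[,] <4-digit year>".
_DATE = re.compile(r"([A-Za-z]+)\.?\s+(\d{1,2}),?\s+(\d{4})")


def yymmdd_from_title(title: str) -> Optional[str]:
    """Return the first date found in the title as YYMMDD, or None."""
    match = _DATE.search(title)
    if match is None:
        return None
    month, day, year = match.groups()
    # Try full month names first ("January"), then abbreviations ("Jan").
    for fmt in ("%B %d %Y", "%b %d %Y"):
        try:
            parsed = datetime.strptime(f"{month} {day} {year}", fmt)
        except ValueError:
            continue
        return parsed.strftime("%y%m%d")
    return None
```

A title without a recognizable date returns `None`, so a caller would still need the existing author-and-title name as a fallback. Every outlet that formats its dates differently would need its own pattern, which is the maintenance cost weighed above.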