thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

Centralize encoding/markup sanitization #148

Closed thisisparker closed 6 months ago

thisisparker commented 7 months ago

While fixing #146 I discovered that I'd previously caused a regression by moving a step out of the latin-1 conversion flow for New Yorker puzzles. In untangling that, I now see that it would probably be better to centralize all of the encoding and markup sanitization so that it happens at the save stage. That would also make #89 easier to solve, because it could be a single switch there.
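To make the idea concrete, here is a rough sketch of what a single sanitization step at save time might look like. It is not this package's actual code: `sanitize`, `sanitize_fields`, `REPLACEMENTS`, and the `strip_markup` flag are all hypothetical names, and the replacement table is only an illustrative guess at which characters matter. The `strip_markup` flag stands in for the kind of single switch mentioned above.

```python
import html
import re

# Hypothetical table: common Unicode punctuation that latin-1 cannot
# represent, mapped to ASCII stand-ins. Illustrative only.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "--",
    "\u2026": "...",
}

# Matches simple HTML-style tags such as <i> or </b>; leaves "<3" alone.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def sanitize(text, strip_markup=True):
    """Return text that round-trips through latin-1 (hypothetical helper)."""
    if strip_markup:
        text = TAG_RE.sub("", text)
    # Decode entities such as &amp; and &eacute; after tags are handled.
    text = html.unescape(text)
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Anything still outside latin-1 becomes "?" rather than raising.
    return text.encode("latin-1", errors="replace").decode("latin-1")


def sanitize_fields(fields, strip_markup=True):
    """Apply sanitize() to every string (or list of strings) in a dict."""
    cleaned = {}
    for key, value in fields.items():
        if isinstance(value, str):
            cleaned[key] = sanitize(value, strip_markup)
        elif isinstance(value, list):
            cleaned[key] = [sanitize(v, strip_markup) for v in value]
        else:
            cleaned[key] = value
    return cleaned
```

With something like this, the individual downloaders could hand over raw Unicode and markup, and only the save step would call `sanitize_fields` once, with the switch deciding whether markup is kept.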

The one hitch would be that the puzzle objects produced before the final step would not be legal puz files. I think that's okay, but I want to think about how to make this package's save() more accessible.

(Incidentally, the real answer to that question is to make no guarantees that the intermediate puzzle objects are puz-legal at all, and instead convert to whatever output format is wanted as a last step, but that's a bigger change.)
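As a sketch of that bigger change, assuming a format-neutral intermediate object: the draft below holds plain Unicode and makes no promise of being puz-legal, and each exporter applies its own constraints only at the end. `DraftPuzzle`, `EXPORTERS`, and `export` are hypothetical names, and the "latin1" exporter is a stand-in for a real .puz writer, not an implementation of one.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DraftPuzzle:
    """Hypothetical intermediate object: plain Unicode, no format rules."""
    title: str
    clues: list = field(default_factory=list)


def to_json(puzzle):
    # A format that can carry Unicode needs no lossy conversion.
    return json.dumps(asdict(puzzle), ensure_ascii=False)


def to_latin1_bytes(puzzle):
    # Stand-in for a .puz writer: latin-1 is enforced only here,
    # with unencodable characters replaced by "?".
    lines = [puzzle.title, *puzzle.clues]
    return "\n".join(lines).encode("latin-1", errors="replace")


# Hypothetical registry of output formats.
EXPORTERS = {"json": to_json, "latin1": to_latin1_bytes}


def export(puzzle, fmt):
    """Convert the draft to the requested format as the final step."""
    return EXPORTERS[fmt](puzzle)
```

The trade-off is the one noted above: nothing upstream of `export` can be assumed to be a legal puz object.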