jpd236 / CrosswordScraper

Browser extension which downloads crosswords from crossword applets for offline solving.
Apache License 2.0

Does Scraper support PZZL format? #35

Closed (arelkin closed 9 months ago)

arelkin commented 10 months ago

Does the script detect puzzles on the page by format, or does it also need to take into account the domain where the puzzle is hosted?

jpd236 commented 10 months ago

Do you have example puzzle(s) you could link to?

The general answer depends on the applet type. In most cases, we don't need to know the specific domain embedding a puzzle in advance. But in some cases we do, in order to keep the permissions the extension requests reasonable. The detailed version is that when you open the extension, we only get automatic access to the current page, not to any pages embedded within it, so we need to prompt for permission to access embedded pages. We only want to do so if we're reasonably confident that there's going to be a crossword there. This works fine in most cases, since all puzzle applets of a given type tend to be hosted on the same top-level domain. But in a few cases, the applet code is copied to different individual servers, in which case it's difficult to know just from the URL whether an embedded page is likely to have a crossword or not. In those cases, we use a fixed list of known domains.
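To illustrate the decision described above, here is a minimal Python sketch. It is not the extension's actual code: the applet names, domains, and function names are all hypothetical, and the real extension may structure this differently. It only shows the two cases: matching by a shared top-level domain, and falling back to a fixed list of known hosts.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit

# Hypothetical rules, for illustration only (not the extension's real data).
# Applet types whose puzzles are all served from one shared domain
# can be recognized by a domain suffix.
SHARED_DOMAIN_SUFFIXES = {
    "example-applet": "applet-host.example",
}

# Applet types whose code is copied onto individual servers cannot be
# recognized from the URL alone, so they need an explicit list of hosts.
KNOWN_HOSTS = {
    "self-hosted-applet": {"puzzles.site-one.example", "games.site-two.example"},
}


def likely_applet(frame_url: str) -> Optional[str]:
    """Return the applet type an embedded page probably hosts, or None."""
    host = (urlsplit(frame_url).hostname or "").lower()
    for applet, suffix in SHARED_DOMAIN_SUFFIXES.items():
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == suffix or host.endswith("." + suffix):
            return applet
    for applet, hosts in KNOWN_HOSTS.items():
        if host in hosts:
            return applet
    return None


def frames_to_request(frame_urls: Iterable[str]) -> List[str]:
    """Keep only embedded pages worth prompting the user about."""
    return [url for url in frame_urls if likely_applet(url) is not None]
```

With these made-up rules, `frames_to_request` would keep an embedded page on `cdn.applet-host.example` or `games.site-two.example` and drop an unrelated ad frame, so the user is only prompted when a crossword is likely to be there.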

In this specific case, I've never heard of PZZL, so odds are it's something that needs new support, rather than just needing a URL to be enabled for one of the existing parsers.

arelkin commented 10 months ago

Here's one example, from The Seattle Times: https://www.seattletimes.com/games-nytimes-crossword/

jpd236 commented 10 months ago

Thanks. I have come across that format before with a different source, for the Newsday puzzle (https://www.brainsonly.com/global/newsday/cwd/#/s/230905). I might even have written a parser, but it's not currently included in Kotwords. There's one for XWord at https://github.com/mrichards42/xword/blob/master/scripts/import/newsday.lua. Interestingly, the NYT puzzle at https://nytsyn.pzzl.com/nytsyn-crossword-mh/nytsyncrossword?date=230905 demonstrates support for circled squares (by preceding them with "%" in the grid), which I don't think I'd seen before.
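As a rough illustration of the "%" convention mentioned above, here is a small Python sketch that reads one grid row and records which squares are circled. It assumes each square is a single character and that "%" applies only to the square immediately after it. The sample row and the function name are hypothetical, and the rest of the PZZL format is not modeled here.

```python
from typing import List, Tuple


def parse_row(row: str) -> List[Tuple[str, bool]]:
    """Split a grid row into (square, is_circled) pairs.

    Assumption: a "%" marks the next square as circled and is not
    itself a square.
    """
    cells = []
    circled = False
    for ch in row:
        if ch == "%":
            circled = True  # flag the square that follows
            continue
        cells.append((ch, circled))
        circled = False
    return cells
```

For a made-up row such as `"CA%T"`, this yields three squares, with only the `T` flagged as circled.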

It's feasible to support this but would take some work for the new format. I'm open to it as I'm not aware of another digital source for the NYT syndicated puzzles that is supported today.

jpd236 commented 9 months ago

The syndicated NYT and Newsday puzzles should both work in the next scraper release, once I get around to it.

For now only these two will work, since each PZZL applet is hosted on a separate/custom domain and has a unique URL for downloading puzzle data. Let me know if you come across any other sites using this applet.

arelkin commented 9 months ago

This is great news. Thank you.

"The syndicated NYT"

That means you are talking about the Seattle Times?

jpd236 commented 9 months ago

Yes, the Seattle Times is just running the New York Times puzzle from about a month in the past (which is what they syndicate).