jpd236 / CrosswordScraper

Browser extension which downloads crosswords from crossword applets for offline solving.
Apache License 2.0

feature request: support for https://squares.io/ #8

Closed arelkin closed 1 year ago

arelkin commented 2 years ago

Take a look at a few puzzles and the page source. There is an object right on the page with the puzzle info. You could just read that object and create the puzzle from it: https://squares.io/s/7nysz3u4

jpd236 commented 2 years ago

Seems like there's data embedded in ipuz format (search source for "window.bootstrap_data", though for some reason this variable doesn't seem to be populated by the time the page is fully loaded). I haven't written a parser yet for ipuz so that would be the first step here.
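
Since the variable doesn't seem to be populated once the page has loaded, one option is to read the raw page source as text and pull out the object that follows "window.bootstrap_data". The following is only a rough sketch: the function name `extractBootstrapData` is made up for illustration, and it assumes the assigned value is a plain JSON object literal, which hasn't been verified against the actual squares.io markup.

```typescript
// Hypothetical helper, not part of the extension. Assumes the page source
// contains something like `window.bootstrap_data = { ...JSON... }`.
function extractBootstrapData(source: string): unknown {
  const markerIndex = source.indexOf("window.bootstrap_data");
  if (markerIndex === -1) return null;

  // The first "{" after the marker is taken as the start of the object.
  const start = source.indexOf("{", markerIndex);
  if (start === -1) return null;

  // Scan for the matching "}", skipping braces inside JSON strings.
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < source.length; i++) {
    const ch = source.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(source.slice(start, i + 1));
        } catch {
          return null; // Not plain JSON after all.
        }
      }
    }
  }
  return null; // Unbalanced braces.
}
```

If the embedded value turns out to be a JavaScript expression rather than strict JSON, `JSON.parse` will fail and this returns `null`, so a different extraction strategy would be needed.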

One question here would be just how useful/important this is. I know people upload their own puzzles to squares.io to solve collaboratively, but then a scraper wouldn't be that important since you would have needed the puzzle in the first place. Are there puzzle authors/sources which are hosted primarily out of squares.io?

arelkin commented 2 years ago

> One question here would be just how useful/important this is.

Not any more important than any other site. I just thought maybe you were looking to increase the number of formats that could be detected.

> solve collaboratively, but then a scraper wouldn't be that important since you would have needed the puzzle in the first place.

I haven't collaborated with anyone; I merely solve puzzles on this site, just like on any other site.

jpd236 commented 1 year ago

Now that ipuz parsing support is implemented, I took another look at this. I do think it'd be reasonably easy to scrape a puzzle when you open a URL directly, since the initial "bootstrap" puzzle data is embedded in a script right on the page. However, it looks significantly more difficult to do so if you navigate to another puzzle, or if you start at the home page and navigate to a puzzle from there. The site works by messaging over WebSockets, and each socket requires a log-in before it will return puzzle data. So we'd have two options. Either we reimplement the entire log-in protocol in the extension, effectively emulating the site, or we intercept the WebSocket traffic in the background, which would be a very different (and more privacy-sensitive) operating model from the current on-demand reading of site data.
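
For the direct-URL case, once the embedded ipuz object is in hand, reading it mostly means walking a few well-known fields. The sketch below is not the extension's actual parser; `summarizeIpuz` and the two interfaces are hypothetical names, and only a small subset of ipuz is handled ("dimensions", "puzzle", "clues", and the optional "block" marker, which is taken to default to "#"). Where exactly the ipuz object sits inside the bootstrap data is not assumed here.

```typescript
// Illustrative subset of the ipuz crossword fields; not a full schema.
interface IpuzCrossword {
  dimensions: { width: number; height: number };
  puzzle: unknown[][];
  clues?: { [direction: string]: unknown[] };
  block?: string;
}

interface PuzzleSummary {
  width: number;
  height: number;
  blocks: number;
  clueCounts: { [direction: string]: number };
}

// Hypothetical helper: report grid size, block count, and clues per direction.
function summarizeIpuz(ipuz: IpuzCrossword): PuzzleSummary {
  const block = ipuz.block ?? "#";
  let blocks = 0;
  for (const row of ipuz.puzzle) {
    for (const cell of row) {
      // A cell may be a bare value or an object carrying a "cell" field.
      const value =
        typeof cell === "object" && cell !== null
          ? (cell as { cell?: unknown }).cell
          : cell;
      if (value === block) blocks++;
    }
  }

  const clues: { [direction: string]: unknown[] } = ipuz.clues ?? {};
  const clueCounts: { [direction: string]: number } = {};
  for (const direction of Object.keys(clues)) {
    const list = clues[direction];
    clueCounts[direction] = list ? list.length : 0;
  }

  return {
    width: ipuz.dimensions.width,
    height: ipuz.dimensions.height,
    blocks,
    clueCounts,
  };
}
```

None of this helps with puzzles reached by in-site navigation, since that data only arrives over the authenticated WebSocket.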

Given the difficulty here, I think I'm going to close this as infeasible for now. I think the better approach would be to ask squares.io to implement some sort of puzzle export option to allow folks to continue solving offline, if they're willing.