jpd236 / CrosswordScraper

Browser extension which downloads crosswords from crossword applets for offline solving.
Apache License 2.0

Scrape error on WSJ acrostic with missing byline #38

Closed (oeuftete closed 8 months ago)

oeuftete commented 8 months ago

The previous WSJ acrostic (with a byline, by Mike Shenk) still works.

Generated at: 2023-10-10T11:08:55.360Z
Extension version: 1.3.8
Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
URL: https://www.wsj.com/articles/the-journal-acrostic-saturday-variety-puzzle-october-7-77de57ef
Scraped Puzzles:
Scrape exception: source = Wall Street Journal
-------
pr: Field 'byline' is required for type with serial name 'com.jeffpdavidson.kotwords.formats.json.WallStreetJournalJson.Copy', but it was missing at path: $.copy at path: $.copy
    at it.br (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1618159)
    at it.w17 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1603153)
    at qg.q49 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2464604)
    at qg.c3s (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2465676)
    at qg.x8 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2455462)
    at qg.f3s (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2456906)
    at qg.x8 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2452071)
    at qg.b3s (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2454495)
    at ht.x8 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1171725)
    at Vc.a9 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1397945)
Caused by: pr: Field 'byline' is required for type with serial name 'com.jeffpdavidson.kotwords.formats.json.WallStreetJournalJson.Copy', but it was missing at path: $.copy
    at it.br (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1618159)
    at ct.ar (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1533571)
    at ct.or (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1534104)
    at it.or (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1619038)
    at qg.un (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2494631)
    at it.br (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1617908)
    at it.w17 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1603153)
    at qg.q49 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2464604)
    at qg.c3s (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2465676)
    at qg.x8 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2455462)
Caused by: pr: Field 'byline' is required for type with serial name 'com.jeffpdavidson.kotwords.formats.json.WallStreetJournalJson.Copy', but it was missing
    at t.$_$.m3 (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1580970)
    at chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1764224
    at jA (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1764392)
    at qg.un (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2491078)
    at it.br (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1617908)
    at ct.ar (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1533571)
    at ct.or (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1534104)
    at it.or (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1619038)
    at qg.un (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:2494631)
    at it.br (chrome-extension://lmneijnoafbpnfdjabialjehgohpmcpo/js/CrosswordScraper.js:2:1617908)
jpd236 commented 8 months ago

Thanks for the detailed report and taking a pass at fixing the issue! Much appreciated.

Looking at the underlying data, the byline is still there; it has just moved from copy.byline to meta.byline. So while some resiliency against a truly missing byline makes sense, I think the primary fix is to use meta.byline if it's present, and only fall back to an empty byline if both fields are missing. I can make that change for both the acrostic and the regular puzzles just in case, even though we've only seen it on the acrostic so far.
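
For illustration, here is a minimal sketch of that fallback order. It is written in TypeScript as a stand-in: the serial name in the trace points at Kotlin code in kotwords, which is not shown in this thread, so this is not the actual fix. The `PuzzleJson`, `Copy`, and `Meta` shapes and the `resolveByline` name are hypothetical and model only the two byline fields mentioned above. The sketch assumes that copy.byline is still consulted when meta.byline is absent, and that a blank string counts as missing.

```typescript
// Hypothetical shapes: only the byline fields discussed in this thread are modeled.
interface Copy {
  byline?: string | null;
}

interface Meta {
  byline?: string | null;
}

interface PuzzleJson {
  copy?: Copy;
  meta?: Meta;
}

// Returns the first non-blank byline, preferring meta.byline over copy.byline.
// Falls back to an empty string when neither field holds a usable value.
// Assumption: a whitespace-only byline is treated the same as a missing one.
function resolveByline(puzzle: PuzzleJson): string {
  const candidates = [puzzle.meta?.byline, puzzle.copy?.byline];
  for (const candidate of candidates) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate.trim();
    }
  }
  return "";
}

// Usage with made-up data: the byline lives under meta, and copy has none.
const byline = resolveByline({ copy: {}, meta: { byline: "Example Author" } });
console.log(byline); // prints "Example Author"
```

Declaring both fields optional is what keeps parsing from failing the way the report shows. The lookup order then decides which value is used.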