thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

v2023.4.3: newyorkerdownloader: Cannot find puzzle at ... #109

Closed: edsantiago closed this issue 1 year ago

edsantiago commented 1 year ago

As of June 28, the New Yorker downloader fails with this error:

Cannot find puzzle at https://www.newyorker.com/puzzles-and-games-dept/crossword/2023/06/29.

This is the output of pprint.pprint(json_data). articleBody is present, but it is an empty string:

{'@context': 'http://schema.org',
 '@type': 'NewsArticle',
 'alternativeHeadline': 'A free online puzzle published every weekday, with '
                        'difficulty levels ranging from easy to hard, and '
                        'answers and clues that exhibit the wit and '
                        'intelligence of the magazine.',
 'articleBody': '',
 'articleSection': 'crossword',
 'author': [{'@type': 'Person',
             'name': 'Caitlin Reid',
             'sameAs': 'https://www.newyorker.com/contributors/caitlin-reid'}],
 'dateModified': '2023-06-29T06:00:00.000-04:00',
 'datePublished': '2023-06-29T06:00:00.000-04:00',
 'description': 'A free online puzzle published every weekday, with difficulty '
                'levels ranging from easy to hard, and answers and clues that '
                'exhibit the wit and intelligence of the magazine.',
 'headline': 'The Crossword: Thursday, June 29, 2023',
 'image': ['https://media.newyorker.com/photos/623b77fe3d162b0f59264a13/16:9/w_1920,h_1080,c_limit/NewYorkerCrossword-Thursday.jpg',
           'https://media.newyorker.com/photos/623b77fe3d162b0f59264a13/4:3/w_1440,h_1080,c_limit/NewYorkerCrossword-Thursday.jpg',
           'https://media.newyorker.com/photos/623b77fe3d162b0f59264a13/1:1/w_1080,h_1080,c_limit/NewYorkerCrossword-Thursday.jpg'],
 'isAccessibleForFree': True,
 'isBasedOn': 'https://www.newyorker.com/puzzles-and-games-dept/crossword/2023/06/29',
 'isPartOf': {'@type': 'CreativeWork', 'name': 'The New Yorker'},
 'keywords': ['crossword',
              'puzzles',
              'thursday crossword',
              'puzzles & games',
              'textabovecentersmallwithrule',
              'onecolumn',
              'web'],
 'mainEntityOfPage': {'@id': 'https://www.newyorker.com/puzzles-and-games-dept/crossword/2023/06/29',
                      '@type': 'WebPage'},
 'publisher': {'@context': 'https://schema.org',
               '@type': 'Organization',
               'logo': {'@type': 'ImageObject',
                        'height': '117px',
                        'url': 'https://www.newyorker.com/verso/static/the-new-yorker/assets/logo-seo.png',
                        'width': '500px'},
               'name': 'The New Yorker',
               'url': 'https://www.newyorker.com'},
 'thumbnailUrl': 'https://media.newyorker.com/photos/623b77fe3d162b0f59264a13/3:2/w_1620,h_1080,c_limit/NewYorkerCrossword-Thursday.jpg',
 'url': 'https://www.newyorker.com/puzzles-and-games-dept/crossword/2023/06/29'}

...and that's about as far as I can take it tonight, without having a sense of what articleBody is supposed to contain... sorry.
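For anyone who wants to reproduce the check, here is a minimal sketch of the failure mode described above. It is not xword-dl's actual code. It assumes that json_data comes from a script element of type application/ld+json in the page, and that an empty articleBody should count as "puzzle not found". The names LdJsonParser, find_article_body, and the two sample pages are hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser


class LdJsonParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so buffer it.
        if self._in_ld_json:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._chunks))
            self._in_ld_json = False


def find_article_body(html):
    """Return the first non-empty articleBody in the page's ld+json, or None."""
    parser = LdJsonParser()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of crashing
        # An empty string is falsy, so the case from this issue yields None.
        if isinstance(data, dict) and data.get("articleBody"):
            return data["articleBody"]
    return None


# Hypothetical pages: one mimics the empty articleBody reported here,
# the other carries placeholder text standing in for real content.
SAMPLE_EMPTY = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "NewsArticle", "articleBody": ""}'
    "</script></head></html>"
)
SAMPLE_FULL = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "NewsArticle", "articleBody": "placeholder body"}'
    "</script></head></html>"
)

if __name__ == "__main__":
    print(find_article_body(SAMPLE_EMPTY))
    print(find_article_body(SAMPLE_FULL))
```

Treating a missing key and an empty string the same way keeps the caller simple: either case means the ld+json data cannot be used to locate the puzzle.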

thisisparker commented 1 year ago

Thank you, this is super helpful! I will attempt to finish debugging and get a release out ASAP.

thisisparker commented 1 year ago

I think that PR fixes the issue! It just uses a different approach to finding the puzzle. That ld+json approach was good while it lasted.

Feedback on the fix is welcome, and I'll likely merge it tomorrow or soon after.