Closed esaumell closed 9 months ago
Same issue here
From what I can see, the problem is not with the offending line itself but with the URL. The search returns no results, so the regex finds no match, and the code then fails with an index error because a conditional is missing to cover this case.
This output:
2023-10-05 00:42:29.108 | INFO | bookingcom:scrape_search:72 - scraping search for Malta 2023-10-05-2023-10-12
2023-10-05 00:42:32.488 | DEBUG | bookingcom:parse_search_page:41 - parsing search page: https://www.booking.com/index.html?label=gen173nr-1FCAQoggJCDHNlYXJjaF9tYWx0YUgzWARosAKIAQGYATG4ARjIAQzYAQHoAQH4AQOIAgGoAgS4AtbU96gGwAIB0gIkYmU3Yzk4NmMtMWIyNC00YWY2LTg1NWMtYzVhMmVkNjk5OTE42AIF4AIB&sid=ffea2ce6339ec3c5418636ac90057a05&srpvid=a3329fab25ff00f3&&errorc_searchstring_not_found=ss
Should be similar to this one:
2023-09-13 22:53:35.131 | INFO | bookingcom:scrape_search:72 - scraping search for Malta 2023-09-13-2023-09-20
2023-09-13 22:53:38.864 | DEBUG | bookingcom:parse_search_page:41 - parsing search page: https://www.booking.com/searchresults.html?ss=Malta&checkin_year=2023&checkin_month=09&checkin_monthday=13&checkout_year=2023&checkout_month=09&checkout_monthday=20&no_rooms=1&offset=0
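The difference between the two logged URLs can be checked programmatically. Below is a minimal sketch, assuming that a failed search always lands on a page other than `searchresults.html` and carries an `errorc_...` query parameter, as in the failing log above. The helper name `is_search_results_url` is hypothetical and not part of the scraper.

```python
from urllib.parse import urlparse, parse_qs


def is_search_results_url(url: str) -> bool:
    """Return True if the URL looks like a Booking.com search results page.

    Assumption: failed searches are redirected away from searchresults.html
    and get an "errorc_..." query parameter, as seen in the failing log.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query, keep_blank_values=True)
    # Any parameter starting with "errorc_" is treated as an error flag
    has_error_flag = any(key.startswith("errorc_") for key in query)
    return parsed.path.endswith("/searchresults.html") and not has_error_flag


failing = "https://www.booking.com/index.html?label=x&sid=y&&errorc_searchstring_not_found=ss"
working = "https://www.booking.com/searchresults.html?ss=Malta&no_rooms=1&offset=0"
print(is_search_results_url(failing))
print(is_search_results_url(working))
```

A check like this could run before parsing, so that a bad search is reported clearly instead of surfacing later as an `IndexError`.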
Correct me if I'm wrong, but this is the stock code from GitHub, so I think this is caused by some change in Scrapfly's API.
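The missing conditional mentioned above could look roughly like the sketch below. It is not the scraper's actual code: the helper `first_match` and the example pattern are hypothetical, and it only illustrates guarding a regex result before indexing into it.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first regex match in text, or None when nothing matches.

    Indexing re.findall(...)[0] directly raises
    "IndexError: list index out of range" on an empty result.
    """
    matches = re.findall(pattern, text)
    if not matches:
        return None
    return matches[0]


# Hypothetical pattern, for illustration only
total = first_match(r"(\d+) properties found", "<html>no results here</html>")
if total is None:
    print("search page contained no results; check the search URL")
```

Returning `None` pushes the decision to the caller, which can then log the URL that produced the empty page instead of crashing.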
Thanks for the detailed report, @esaumell. This seems to be caused by a change in how Booking generates search URLs. They now require a location ID together with the search string, and some of the URL parameters for check-in and check-out have changed, so the scraper couldn't find any results.
I've updated the search URL generation; for details, see this commit: https://github.com/scrapfly/scrapfly-scrapers/commit/72def4300d21b1eb8128aa76899d3e1e2b822b9a
Cheers!
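For readers who don't want to open the commit, here is a rough sketch of what building such a search URL might look like. It is not taken from the commit: `ss`, `no_rooms` and `offset` appear in the working log earlier in the thread, while `checkin`, `checkout`, `dest_id` and `dest_type` are assumed parameter names, and the location values are placeholders.

```python
from urllib.parse import urlencode


def build_search_url(query, checkin, checkout, dest_id, dest_type, offset=0):
    """Build a search URL from a search string, dates and a location ID.

    Assumption: the parameter names checkin, checkout, dest_id and dest_type
    are illustrative; see the linked commit for the names actually used.
    """
    params = {
        "ss": query,
        "checkin": checkin,      # assumed name, ISO date string
        "checkout": checkout,    # assumed name, ISO date string
        "dest_id": dest_id,      # assumed name for the location ID
        "dest_type": dest_type,  # assumed name for the location type
        "no_rooms": 1,
        "offset": offset,
    }
    return "https://www.booking.com/searchresults.html?" + urlencode(params)


# Placeholder location values, not real Booking.com identifiers
print(build_search_url("Malta", "2023-10-05", "2023-10-12", "123", "country"))
```

Using `urlencode` keeps the search string safely escaped, which matters for locations that contain spaces or non-ASCII characters.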
The last sentence of my previous post should have ended with "...or booking.com's code". Thank you so much, @Granitosaurus!
Best!
Scraper
Which scraper is affected? bookingcom-scraper

Environment
Python 3.10.12
Scrapfly SDK version: 0.8.8
Operating System: Ubuntu 22.04.3 LTS

Describe the bug
On a working environment suddenly we get an
IndexError: list index out of range
error.

To reproduce:

Received Output

Expected Output
No errors

Screenshots
Not needed

Additional context
On September 21 it was working. It hasn't worked since then.