hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.62k stars 508 forks

Foodnetwork.com no longer scraping #1010

Closed dips83 closed 4 months ago

dips83 commented 4 months ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

...

The results you expect to see

...

The results (including any Python error messages) that you are seeing

Getting null returned

jlucaspains commented 4 months ago

I did a bit of debugging, since this is also affecting my apps, and found that foodnetwork.com is rejecting requests sent with the default HEADERS the library uses:

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"
}
...
resp = requests.get(
    url,
    headers=HEADERS,
    proxies=proxies,
    timeout=timeout,
)

Note that the request works when the default User-Agent header is removed, which lets requests fall back to its own value (python-requests/2.31.0). It also works with an updated user-agent (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/123.0.1).
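For anyone who wants to reproduce the comparison, here is a minimal sketch that sends the same request with each of the three header variants mentioned above and records the outcome. The names `candidate_headers` and `probe` are made up for this illustration and are not part of recipe-scrapers. The `fetch` parameter is only there so the check can be run without network access.

```python
OLD_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"
NEW_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/123.0.1"


def candidate_headers():
    """Header variants to compare (hypothetical helper, not library code)."""
    return {
        "library default": {"User-Agent": OLD_UA},
        # Empty dict: requests keeps its own python-requests/x.y.z User-Agent.
        "requests default": {},
        "updated": {"User-Agent": NEW_UA},
    }


def probe(url, fetch=None, timeout=10):
    """Return {label: status code or exception name} for each header variant."""
    if fetch is None:
        import requests  # imported lazily so a stub fetch can be used offline

        fetch = requests.get
    results = {}
    for label, headers in candidate_headers().items():
        try:
            results[label] = fetch(url, headers=headers, timeout=timeout).status_code
        except Exception as exc:  # timeouts, connection errors, etc.
            results[label] = type(exc).__name__
    return results
```

Calling `probe(url)` with a recipe URL that fails for you should show whether only the old user-agent is rejected. Results may vary with IP address and over time, so treat a single run as a hint rather than proof.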

Is it reasonable to update the user-agent to something newer? If so, I can submit a PR soon...

jlucaspains commented 4 months ago

I may have had an old version of the package installed locally. While the user-agent problem still seems relevant, it looks like a recent commit changed the foodnetwork.com URL to foodnetwork.co.uk. Here is the commit:

https://github.com/hhursev/recipe-scrapers/commit/4fef338670142c3b4563f902ded1ade1f8001d0f#diff-5d1c0cbbdbecea7561f2fa87a9ecdd8bb896262f71727989caa4cf140b54cdeb
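To illustrate why such a rename would make foodnetwork.com URLs stop matching, here is a small sketch that assumes scrapers are selected by host name. The helpers `host_of` and `is_supported`, the two host sets, and the example URL are hypothetical stand-ins, not the library's actual registry or code.

```python
from urllib.parse import urlparse


def host_of(url):
    """Extract the host from a URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_supported(url, registered_hosts):
    """True if the URL's host appears in the (hypothetical) host registry."""
    return host_of(url) in registered_hosts


# Hypothetical registries before and after the rename in the commit.
hosts_before = {"foodnetwork.com"}
hosts_after = {"foodnetwork.co.uk"}

# Placeholder URL for illustration only, not a real recipe page.
url = "https://www.foodnetwork.com/recipes/example"
print(is_supported(url, hosts_before))
print(is_supported(url, hosts_after))
```

Under this assumption, a .com URL would match only the first registry, which would be consistent with the scraper returning nothing after the change.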


Am I missing something here?