Foodnetwork.com no longer scraping

dips83 commented 4 months ago

Pre-filing checks

[x] I have searched for open issues that report the same problem
[x] I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

https://www.foodnetwork.com/recipes/alton-brown/shepherds-pie-recipe2-1942900

...

The results you expect to see

...

The results (including any Python error messages) that you are seeing

Getting null returned

jlucaspains commented 4 months ago

I did a bit of debugging as this is also affecting my apps and found that foodnetwork.com is rejecting requests with the default HEADERS the library is sending out:

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"
}
...
resp = requests.get(
    url,
    headers=HEADERS,
    proxies=proxies,
    timeout=timeout,
)

Note that when the default User-Agent header is removed, which lets requests fall back to its own default of python-requests/2.31.0, the request works. I also tried an updated user agent, and that worked too (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/123.0.1).
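
For anyone who wants to reproduce the comparison, here is a rough sketch of the check described above. It uses the standard library's urllib rather than requests so that it has no dependencies, which means the "no header" case sends Python-urllib's default User-Agent, not python-requests/2.31.0. The helper names build_request, probe, and compare are made up for this example and are not part of recipe-scrapers.

```python
import urllib.error
import urllib.request

# The two User-Agent strings quoted in this comment.
OLD_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"
NEW_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/123.0.1"


def build_request(url, user_agent=None):
    """Hypothetical helper: build a GET request, overriding User-Agent only if one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def probe(url, user_agent=None, timeout=10):
    """Send the request and return the HTTP status code the server answers with."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Rejections (4xx/5xx) are raised as HTTPError; report the code instead.
        return err.code


def compare(url):
    """Probe the same URL with each User-Agent variant and collect the status codes."""
    variants = {"old": OLD_UA, "new": NEW_UA, "client default": None}
    return {label: probe(url, ua) for label, ua in variants.items()}
```

Calling compare() with the recipe URL from this issue should show whether only the old user agent is being rejected. The actual status codes depend on the site's behavior at the time, so none are assumed here.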

Is it reasonable to update the user-agent to something newer? If so, I can submit a PR soon...

jlucaspains commented 4 months ago

I may have had an old version of the package on my local machine. While the user-agent problem still seems relevant, it looks like a recent commit changed the foodnetwork.com URL to foodnetwork.co.uk. Here is the commit:

https://github.com/hhursev/recipe-scrapers/commit/4fef338670142c3b4563f902ded1ade1f8001d0f#diff-5d1c0cbbdbecea7561f2fa87a9ecdd8bb896262f71727989caa4cf140b54cdeb

Am I missing something here?
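
To illustrate why a domain change like that could produce an empty result, here is a minimal sketch that assumes scrapers are looked up by hostname. The SUPPORTED_HOSTS table and the host_key and find_scraper helpers are hypothetical and only stand in for whatever the library actually does.

```python
from urllib.parse import urlsplit

# Hypothetical registry for illustration only; not the library's actual table.
SUPPORTED_HOSTS = {"foodnetwork.co.uk": "FoodNetwork"}


def host_key(url):
    """Return the URL's hostname without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_scraper(url):
    """Return the registered scraper name for the URL's host, or None if unregistered."""
    return SUPPORTED_HOSTS.get(host_key(url))
```

Under that assumption, a foodnetwork.com URL no longer matches any entry once the registered host is foodnetwork.co.uk, regardless of which User-Agent is sent.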

hhursev / recipe-scrapers

Foodnetwork.com no longer scraping #1010