Closed dips83 closed 4 months ago
I did a bit of debugging as this is also affecting my apps and found that foodnetwork.com is rejecting requests with the default HEADERS the library is sending out:
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"
}
...
resp = requests.get(
    url,
    headers=HEADERS,
    proxies=proxies,
    timeout=timeout,
)
Note that when the default User-Agent header is removed, letting the requests library fall back to its own default (python-requests/2.31.0), the request succeeds. An updated user-agent string also works (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/123.0.1).
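A minimal sketch of the second workaround, for anyone hitting the same rejection. The Firefox version string is the one reported above; the URL and the use of a prepared request (built but not sent, so the outgoing headers can be inspected) are illustrative, not the library's actual call site:

```python
import requests

# Updated User-Agent from this thread; any sufficiently recent browser
# string appeared to work, as did requests' own default.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) "
        "Gecko/20100101 Firefox/123.0.1"
    )
}

# Build the request without sending it, to show the header that would
# go out. requests' default, by contrast, looks like "python-requests/2.31.0".
prepared = requests.Request(
    "GET", "https://www.foodnetwork.com/", headers=HEADERS
).prepare()

print(prepared.headers["User-Agent"])
print(requests.utils.default_user_agent())
```

Swapping HEADERS into the library's requests.get call (or deleting the key entirely) reproduces the two behaviors described above.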
Is it reasonable to update the user-agent to something newer? If so, I can submit a PR soon...
I may have had an old version of the package installed locally. While the user-agent problem still seems relevant, it looks like a recent commit changed the foodnetwork.com URL to foodnetwork.co.uk. Here is the commit:
Am I missing something here?
Pre-filing checks
The URL of the recipe(s) that are not being scraped correctly
...
The results you expect to see
...
The results (including any Python error messages) that you are seeing
The scraper returns null.