hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License

bestrecipes.com.au scraper broken #1354

Open Surfoo opened 2 weeks ago

Surfoo commented 2 weeks ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

The results you expect to see

I don't know.

The results (including any Python error messages) that you are seeing

I didn't run the scraper, I have an issue:

$ python -m pipx install recipe-scrapers --include-deps
'recipe-scrapers' already seems to be installed. Not modifying existing installation in '/home/johndoe/.local/pipx/venvs/recipe-scrapers'. Pass '--force' to force installation.

$ python
Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from recipe_scrapers import scrape_html
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'recipe_scrapers'
>>>

jknndy commented 2 weeks ago

Hi @Surfoo, it looks like the issue you’re experiencing is related to importing the recipe_scrapers library rather than the specific URLs. The ModuleNotFoundError: No module named 'recipe_scrapers' error suggests that Python wasn’t able to locate recipe_scrapers at all before attempting to access bestrecipes.

Could you share any additional output or error messages, if available, that might clarify the environment setup? It may also help to check if recipe_scrapers is installed in the same environment where you’re running the script.
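As a rough way to check this, here is a minimal sketch using only the standard library. It reports which interpreter is running and whether that interpreter can find a given module. The helper name `locate_module` is made up for illustration and is not part of recipe_scrapers. Note that the pipx output above shows the package living in its own virtual environment under `.local/pipx/venvs`, which a plain `python` session would not normally search.

```python
import importlib.util
import sys


def locate_module(name: str) -> dict:
    """Report the running interpreter and whether it can find `name`."""
    # find_spec returns None when a top-level module is not importable
    # from the current interpreter's search path.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Run this with the same `python` command that raised the error.
    print(locate_module("recipe_scrapers"))
```

If `found` is false, the package is probably installed for a different interpreter than the one printed under `interpreter`.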

Surfoo commented 2 weeks ago

I tried to help by following the Getting Started part in the readme. Which command would you like me to execute? I don't know Python.

I hit the bug in the mealie app, which uses recipe_scrapers in its backend. Here is the log:

mealie            | INFO     2024-11-03T19:03:47 - HTTP Request: GET https://www.bestrecipes.com.au/recipes/mini-marsala-fruit-cakes-recipe/kwlyzyae "HTTP/1.1 403 Forbidden"

jknndy commented 2 weeks ago

Sorry for the delay here, I am traveling for work so free time is rare! The 403 error makes me believe this could be related to the way mealie is attempting to access the site.

@jayaddison could you weigh in here? I believe this is similar to the other bots-protection issue opened recently.

jayaddison commented 2 weeks ago

Initially: yes, it seems likely that this could be some form of bot protection (network request filtering). I'll try to confirm that soon. @Surfoo did you manage to find a way to get that import to work? We generally suggest using pip here, but pipx should work equally well I'd expect.

Surfoo commented 2 weeks ago

Hello, no, I haven't tried it since last time.

jayaddison commented 1 week ago

I can confirm that I'm able to scrape the first recipe (the Satay Chicken one) from HTML successfully, so this does indeed seem to be some kind of network-request-filtering problem (aka bots-protection).
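For anyone wanting to separate the two failure modes themselves, a minimal sketch follows. It checks the HTTP status of a plain request, and separately parses HTML that has already been downloaded (for example, saved from a browser). The helper names `fetch_status` and `scrape_saved_page` and the User-Agent string are made up for illustration. The sketch also assumes `scrape_html` accepts the HTML text plus the original URL as `org_url`, which may differ between library versions.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "diagnostic-sketch"}  # placeholder value
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return error.code


def scrape_saved_page(html: str, url: str):
    """Parse already-downloaded HTML; no network request is made here."""
    # Imported lazily so fetch_status works without the package installed.
    from recipe_scrapers import scrape_html

    # Assumption: org_url is the page's original address.
    return scrape_html(html, org_url=url)
```

A 403 from `fetch_status` combined with a successful `scrape_saved_page` on the same page would point at request filtering rather than at the scraper itself.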