hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License

Unable to silence exception on "cuisine" method #1020

Open. fradeve opened this issue 3 months ago.

fradeve commented 3 months ago

(First and foremost, thanks for bringing us such a great piece of software!)


The URL of the recipe(s) that are not being scraped correctly

The results you expect to see

A functional scrape, without exceptions.

The results (including any Python error messages) that you are seeing

I am trying to run a "wild mode" scraper on an unsupported website (ocado.com/webshop/recipe), and I just want exceptions to be silenced when they happen. I went through the open and closed issues and found that exceptions can be suppressed, so I did this:

scraper_settings.py

SUPPRESS_EXCEPTIONS = True
ON_EXCEPTION_RETURN_VALUES = {
    "title": None,
    "total_time": None,
    "yields": None,
    "image": None,
    "ingredients": None,
    "instructions": None,
    "instructions_list": None,
    "ratings": None,
    "reviews": None,
    "links": None,
    "language": None,
    "nutrients": None,
    "cuisine": None,
}

Then I used the settings like this:

import os
os.environ['RECIPE_SCRAPERS_SETTINGS'] = 'mealplanner.management.commands.scraper_settings'
from recipe_scrapers import scrape_me
url = 'https://ocado.com/webshop/recipe/spring-minestrone-with-kale-oil-/250826'
scraper = scrape_me(url, wild_mode=True)

scraper.cuisine()
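The settings module is referenced through the RECIPE_SCRAPERS_SETTINGS environment variable as a dotted import path. The sketch below assumes the library resolves that path roughly with importlib.import_module and then reads the module's uppercase attributes, Django-style; this is an assumption about the internals, not a copy of recipe-scrapers' actual loader. load_settings and the in-memory fake_scraper_settings module are hypothetical names that exist only so the example runs on its own.

```python
import importlib
import os
import sys
import types


def load_settings(env_var: str = "RECIPE_SCRAPERS_SETTINGS") -> dict:
    """Hypothetical loader: import the module named in env_var and
    return its UPPERCASE attributes as a dict."""
    module_path = os.environ.get(env_var)
    if not module_path:
        return {}
    # import_module returns the cached module if it is already in sys.modules.
    module = importlib.import_module(module_path)
    return {name: getattr(module, name) for name in dir(module) if name.isupper()}


# Build a throwaway settings module in memory instead of a file on disk.
fake = types.ModuleType("fake_scraper_settings")
fake.SUPPRESS_EXCEPTIONS = True
fake.ON_EXCEPTION_RETURN_VALUES = {"title": None, "cuisine": None}
sys.modules["fake_scraper_settings"] = fake

os.environ["RECIPE_SCRAPERS_SETTINGS"] = "fake_scraper_settings"
settings = load_settings()
print(settings["SUPPRESS_EXCEPTIONS"])
```

Under that assumption, the dotted path has to be importable from the running process (for a Django management command, relative to a directory on sys.path), and the variable should be set before the library first reads its settings, which is what setting os.environ before the import achieves.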

And I get the following:

Traceback (most recent call last):
  File "/home/fradeve/git/mealplanner/src/./manage.py", line 21, in <module>
    main()
  File "/home/fradeve/git/mealplanner/src/./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/home/fradeve/.pyenv/versions/mealplanner/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/home/fradeve/.pyenv/versions/mealplanner/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/fradeve/.pyenv/versions/mealplanner/lib/python3.12/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/fradeve/.pyenv/versions/mealplanner/lib/python3.12/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fradeve/git/mealplanner/src/mealplanner/management/commands/populate_db.py", line 47, in handle
    cuisine=scraper.cuisine(),
            ^^^^^^^^^^^^^^^^^
  File "/home/fradeve/.pyenv/versions/mealplanner/lib/python3.12/site-packages/recipe_scrapers/_factory.py", line 45, in cuisine
    return self.schema.cuisine()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/fradeve/.pyenv/versions/mealplanner/lib/python3.12/site-packages/recipe_scrapers/_schemaorg.py", line 292, in cuisine
    raise SchemaOrgException("No cuisine data in SchemaOrg.")
recipe_scrapers._exceptions.SchemaOrgException: recipe-scrapers exception: No cuisine data in SchemaOrg.
jayaddison commented 3 months ago

Thanks for the bug report @fradeve! I think this happens because cuisine isn't one of the methods that the exception_handling plugin wraps: the plugin only suppresses exceptions for a fixed set of fields, and cuisine isn't among them.
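To illustrate why SUPPRESS_EXCEPTIONS has no effect on cuisine, here is a deliberately simplified model of an allow-list style plugin. HANDLED_METHODS, exception_handling, WildModeScraper and the stand-in SchemaOrgException are hypothetical, and the names in HANDLED_METHODS are illustrative rather than copied from the library; the only property that matters is that cuisine is missing from the list.

```python
import functools

# Hypothetical allow-list standing in for the plugin's fixed set of methods.
# The real list in recipe-scrapers may differ; "cuisine" is absent here on purpose.
HANDLED_METHODS = {"title", "total_time", "yields", "image", "ingredients", "instructions"}

SUPPRESS_EXCEPTIONS = True
ON_EXCEPTION_RETURN_VALUES = {"title": None, "cuisine": None}


class SchemaOrgException(Exception):
    """Stand-in for recipe_scrapers._exceptions.SchemaOrgException."""


def exception_handling(method):
    """Wrap a scraper method; only allow-listed methods get a fallback on error."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if SUPPRESS_EXCEPTIONS and method.__name__ in HANDLED_METHODS:
            try:
                return method(self, *args, **kwargs)
            except Exception:
                return ON_EXCEPTION_RETURN_VALUES.get(method.__name__)
        # Methods outside the allow-list run unprotected, so errors propagate.
        return method(self, *args, **kwargs)

    return wrapper


class WildModeScraper:
    @exception_handling
    def title(self):
        raise SchemaOrgException("No title data in SchemaOrg.")

    @exception_handling
    def cuisine(self):
        raise SchemaOrgException("No cuisine data in SchemaOrg.")


scraper = WildModeScraper()
title = scraper.title()  # suppressed: "title" is on the allow-list
try:
    scraper.cuisine()
    cuisine_error = None
except SchemaOrgException as exc:
    cuisine_error = exc  # not suppressed: "cuisine" is not on the allow-list
print(title, cuisine_error)
```

In this model, adding a key to ON_EXCEPTION_RETURN_VALUES only supplies a fallback value; it does not make the plugin wrap a method it was never told to handle.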

We might need to dig into the code history to find out why that field isn't included; as far as I can tell, the field list isn't configurable at the moment.
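Until the plugin covers cuisine (or its field list becomes configurable), one possible caller-side workaround is to route calls through a small helper that consults the same ON_EXCEPTION_RETURN_VALUES mapping. This is a sketch, not a recipe-scrapers API: call_with_fallback and StubScraper are hypothetical, and a stand-in SchemaOrgException replaces the library's class so the example runs without network access.

```python
from typing import Any, Callable, Dict


class SchemaOrgException(Exception):
    """Stand-in for recipe_scrapers._exceptions.SchemaOrgException."""


# Same shape as the ON_EXCEPTION_RETURN_VALUES dict in scraper_settings.py.
ON_EXCEPTION_RETURN_VALUES: Dict[str, Any] = {"title": None, "cuisine": None, "yields": None}


def call_with_fallback(
    scraper: Any, name: str, fallbacks: Dict[str, Any] = ON_EXCEPTION_RETURN_VALUES
) -> Any:
    """Call scraper.<name>(); on error, return fallbacks[name] if the field is
    listed, otherwise re-raise the original exception."""
    method: Callable[[], Any] = getattr(scraper, name)
    try:
        return method()
    except Exception:
        if name in fallbacks:
            return fallbacks[name]
        raise


class StubScraper:
    """Hypothetical stand-in for a wild-mode scraper; not part of recipe-scrapers."""

    def title(self) -> str:
        return "Spring minestrone with kale oil"

    def cuisine(self) -> str:
        raise SchemaOrgException("No cuisine data in SchemaOrg.")


scraper = StubScraper()
recipe = {name: call_with_fallback(scraper, name) for name in ("title", "cuisine")}
print(recipe)
```

The helper catches Exception broadly to mirror the "silence everything" intent; in real code, catching only the library's SchemaOrgException would avoid hiding unrelated bugs.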