hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.68k stars 519 forks

hellofresh.de returns bug #747

Open sHooPmyWooP opened 1 year ago

sHooPmyWooP commented 1 year ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

The results you expect to see

...

The results (including any Python error messages) that you are seeing

from recipe_scrapers import scrape_me
scraper = scrape_me("above url")
scraper.title()

raises this error

s.title()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\plugins\exception_handling.py", line 64, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\plugins\html_tags_stripper.py", line 75, in decorated_method_wrapper
    decorated_func_result = decorated(self, *args, **kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\plugins\normalize_string.py", line 34, in decorated_method_wrapper
    return normalize_string(decorated(self, *args, **kwargs))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\plugins\schemaorg_fill.py", line 48, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\hellofresh.py", line 11, in title
    return self.schema.title()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\_schemaorg.py", line 80, in title
    return normalize_string(self.data.get("name"))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipe_scrapers\_utils.py", line 141, in normalize_string
    unescaped_string = html.unescape(string)
                       ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\html\__init__.py", line 130, in unescape
    if '&' not in s:
       ^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable
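
Reading the traceback from the bottom up, `self.data.get("name")` appears to have returned `None` (the schema data had no `name` entry), and `html.unescape` cannot handle `None`. The following is a minimal sketch that reproduces only that last step, independent of recipe-scrapers. `schema_data` and `safe_title` are hypothetical stand-ins for illustration, not the library's code.

```python
import html

# Hypothetical stand-in for the parsed schema.org data when the page
# exposes no usable "name" entry: the key is simply absent.
schema_data = {}


def safe_title(data):
    """Return the unescaped name, or None when the schema has no name."""
    name = data.get("name")
    if name is None:
        return None
    return html.unescape(name)


# Reproduce the failure: html.unescape() runs `'&' not in s`,
# which raises TypeError when s is None.
raised = False
try:
    html.unescape(schema_data.get("name"))
except TypeError:
    raised = True
```

A guard like the one in `safe_title` only avoids the crash; it does not recover the title, which is why a different data source for the title may still be needed.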

sHooPmyWooP commented 1 year ago

If someone with more experience could confirm that this has to be fixed by parsing the HTML, as opposed to reading the schema, I would happily give it a try, following the approach in https://github.com/hhursev/recipe-scrapers/commit/ffee963d04.
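
To make the question concrete, here is a rough sketch of what "parsing the HTML" could look like: use the schema name when it exists, otherwise fall back to the first `<h1>` on the page. It uses only the standard library so it runs on its own; an actual scraper would presumably work on the already-parsed page instead. `FirstH1Parser`, `title_with_fallback`, and the sample markup are hypothetical, and whether the title on hellofresh.de actually lives in an `<h1>` is an unchecked assumption.

```python
from html.parser import HTMLParser


class FirstH1Parser(HTMLParser):
    """Collect the text of the first <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Only the first <h1> is of interest.
        if tag == "h1" and not self._done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._done = True

    def handle_data(self, data):
        if self._in_h1:
            self._chunks.append(data)

    @property
    def text(self):
        # Collapse whitespace; return None if no heading text was found.
        return " ".join("".join(self._chunks).split()) or None


def title_with_fallback(schema_data, page_html):
    """Prefer the schema name; fall back to the first <h1> in the HTML."""
    name = schema_data.get("name")
    if name:
        return name
    parser = FirstH1Parser()
    parser.feed(page_html)
    parser.close()
    return parser.text


# Hypothetical markup, not taken from hellofresh.de.
sample_html = "<html><body><h1>Example &amp; Recipe</h1></body></html>"
fallback_title = title_with_fallback({}, sample_html)
```

With an empty schema dictionary the function returns the heading text; when the schema does provide a name, the HTML is never parsed.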