hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License

[Americas Test Kitchen] - Scraper broken #1317

Open: sudobash1 opened this issue 3 days ago

sudobash1 commented 3 days ago

The URL of the recipe(s) that are not being scraped correctly

It seems like all recipes are broken. Here are some examples:

The results you expect to see

It worked as recently as 72 hours ago. Now it doesn't find any recipe.

The results (including any Python error messages) that you are seeing

Python error:

ERROR    2024-10-21T17:05:05 - Exception in ASGI application
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 24, in __call__
    await responder(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 44, in __call__
    await self.app(scope, receive, self.send_with_gzip)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/app/mealie/routes/_base/routers.py", line 35, in custom_route_handler
    response = await original_route_handler(request)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/mealie/routes/recipe/recipe_crud_routes.py", line 203, in parse_recipe_url
    recipe, extras = await create_from_url(req.url, self.translator)
  File "/app/mealie/services/scraper/scraper.py", line 34, in create_from_url
    new_recipe, extras = await scraper.scrape(url)
  File "/app/mealie/services/scraper/recipe_scraper.py", line 43, in scrape
    result = await scraper.parse()
  File "/app/mealie/services/scraper/scraper_strategies.py", line 233, in parse
    scraped_data = await self.scrape_url()
  File "/app/mealie/services/scraper/scraper_strategies.py", line 204, in scrape_url
    scraped_schema = scrape_html(recipe_html, org_url=self.url, supported_only=False)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/recipe_scrapers/__init__.py", line 844, in scrape_html
    return SCRAPERS[host_name](html=html, url=org_url)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/recipe_scrapers/_abstract.py", line 33, in __init__
    for name, _ in inspect.getmembers(self, inspect.ismethod):
  File "/usr/local/lib/python3.10/inspect.py", line 469, in getmembers
    value = getattr(object, key)
  File "/usr/local/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/recipe_scrapers/americastestkitchen.py", line 98, in _get_additional_details
    name = list(j["props"]["initialState"]["content"]["documents"])[0]
IndexError: list index out of range

It looks like America's Test Kitchen changed the JSON on the page.
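The traceback ends in an IndexError rather than a KeyError, which suggests the old key path still resolves but the `documents` mapping is now empty. A minimal sketch of that failure mode follows; the payload is a hypothetical, trimmed-down stand-in for the page's JSON, not a copy of it, and `first_document_name` is an illustrative helper that mirrors the failing line.

```python
import json

# Hypothetical payload (assumption): the old keys are still present,
# but "documents" is empty, as the IndexError suggests.
sample = json.loads(
    '{"props": {"initialState": {"content": {"documents": {}}}}}'
)

documents = sample["props"]["initialState"]["content"]["documents"]


def first_document_name(docs):
    """Mirror the failing line: take the first key of the mapping."""
    return list(docs)[0]


try:
    first_document_name(documents)
    outcome = "ok"
except IndexError as exc:
    # list({}) is an empty list, so indexing [0] fails here.
    outcome = f"IndexError: {exc}"

print(outcome)  # IndexError: list index out of range
```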

sudobash1 commented 3 days ago

I am using this patch in my Docker container, and it seems to work like it did before.

diff --git a/recipe_scrapers/americastestkitchen.py b/recipe_scrapers/americastestkitchen.py
index 589b9c95..08fdc68f 100644
--- a/recipe_scrapers/americastestkitchen.py
+++ b/recipe_scrapers/americastestkitchen.py
@@ -71,5 +71,4 @@ class AmericasTestKitchen(AbstractScraper):
     @functools.cached_property
     def _get_additional_details(self):
         j = json.loads(self.soup.find(type="application/json").string)
-        name = list(j["props"]["initialState"]["content"]["documents"])[0]
-        return j["props"]["initialState"]["content"]["documents"][name]
+        return j["props"]["pageProps"]["data"]
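For anyone who wants to tolerate both page layouts while this is being sorted out, the lookup could be written defensively. This is only a sketch, not the library's implementation: `extract_additional_details` is a hypothetical standalone helper, and the only key paths it knows are the two that appear in the patch above. The sample payloads are made up for illustration.

```python
def extract_additional_details(payload: dict) -> dict:
    """Return the recipe details from either JSON layout.

    Hypothetical helper: tries the new path from the patch first,
    then falls back to the old "documents" mapping.
    """
    props = payload.get("props", {})

    # New layout: props -> pageProps -> data
    data = props.get("pageProps", {}).get("data")
    if data:
        return data

    # Old layout: first entry of props -> initialState -> content -> documents
    documents = (
        props.get("initialState", {}).get("content", {}).get("documents", {})
    )
    if documents:
        return next(iter(documents.values()))

    raise ValueError("No recipe details found in the page JSON")


# Made-up payloads (assumption), one per layout.
new_style = {"props": {"pageProps": {"data": {"title": "Example"}}}}
old_style = {
    "props": {
        "initialState": {"content": {"documents": {"doc-1": {"title": "Example"}}}}
    }
}

print(extract_additional_details(new_style))  # {'title': 'Example'}
print(extract_additional_details(old_style))  # {'title': 'Example'}
```

Raising a ValueError with a clear message when neither path matches makes a future layout change easier to diagnose than a bare IndexError from deep inside `inspect.getmembers`.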