mealie-recipes / mealie

Mealie is a self-hosted recipe manager and meal planner with a REST API backend and a reactive frontend application built in Vue for a pleasant user experience for the whole family. Easily add recipes to your database by providing the URL, and Mealie will automatically import the relevant data, or add a family recipe with the UI editor.
https://docs.mealie.io
GNU Affero General Public License v3.0
7.32k stars, 732 forks

[SCRAPER] - America's Test Kitchen random scrape failures #3964

Closed. PanavisionT16 closed this issue 3 months ago

PanavisionT16 commented 3 months ago

First Check

Please provide 1-5 example URLs that are having errors

https://www.americastestkitchen.com/recipes/13622-crispy-fried-shrimp
https://www.americastestkitchen.com/recipes/7550-classic-chicken-salad
https://www.americastestkitchen.com/recipes/10330-foolproof-all-butter-dough-for-double-crust-pie
https://www.americastestkitchen.com/recipes/8564-best-ground-beef-chili
https://www.americastestkitchen.com/recipes/8819-backyard-barbecue-beans

Please provide your logs for the Mealie container (docker logs <container-id> > mealie.logs)

[ERROR|httptools_impl|L404] 2024-07-31T06:06:57: Exception in ASGI application Traceback (most recent call last): File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi result = await app( # type: ignore[func-returns-value] File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in call return await self.app(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call await super().call(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/applications.py", line 123, in call await self.middleware_stack(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call await self.app(scope, receive, _send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 24, in call await responder(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 44, in call await self.app(scope, receive, self.send_with_gzip) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in call await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 756, in call await self.middleware_stack(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 776, in app await route.handle(scope, 
receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle await self.app(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 77, in app await wrap_app_handling_exceptions(app, request)(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 72, in app response = await func(request) File "/app/mealie/routes/_base/routers.py", line 35, in custom_route_handler response = await original_route_handler(request) File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app raw_response = await run_endpoint_function( File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function return await dependant.call(**values) File "/app/mealie/routes/recipe/recipe_crud_routes.py", line 203, in parse_recipe_url recipe, extras = await create_from_url(req.url, self.translator) File "/app/mealie/services/scraper/scraper.py", line 34, in create_from_url new_recipe, extras = await scraper.scrape(url) File "/app/mealie/services/scraper/recipe_scraper.py", line 43, in scrape result = await scraper.parse() File "/app/mealie/services/scraper/scraper_strategies.py", line 236, in parse return self.clean_scraper(scraped_data, self.url) File "/app/mealie/services/scraper/scraper_strategies.py", line 189, in clean_scraper recipe_instructions=get_instructions(), File "/app/mealie/services/scraper/scraper_strategies.py", line 159, in get_instructions instruction_as_text = cleaner.clean_instructions(instruction_as_text) File "/app/mealie/services/scraper/cleaner.py", line 206, in clean_instructions return 
clean_instructions( File "/app/mealie/services/scraper/cleaner.py", line 149, in clean_instructions return [ File "/app/mealie/services/scraper/cleaner.py", line 152, in if instruction["text"].strip() KeyError: 'text' [INFO|httptools_impl|L466] 2024-07-31T06:07:01: 127.0.0.1:36928 - "GET /api/app/about HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:15: 172.16.112.21:50946 - "GET /api/organizers/categories?page=1&perPage=-1&orderBy=name&orderDirection=asc HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:15: 172.16.112.21:50948 - "GET /api/organizers/tools?page=1&perPage=-1&orderBy=name&orderDirection=asc HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50947 - "GET /api/recipes?page=1&perPage=64&orderBy=created_at&orderDirection=desc&paginationSeed=1722406035797&searchSeed=1722406035797&search=&requireAllCategories=false&requireAllTags=false&requireAllTools=false&requireAllFoods=false HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50947 - "GET /api/users/self/ratings HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50946 - "GET /api/media/recipes/2e97b771-bdfe-4498-a91e-f02c77d16105/images/min-original.webp?rnd=1&version=dzYo HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50951 - "GET /api/media/recipes/a923213c-3ec2-4bbf-bca1-895c5d228668/images/min-original.webp?rnd=1&version=1GsX HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50950 - "GET /api/media/recipes/e45035f4-ade7-4017-aaf4-63738de4dd7a/images/min-original.webp?rnd=1&version=Tnj6 HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50949 - "GET /api/media/recipes/2dba999e-75c8-4085-8258-e26f8488d778/images/min-original.webp?rnd=1&version=jFUh HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50948 - "GET 
/api/media/recipes/d209f701-1ddb-4ca0-bc78-7d68f51356c5/images/min-original.webp?rnd=1&version=3FBA HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50946 - "GET /api/media/recipes/6df07ef5-925b-4d83-8a84-2233bedab93a/images/min-original.webp?rnd=1&version=pEXu HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50947 - "GET /_nuxt/fonts/Roboto-500-latin-ext27.9165081.woff2 HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:16: 172.16.112.21:50951 - "GET /api/recipes?page=3&perPage=32&orderBy=created_at&orderDirection=desc&paginationSeed=1722406035797&searchSeed=1722406035797&search=&requireAllCategories=false&requireAllTags=false&requireAllTools=false&requireAllFoods=false HTTP/1.1" 200 [INFO|httptools_impl|L466] 2024-07-31T06:07:25: 172.16.112.21:50959 - "GET /api/media/recipes/ee2b3d03-d721-4976-9e24-39b594997289/images/min-original.webp?rnd=1&version= HTTP/1.1" 404 [INFO|httptools_impl|L466] 2024-07-31T06:07:31: 127.0.0.1:46378 - "GET /api/app/about HTTP/1.1" 200 [INFO|_client|L1773] 2024-07-31T06:07:50: HTTP Request: GET https://www.americastestkitchen.com/recipes/13622-crispy-fried-shrimp "HTTP/1.1 200 OK" [ERROR|scraper_strategies|L137] 2024-07-31T06:07:50: Error parsing recipe func_call for 'recipeInstructions' [INFO|httptools_impl|L466] 2024-07-31T06:07:50: 172.16.112.21:50982 - "POST /api/recipes/create-url HTTP/1.1" 500

[ERROR|httptools_impl|L404] 2024-07-31T06:06:35: Exception in ASGI application Traceback (most recent call last): File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi result = await app( # type: ignore[func-returns-value] File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in call return await self.app(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call await super().call(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/applications.py", line 123, in call await self.middleware_stack(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call await self.app(scope, receive, _send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 24, in call await responder(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 44, in call await self.app(scope, receive, self.send_with_gzip) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in call await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 756, in call await self.middleware_stack(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 776, in app await route.handle(scope, 
receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle await self.app(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 77, in app await wrap_app_handling_exceptions(app, request)(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 72, in app response = await func(request) File "/app/mealie/routes/_base/routers.py", line 35, in custom_route_handler response = await original_route_handler(request) File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app raw_response = await run_endpoint_function( File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function return await dependant.call(values) File "/app/mealie/routes/recipe/recipe_crud_routes.py", line 203, in parse_recipe_url recipe, extras = await create_from_url(req.url, self.translator) File "/app/mealie/services/scraper/scraper.py", line 34, in create_from_url new_recipe, extras = await scraper.scrape(url) File "/app/mealie/services/scraper/recipe_scraper.py", line 43, in scrape result = await scraper.parse() File "/app/mealie/services/scraper/scraper_strategies.py", line 236, in parse return self.clean_scraper(scraped_data, self.url) File "/app/mealie/services/scraper/scraper_strategies.py", line 189, in clean_scraper recipe_instructions=get_instructions(), File "/app/mealie/services/scraper/scraper_strategies.py", line 159, in get_instructions instruction_as_text = cleaner.clean_instructions(instruction_as_text) File "/app/mealie/services/scraper/cleaner.py", line 206, in clean_instructions return 
clean_instructions( File "/app/mealie/services/scraper/cleaner.py", line 149, in clean_instructions return [ File "/app/mealie/services/scraper/cleaner.py", line 152, in if instruction["text"].strip() KeyError: 'text' [INFO|_client|L1773] 2024-07-31T06:06:46: HTTP Request: GET https://www.americastestkitchen.com/recipes/7550-classic-chicken-salad "HTTP/1.1 200 OK" [ERROR|scraper_strategies|L137] 2024-07-31T06:06:46: Error parsing recipe func_call for 'recipeInstructions' [INFO|httptools_impl|L466] 2024-07-31T06:06:46: 172.16.112.21:50904 - "POST /api/recipes/create-url HTTP/1.1" 500 [ERROR|httptools_impl|L404] 2024-07-31T06:06:46: Exception in ASGI application Traceback (most recent call last): File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi result = await app( # type: ignore[func-returns-value] File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in call return await self.app(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call await super().call(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/applications.py", line 123, in call await self.middleware_stack(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call await self.app(scope, receive, _send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 24, in call await responder(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 44, in call await self.app(scope, receive, self.send_with_gzip) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in call await 
wrap_app_handling_exceptions(self.app, conn)(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 756, in call await self.middleware_stack(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 776, in app await route.handle(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle await self.app(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 77, in app await wrap_app_handling_exceptions(app, request)(scope, receive, send) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 72, in app response = await func(request) File "/app/mealie/routes/_base/routers.py", line 35, in custom_route_handler response = await original_route_handler(request) File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app raw_response = await run_endpoint_function( File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function return await dependant.call(values) File "/app/mealie/routes/recipe/recipe_crud_routes.py", line 203, in parse_recipe_url recipe, extras = await create_from_url(req.url, self.translator) File "/app/mealie/services/scraper/scraper.py", line 34, in create_from_url new_recipe, extras = await scraper.scrape(url) File 
"/app/mealie/services/scraper/recipe_scraper.py", line 43, in scrape result = await scraper.parse() File "/app/mealie/services/scraper/scraper_strategies.py", line 236, in parse return self.clean_scraper(scraped_data, self.url) File "/app/mealie/services/scraper/scraper_strategies.py", line 189, in clean_scraper recipe_instructions=get_instructions(), File "/app/mealie/services/scraper/scraper_strategies.py", line 159, in get_instructions instruction_as_text = cleaner.clean_instructions(instruction_as_text) File "/app/mealie/services/scraper/cleaner.py", line 206, in clean_instructions return clean_instructions( File "/app/mealie/services/scraper/cleaner.py", line 149, in clean_instructions return [ File "/app/mealie/services/scraper/cleaner.py", line 152, in if instruction["text"].strip() KeyError: 'text' [INFO|_client|L1773] 2024-07-31T06:06:57: HTTP Request: GET https://www.americastestkitchen.com/recipes/13622-crispy-fried-shrimp "HTTP/1.1 200 OK" [ERROR|scraper_strategies|L137] 2024-07-31T06:06:57: Error parsing recipe func_call for 'recipeInstructions' [INFO|httptools_impl|L466] 2024-07-31T06:06:57: 172.16.112.21:50921 - "POST /api/recipes/create-url HTTP/1.1" 500

Deployment

Docker (Linux)

PanavisionT16 commented 3 months ago

It seems to be random as I can pull many other recipes off their website without issue.

michael-genson commented 3 months ago

Looks like a bug with the cleaner:

if instruction["text"].strip()
KeyError: 'text'
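
For illustration, here is a minimal sketch of how a cleaner could tolerate instruction entries that have no "text" key. This is not Mealie's actual code or its eventual fix: the function name clean_instruction_steps is hypothetical, and it assumes the scraped instructions arrive as a list of strings or dicts, some of which lack "text".

```python
from typing import Any


def clean_instruction_steps(instructions: list[Any]) -> list[dict[str, str]]:
    """Return steps with non-empty text, skipping entries that have none.

    Hypothetical helper: assumes each entry is a plain string or a dict
    that may or may not carry a "text" key.
    """
    cleaned: list[dict[str, str]] = []
    for step in instructions:
        if isinstance(step, str):
            text = step
        elif isinstance(step, dict):
            # .get() avoids the KeyError raised by step["text"]
            text = step.get("text") or ""
        else:
            # Unknown shape (None, numbers, ...): nothing to clean
            continue
        if not isinstance(text, str):
            continue
        text = text.strip()
        if text:
            cleaned.append({"text": text})
    return cleaned


if __name__ == "__main__":
    scraped = [{"text": " Heat oil. "}, {"name": "Step 2"}, "Fry shrimp."]
    print(clean_instruction_steps(scraped))
```

Silently dropping entries without "text" is only one possible choice; if those entries hold nested steps under another key, a real fix would likely need to unpack them instead.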