Closed robertmain closed 2 years ago
From what I understand of the logging output above, it seems like the app is not happy with the sudden workload of trying to parse 60-odd HTML pages from different websites concurrently (which is understandable).
I've had similar results trying this with smaller pieces of the data too. Is there a better way to do this?
Can you post your script? At least the part that interacts with Mealie?
Yes of course, sorry.
import { axiosInstance } from './api';
(async () => {
const urls = [
'http://fitcakery.com/home/2013/9/30/caprese-crepes',
'http://littlespicejar.com/one-pot-greek-chicken-and-rice-pilaf/',
'http://thetoastykitchen.com/spring-vegetable-soup/?utm_medium=social&utm_source=pinterest&utm_campaign=tailwind_tribes&utm_content=tribes&utm_term=638688072_24819740_11553',
'http://wholeandheavenlyoven.com/2015/04/02/springtime-veggie-pasta-primavera-2/',
'http://www.thepreservesproject.com/asparagus-lemon-risotto/',
'https://diethood.com/chicken-ratatouille-recipe/',
'https://healthyrecipesflatley.blogspot.com/2019/04/easy-vegetable-crustless-quiche-dairy.html',
'https://jenelizabethsjournals.com/2015/02/10/one-pot-pasta/',
'https://jenelizabethsjournals.com/2015/02/10/one-pot-pasta/',
'https://mycrazygoodlife.com/healthy-tuscan-chicken-pasta-instant-pot-slow-cooker-stovetop/',
'https://mycrazygoodlife.com/instant-pot-21-day-fix-burrito-bowl/',
'https://ohsweetbasil.com/quick-and-easy-lemon-orzo-with-parmesan-and-peas-recipe/',
'https://pinchofyum.com/creamy-chicken-quinoa-broccoli-casserole',
'https://pinchofyum.com/creamy-chicken-quinoa-broccoli-casserole',
'https://showmetheyummy.com/healthy-mexican-casserole/',
'https://slimsanity.com/healthy-broccoli-chicken-casserole/',
'https://thesaltymarshmallow.com/best-easy-instant-pot-chili/',
'https://www.acouplecooks.com/creamy-goat-cheese-pasta/',
'https://www.allrecipes.com/recipe/222137/healthier-ultimate-twice-baked-potatoes/',
'https://www.ambitiouskitchen.com/healthy-chicken-pot-pie/',
'https://www.ambitiouskitchen.com/roasted-butternut-squash-broccoli-cheddar-chicken-couscous/',
'https://www.bbc.com/food/recipes/basictomatoandbasils_67840',
'https://www.bhg.com/recipe/poultry/fast-chicken-fettuccine/',
'https://www.bonappetit.com/recipe/risotto-with-butternut-squash-leeks-and-basil',
'https://www.chelseasmessyapron.com/sweet-potato-burritos/',
'https://www.chelseasmessyapron.com/the-best-ever-chicken-fajita-bowls/',
'https://www.crazyforcrust.com/mexican-chicken-soup/',
'https://www.delish.com/cooking/recipe-ideas/recipes/a54291/one-pan-balsamic-chicken-and-asparagus-recipe/',
'https://www.dessertfortwo.com/instant-pot-chicken-tacos/',
'https://www.dessertfortwo.com/instant-pot-salsa-chicken/',
'https://www.dessertfortwo.com/instant-pot-salsa-chicken/',
'https://www.dinneratthezoo.com/spaghetti-salad/',
'https://www.eatyourselfskinny.com/sweet-potato-black-bean-quinoa-bake/',
'https://www.geniuskitchen.com/recipe/savory-mushroom-spinach-cheese-crepes-373390?soc=socialsharingpinterest',
'https://www.gimmesomeoven.com/cucumber-quinoa-salad-recipe/',
'https://www.healthymealplans.com/recipe-details/classic-one-pot-pasta',
'https://www.healthymealplans.com/recipe-details/lazy-lasagna-bake',
'https://www.healthymealplans.com/recipe-details/turkey-and-wild-rice-soup',
'https://www.lecremedelacrumb.com/easy-healthy-baked-lemon-chicken/',
'https://www.lecremedelacrumb.com/instant-pot-pot-roast-potatoes/',
'https://www.loveandoliveoil.com/2007/09/spicy-salsa-turkey-burgers.html',
'https://www.loveandoliveoil.com/2015/02/chicken-tortilla-soup.html',
'https://www.purewow.com/recipes/chicken-snap-pea-stir-fry-recipe',
'https://www.purewow.com/recipes/skillet-pasta-squash-ricotta-basil-recipe',
'https://www.recipegirl.com/quinoa-stuffed-peppers/',
'https://www.recipegirl.com/quinoa-stuffed-peppers/',
'https://www.simplyhappyfoodie.com/instant-pot-mini-sweet-potato-chili/',
'https://www.skinnytaste.com/baked-potato-soup/',
'https://www.skinnytaste.com/butternut-squash-and-spinach-lasagna/',
'https://www.skinnytaste.com/crust-less-summer-zucchini-pie/',
'https://www.skinnytaste.com/lighter-baked-macaroni-and-cheese/#recipe',
'https://www.skinnytaste.com/skinny-baked-broccoli-macaroni-and/',
'https://www.skinnytaste.com/spinach-lasagna-rolls/',
'https://www.skinnytaste.com/stuffed-pepper-soup/',
'https://www.tablefortwoblog.com/quinoa-bowls-with-roasted-vegetables-and-chicken/',
'https://www.tasteofhome.com/recipes/butternut-portobello-lasagna/',
'https://www.tasteofhome.com/recipes/linguine-with-broccoli-rabe-peppers/',
'https://www.thekitchn.com/recipe-broccoli-and-cheese-risotto-256187',
'https://www.theroastedroot.net/roasted-winter-vegetable-quinoa-salad-with-cider-vinaigrette/',
'https://www.wellplated.com/caprese-chicken-pasta/',
];
const scraped = urls
.map((url) => axiosInstance.post('recipes/test-scrape-url', { url }));
const data = await Promise.all(scraped);
console.log(data);
})();
Ah. Gotcha. Since Mealie doesn't implement any sort of queue, you're running up against concurrent read/write errors in the database. SQLite in particular is fairly limited for concurrency.
I'd suggest just running the requests sequentially instead of firing them all off concurrently. Await each request before sending the next one, and then you should be good to go.
In this case I'm using Postgres. However, I wonder if I'm causing some kind of collision with the auto-increment database keys...
So, I've refactored my script to run serially instead of concurrently.
import { axiosInstance } from './api';

(async () => {
  const stuff = [];
  const links = [
    // .. recipes
  ];
  for (const url of links) {
    console.log(`Scraping: ${url}`);
    // Wait for each request to finish before sending the next one.
    const { data } = await axiosInstance.post('recipes/create-url', { url });
    stuff.push(data);
  }
})();
The scrape URL now works, but some aspect of the recipes/create-url endpoint seems to be breaking...
mealie | ERROR: 07-Nov-21 23:40:02 Error parsing recipe func_call for 'totalTime'
mealie | ERROR: 07-Nov-21 23:40:02 Error parsing recipe func_call for 'prepTime'
mealie | INFO: 07-Nov-21 23:40:02 Image ['https://wholeandheavenlyoven.com/wp-content/uploads/2015/03/Springtime-Veggie-Pasta-Primavera10.jpg']
mealie | INFO: 07-Nov-21 23:40:02 Image URL: ['https://wholeandheavenlyoven.com/wp-content/uploads/2015/03/Springtime-Veggie-Pasta-Primavera10.jpg']
mealie | INFO: 07-Nov-21 23:40:03 File Name Suffix .jpg
mealie | /app/data/recipes/springtime-veggie-pasta-primavera/images/original.jpg
mealie | INFO: 07-Nov-21 23:40:03 original.jpg Minified: 115.67 kB -> 106.92 kB -> 15.77 kB
mealie | 172.21.0.1:0 - "POST /api/recipes/create-url HTTP/1.1" 201
mealie | /app/data/recipes/springtime-veggie-pasta-primavera/images/min-original.webp
mealie | ERROR: 07-Nov-21 23:40:03 Failed to extract rdfa, raises 'str' object has no attribute 'decode'
mealie | Traceback (most recent call last):
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/extruct/_extruct.py", line 108, in extract
mealie | output[syntax] = list(extract(document, base_url=base_url))
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/extruct/rdfa.py", line 154, in extract_items
mealie | jsonld_string = g.serialize(format='json-ld', auto_compact=not expanded).decode('utf-8')
mealie | AttributeError: 'str' object has no attribute 'decode'
mealie | 172.21.0.1:0 - "POST /api/recipes/create-url HTTP/1.1" 500
mealie | [2021-11-07 23:40:03 +0000] [51] [ERROR] Exception in ASGI application
mealie | Traceback (most recent call last):
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
mealie | result = await app(self.scope, self.receive, self.send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
mealie | return await self.app(scope, receive, send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/fastapi/applications.py", line 199, in __call__
mealie | await super().__call__(scope, receive, send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/applications.py", line 111, in __call__
mealie | await self.middleware_stack(scope, receive, send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 181, in __call__
mealie | raise exc from None
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
mealie | await self.app(scope, receive, _send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
mealie | raise exc from None
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
mealie | await self.app(scope, receive, sender)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/routing.py", line 566, in __call__
mealie | await route.handle(scope, receive, send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/routing.py", line 227, in handle
mealie | await self.app(scope, receive, send)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/routing.py", line 41, in app
mealie | response = await func(request)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 201, in app
mealie | raw_response = await run_endpoint_function(
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 150, in run_endpoint_function
mealie | return await run_in_threadpool(dependant.call, **values)
mealie | File "/opt/pysetup/.venv/lib/python3.9/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
mealie | return await loop.run_in_executor(None, func, *args)
mealie | File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
mealie | result = self.fn(*self.args, **self.kwargs)
mealie | File "/app/mealie/routes/recipe/recipe_crud_routes.py", line 68, in parse_recipe_url
mealie | recipe = create_from_url(url.url)
mealie | File "/app/mealie/services/scraper/scraper.py", line 35, in create_from_url
mealie | elif og_dict := extract_open_graph_values(url):
mealie | File "/app/mealie/services/scraper/scraper.py", line 59, in extract_open_graph_values
mealie | if recipe.get("name", "") == "":
mealie | AttributeError: 'NoneType' object has no attribute 'get'
It managed about 3 recipes (including this one, curiously) before it gave up and returned 500 to my import script.
Oddly, despite this - it seems to have managed to grab the ingredients and instructions well enough...
Ah, it looks like the mealie API is returning a 500 for these API calls despite the apparent success in importing them...
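Because a single 500 response rejects the awaited request and ends the loop, one option is to catch the error for each URL and carry on with the rest. Below is a minimal sketch of that idea, not something taken from Mealie itself: `importSequentially`, `PostFn` and `ImportResult` are hypothetical names, and the `post` argument is assumed to behave like `axiosInstance.post` (it resolves to an object with a `data` field and rejects on an error status).

```typescript
// Hypothetical helper: not part of Mealie or of the original import script.
// `post` is assumed to resolve to an object with a `data` field and to
// reject on HTTP error statuses, as axios does by default.
type PostFn = (path: string, body: { url: string }) => Promise<{ data: unknown }>;

interface ImportResult {
  url: string;
  ok: boolean;
  data?: unknown;
  error?: string;
}

async function importSequentially(post: PostFn, urls: string[]): Promise<ImportResult[]> {
  const results: ImportResult[] = [];
  for (const url of urls) {
    try {
      // Await each request before sending the next one.
      const { data } = await post('recipes/create-url', { url });
      results.push({ url, ok: true, data });
    } catch (err) {
      // Record the failure (for example a 500) and continue with the next URL.
      const error = err instanceof Error ? err.message : String(err);
      results.push({ url, ok: false, error });
    }
  }
  return results;
}
```

With the script above, this could be called as `importSequentially((path, body) => axiosInstance.post(path, body), links)`. Filtering the result for entries where `ok` is false then gives the list of URLs to check by hand, since some of them may have been imported despite the error response.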
It looks like you're running into issues related to the specific sites you're importing from. As such, I'm going to close this issue, since your initial problem was solved.
If you want help resolving the additional errors with importing from certain sites, I'd suggest:
1) Verifying that the website returns valid JSON. You can check with https://demo.mealie.io/recipes/debugger?test_url=
2) If it doesn't, you can check the library we use to see if they have specific support for that site: https://github.com/hhursev/recipe-scrapers
3) If it does but you're still experiencing issues, you can open another issue with the problem URL and the corresponding logs related to the URL.
First Check
What is the issue you are experiencing?
I'm migrating from recipes stored in Google Calendar to Mealie, and I'm using the API to bulk insert recipes into the system. The recipes I'm trying to bulk insert are as follows:
I'm less familiar with Python than other languages, but watching the logs seems to suggest that the app is crashing and then restarting when I try to do this. I had been trying to insert them in parallel, and when that failed, I tried inserting smaller chunks in parallel (i.e., rather than trying to insert the whole 60-odd at once, doing just 10 or 20 at a time). This also seemed to crash the app.
I've provided logging output below:
Deployment
Docker (Linux)
Deployment Details
Currently running on my Debian Linux 9 laptop for dev purposes. Running via docker-compose. Compose file as follows:
Mealie Version
0.5.3