nextcloud / cookbook

🍲 A library for all your recipes
https://apps.nextcloud.com/apps/cookbook
GNU Affero General Public License v3.0

can not add recipes from foodnetwork.com #115

Closed geeseven closed 4 years ago

geeseven commented 4 years ago

Greetings,

I am running into some issues adding recipes from foodnetwork.com like the following:

https://www.foodnetwork.com/recipes/alton-brown/shepherds-pie-recipe2-1942900

The site does offer schema.org json data.

long and ugly curl output
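To check by hand whether a page really carries schema.org data, the `application/ld+json` script blocks can be pulled out of the HTML. The following is a minimal Python sketch that uses only the standard library; it is not the cookbook app's PHP code, and the names `JsonLdExtractor` and `extract_jsonld` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_jsonld(html):
    """Return the parsed JSON-LD payloads found in an HTML string (hypothetical helper)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    parsed = []
    for raw in parser.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of failing the whole page.
            continue
    return parsed
```

Feeding it the saved curl output, for example `extract_jsonld(open("page.html").read())` with a hypothetical file name, shows whether a payload is a single object or a list.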

Is this an issue with foodnetwork.com?

mrzapp commented 4 years ago

@geeseven I just tried it with the latest code from the develop branch, so it seems we've fixed whatever was causing the issue. When all our current issues are closed, a new patch will be published; I don't think it'll be much longer now.

Closing this issue for now.

geeseven commented 4 years ago

After upgrading to 0.5.5, recipes from foodnetwork.com are still not getting added. This time the error is: Could not add recipe: "Could not find recipe element". The nextcloud.log file also got spammed with 195 entries. Here are the first two and the last two as examples:

{
  "reqId": "gTMyT7k2BTZS80H607uM",
  "level": 3,
  "time": "2019-12-06T03:27:02+00:00",
  "remoteAddr": "127.0.0.1",
  "user": "user",
  "app": "PHP",
  "method": "POST",
  "url": "/index.php/apps/cookbook/add",
  "message": "DOMDocument::loadHTML(): Tag section invalid in Entity, line: 824 at /usr/local/www/nextcloud/apps/cookbook/lib/Service/RecipeService.php#351",
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.13.2 Chrome/73.0.3683.105 Safari/537.36",
  "version": "17.0.1.1"
}
{
  "reqId": "gTMyT7k2BTZS80H607uM",
  "level": 3,
  "time": "2019-12-06T03:27:02+00:00",
  "remoteAddr": "127.0.0.1",
  "user": "user",
  "app": "PHP",
  "method": "POST",
  "url": "/index.php/apps/cookbook/add",
  "message": "DOMDocument::loadHTML(): Tag section invalid in Entity, line: 871 at /usr/local/www/nextcloud/apps/cookbook/lib/Service/RecipeService.php#351",
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.13.2 Chrome/73.0.3683.105 Safari/537.36",
  "version": "17.0.1.1"
}
...
{
  "reqId": "gTMyT7k2BTZS80H607uM",
  "level": 3,
  "time": "2019-12-06T03:27:02+00:00",
  "remoteAddr": "127.0.0.1",
  "user": "user",
  "app": "PHP",
  "method": "POST",
  "url": "/index.php/apps/cookbook/add",
  "message": "DOMDocument::loadHTML(): Tag use invalid in Entity, line: 4852 at /usr/local/www/nextcloud/apps/cookbook/lib/Service/RecipeService.php#351",
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.13.2 Chrome/73.0.3683.105 Safari/537.36",
  "version": "17.0.1.1"
}
{
  "reqId": "gTMyT7k2BTZS80H607uM",
  "level": 3,
  "time": "2019-12-06T03:27:02+00:00",
  "remoteAddr": "127.0.0.1",
  "user": "user",
  "app": "PHP",
  "method": "POST",
  "url": "/index.php/apps/cookbook/add",
  "message": "DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 4903 at /usr/local/www/nextcloud/apps/cookbook/lib/Service/RecipeService.php#351",
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.13.2 Chrome/73.0.3683.105 Safari/537.36",
  "version": "17.0.1.1"
}

I can supply the entire log entries if needed.
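Rather than pasting all 195 entries, the warnings can be grouped by kind. The sketch below is in Python and assumes the log file holds one JSON object per line (the entries above are shown pretty-printed). The function name `summarize_messages` and the regular expression are illustrative and not part of Nextcloud.

```python
import json
import re
from collections import Counter

# Strips the trailing ", line: N at /path/to/file.php#NNN" part so that
# warnings of the same kind are counted together.
_LOCATION = re.compile(r", line: \d+ at .*$")


def summarize_messages(lines, req_id=None):
    """Count log messages by kind, optionally restricted to one request id."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if req_id is not None and entry.get("reqId") != req_id:
            continue
        message = str(entry.get("message", ""))
        counts[_LOCATION.sub("", message)] += 1
    return counts
```

Called as `summarize_messages(open("nextcloud.log"), req_id="gTMyT7k2BTZS80H607uM")`, it would return one counter entry per distinct warning text, such as `Tag section invalid in Entity`.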

mrzapp commented 4 years ago

Alright, I'm seeing the issue now as well, but only for certain recipes.

Teifun2 commented 4 years ago

For further reference, could you link one example that does not work?

geeseven commented 4 years ago

@Teifun2, the link from my original post is an example. That being said, I tested like five or six random recipes from foodnetwork.com and none of them worked.

mrzapp commented 4 years ago

This is working in 0.6.0

geeseven commented 4 years ago

I am still unable to add any recipes from foodnetwork.com with 0.6.3. Still getting the same errors as before. Here are some random recipes I attempted to add:

https://www.foodnetwork.com/recipes/food-network-kitchen/summery-herbed-tuna-pasta-salad-5291038

https://www.foodnetwork.com/recipes/spinach-tortellini-soup-recipe-1958087

https://www.foodnetwork.com/recipes/food-network-kitchen/braised-beans-recipe-1973661

https://www.foodnetwork.com/recipes/ree-drummond/mexican-rice-casserole-recipe-2043294

Am I running into some sort of cache issue? 🤷‍♂️

mrzapp commented 4 years ago

@geeseven there are a lot of redirects happening with the links you provided there. Some redirect to working recipes and others don't.

These ones work for me:

This is not a recipe, but a category overview:

This one has been removed from the website:

It seems to be working as intended 🤷‍♂️

geeseven commented 4 years ago

Hey @mrzapp,

Thanks for looking into this issue again. I have found that foodnetwork.com and foodnetwork.co.uk are not just different domain names pointing to the same content.

I just retested those four URLs from my home ISP and from my Nextcloud server, and all four loaded recipe pages in a browser. In Tor Browser with a German endpoint, I could recreate the behaviour you were seeing.

I am able to add the co.uk versions of the two working recipes to the cookbook app.

Do you have access to an endpoint that does not redirect foodnetwork.com to foodnetwork.co.uk?
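One way to see from a given machine whether a link is redirected to another domain is to compare the requested host with the host of the final URL. This is a rough Python sketch using only the standard library; the helper names and the User-Agent value are assumptions for illustration, and the result depends on where the request is made from.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def host_of(url):
    """Return the lower-cased host name of a URL, or an empty string."""
    return (urlsplit(url).hostname or "").lower()


def redirected_to_other_host(requested_url, final_url):
    """True if the final URL is served from a different host than the one requested."""
    return host_of(requested_url) != host_of(final_url)


def fetch_final_url(url, timeout=10):
    """Follow redirects and return the URL that was actually served (needs network access)."""
    # Assumption: some sites reject the default urllib User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()
```

For example, `redirected_to_other_host(url, fetch_final_url(url))` run on the Nextcloud server would show whether the server itself gets sent to a different domain.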

mrzapp commented 4 years ago

I suppose I could VPN to a server in the USA, but it won't be soon 😅 This whole thing with foodnetwork seems a bit convoluted

geeseven commented 4 years ago

> This whole thing with foodnetwork seems a bit convoluted

100% agree

The Wayback Machine does have some of the foodnetwork.com pages cached:

https://web.archive.org/web/20200322182313/https://www.foodnetwork.com/recipes/ree-drummond/mexican-rice-casserole-recipe-2043294

https://web.archive.org/web/20200329224335/https://www.foodnetwork.com/recipes/alton-brown/shepherds-pie-recipe2-1942900

mrzapp commented 4 years ago

Aha, I see what the issue is:

<script type="application/ld+json">[{"@context":"http://schema.org","@type":"Recipe", ... }]</script>

They've marked up the recipe in correct ld+json, but as an Array with a single recipe in it, rather than the standard Object presentation.

So we could fix this by checking for $json[0].
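The idea, sketched in Python rather than the app's PHP: accept both the plain object and the array form, and pick out the Recipe. This goes slightly beyond checking the first element, since it scans every item in the array. It is only an illustration and not the actual patch, and `find_recipe` is a made-up name.

```python
def find_recipe(data):
    """Return the first schema.org Recipe object in parsed JSON-LD, or None."""
    # Treat a single object as a one-element list so both shapes share one code path.
    candidates = data if isinstance(data, list) else [data]
    for item in candidates:
        if not isinstance(item, dict):
            continue
        types = item.get("@type")
        # "@type" may be a single string or a list of strings.
        if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
            return item
    return None


# Both presentations yield the same recipe object.
as_object = {"@context": "http://schema.org", "@type": "Recipe", "name": "Example"}
as_array = [as_object]
assert find_recipe(as_object) is find_recipe(as_array)
```

Scanning the whole array instead of only index 0 also covers pages that list other items before the recipe.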

Reopening the issue (once again) :D

geeseven commented 4 years ago

Hey @mrzapp,

Did you have a change of heart regarding adding some custom code for foodnetwork.com? If so, that is completely understandable.

For me, and I am sure for plenty of other North Americans, the lack of foodnetwork.com support is a deal breaker.

mrzapp commented 4 years ago

@geeseven no, I just fixed the issue I found with the site and I was able to import recipes just fine afterwards. If you're still having issues, you're welcome to reopen this ticket and add some details of the recipe that's failing.

geeseven commented 4 years ago

@mrzapp, thanks for the timely reply. Looks like I got confused when attempting to add some recipes and mixed up some sites. I can confirm foodnetwork.com does work for me. Thanks for this workaround.

uuuu1234 commented 4 years ago

Hi there,

I get the "Could not find recipe element" error message (running version 0.7.6), for example with the following recipe:

https://food52.com/recipes/9473-beet-orange-olive-and-walnut-salad

It works for some recipes from this source, but not all of them.

Thanks in advance

mrzapp commented 3 years ago

@uuuu1234 please create new issues for different sites, this is a closed issue for foodnetwork.com