Open advance512 opened 4 years ago
Having the same problem with this URL: https://www.eatwell101.com/shrimp-and-broccoli-foil-packs-recipe
This is the value of script after running the HTML_OR_JS_COMMENTLINE substitution:
'\n{
"@context":"https:\\/\\/schema.org\\/",
"@type":"Recipe",
"mainEntityOfPage":{
"@type":"WebPage",
"@id":"https:\\/\\/www.eatwell101.com\\/shrimp-and-broccoli-foil-packs-recipe"},
"name":"Baked Shrimp and Broccoli Foil Packs with Garlic Lemon Butter Sauce",
"url":"https:\\/\\/www.eatwell101.com\\/shrimp-and-broccoli-foil-packs-recipe",
"headline":"Baked Shrimp and Broccoli Foil Packs with Garlic Lemon Butter Sauce",
"Description":"This baked shrimp foil pack meal is ready in under 30 minutes - The easiest way to cook shrimp in your oven!",
"author":{
"@type":"Person",
"name":"Christina Cherrier"},
"image":"https:\\/\\/www.eatwell101.com\\/wp-content\\/uploads\\/2019\\/04\\/shrimp-and-broccoli-recipe-2.jpg",
"datePublished":"2020-01-10 07:47:21",
"dateModified":"2020-06-20 17:47:39",
"Publisher":"Eatwell101",
"ingredients":"",
"prepTime":"PT10M",
"cookTime":"PT15M",
"recipeYield":"2 servings"}
// ]]>\n'
So it is the same problem: the trailing // ]]>\n' was not removed by the substitution.
Just opened a PR with a fix here: https://github.com/scrapinghub/extruct/pull/144
Seems that the issue is that the JSON-LD document is:
and after the replacement in jsonLd._extractItems() it becomes:
and naturally this part which was not replaced:
causes the error.
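
For anyone who wants to reproduce this without the page, here is a minimal sketch. It assumes HTML_OR_JS_COMMENTLINE is a pattern anchored at the start of the string and compiled without re.MULTILINE. That is my guess at the shape of the regex, not a copy of the library source. The script value is a shortened stand-in with an assumed leading // <![CDATA[ line, and the MULTILINE variant is only one possible fix, not necessarily what the PR does.

```python
import json
import re

# Assumption: a comment-line pattern anchored with ^ and no re.MULTILINE,
# so it can only match at the very start of the script text.
ASSUMED_COMMENTLINE = re.compile(r'^\s*(//.*|<!--.*-->)')

# Hypothetical variant: same pattern, but ^ now matches at every line start.
MULTILINE_COMMENTLINE = re.compile(r'^\s*(//.*|<!--.*-->)', re.MULTILINE)

# Shortened stand-in for the page's script content (assumed CDATA wrapper).
script = '\n// <![CDATA[\n{"@type": "Recipe", "name": "x"}\n// ]]>\n'

# Only the leading comment line is stripped; the trailing one survives.
partially_cleaned = ASSUMED_COMMENTLINE.sub('', script)

try:
    json.loads(partially_cleaned)
    parse_failed = False
except ValueError:
    # json.JSONDecodeError subclasses ValueError; the leftover "// ]]>"
    # after the closing brace is not valid JSON.
    parse_failed = True

# With re.MULTILINE both comment lines are stripped and parsing succeeds.
fully_cleaned = MULTILINE_COMMENTLINE.sub('', script)
data = json.loads(fully_cleaned)
```

The pattern only strips lines that start with // after optional whitespace, so the // inside the "https://..." values is left alone.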