Closed ichitaka closed 4 years ago
@ichitaka could you please check the contents of response.text? I tried the URL you posted and it returns lots of semantic markup for me, including recipe info:
>>> import extruct
>>> import requests
>>> response = requests.get('https://elavegan.com/de/nudeln-mit-knoblauchsosse/')
>>> extruct.extract(response.text, uniform=True)
{'microdata': [],
'json-ld': [{'@context': 'https://schema.org',
'@graph': [{'@type': 'Organization',
'@id': 'https://elavegan.com/de/#organization',
'name': 'ElaVegan',
'url': 'https://elavegan.com/de/',
'sameAs': [],
'logo': {'@type': 'ImageObject',
'@id': 'https://elavegan.com/de/#logo',
'inLanguage': 'de-DE',
'url': 'https://elavegan.com/de/wp-content/uploads/sites/5/2019/09/new-logo-elavegan.png',
'width': 550,
'height': 236,
'caption': 'ElaVegan'},
'image': {'@id': 'https://elavegan.com/de/#logo'}},
....
Yes, response.text does contain the expected information. I have now narrowed the issue down: it does not occur in a fresh environment, which suggests some kind of dependency issue in my original environment.
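One way to chase a suspected dependency problem like this is to print the installed versions of the relevant packages in both the broken and the fresh environment and compare the two lists. The following is only a minimal sketch: the helper name installed_versions is hypothetical, and the package list is an assumption about what extruct relies on, so adjust it to whatever pip reports in your setup.

```python
from importlib import metadata


def installed_versions(package_names):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed in the current environment map to None.
    """
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of packages worth comparing; not taken from the thread.
packages = ["extruct", "requests", "lxml", "rdflib", "w3lib", "mf2py"]
for name, version in installed_versions(packages).items():
    print(f"{name}=={version}")
```

Running this in both environments and diffing the output should show which package versions differ, which is usually the quickest way to find a candidate culprit.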
@ichitaka I see, thanks for double-checking. Please reply if you find a way to reproduce it; I'll close the issue for now.
I'm currently investigating an issue where this tool is sadly not working for a lot of websites. As an example, please consider this URL:
url = "https://elavegan.com/de/nudeln-mit-knoblauchsosse/"
If I run the request and extract the structured data with the following command, I receive this empty response
Structured data is available through the Google test tool, response.text is not empty, and I can find the fields in it (for example 'recipeYield'). I have a bunch of URLs that behave this way and I could not figure out why. No robots.txt is blocking me either.
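To confirm independently of extruct that the raw HTML really carries structured data, the JSON-LD script blocks can be pulled out with the standard library alone. This is a rough sketch, not how extruct does it internally: the names JsonLdCollector and find_jsonld are hypothetical, and it only looks at JSON-LD, not microdata or RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_jsonld(html):
    """Return the parsed JSON-LD documents found in html, skipping invalid JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed = []
    for raw in collector.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            continue
    return parsed
```

Calling find_jsonld(response.text) on the page above and getting a non-empty list, while extruct still returns nothing in the same environment, would point towards the environment rather than the page.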