Open paul-rchds opened 2 years ago
Hi @paul-rchds, yes, that would be great. I noticed the same issue myself but didn't get around to implementing everything required. Here is a link to a PR: https://github.com/scrapinghub/extruct/pull/129/ (feel free to start a new one).
I have changed the extract_item function in the OpengraphExtractor class so that it also picks up meta tags outside of the head. I have tested it on the link shared by @paul-rchds. Please review my PR. Thanks
On some pages meta tags are included outside of the head tag. For example on the YouTube channel page: https://www.youtube.com/c/Freecodecamp
As the opengraph extractor only looks in the head tag, all the og:* meta properties are missed. In my fork, I changed the extractor to look in the body instead.
If that's okay, I can open a PR.
Here is a link to where I made the change: https://github.com/scrapinghub/extruct/blob/c2cffbed26ae4ab8dd35d1860bfda00c3bac5783/extruct/opengraph.py#L28
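For anyone who wants to reproduce the problem without extruct, here is a minimal sketch of the idea using only Python's standard library. It is not extruct's code and not the change from my fork: `OGMetaCollector`, `collect_og_properties`, and `SAMPLE_HTML` are hypothetical names made up for illustration, and the sample markup is invented rather than copied from YouTube. It just shows that scanning the whole document, instead of only the head, picks up og:* meta tags placed in the body.

```python
from html.parser import HTMLParser


class OGMetaCollector(HTMLParser):
    """Collect (property, content) pairs from og:* <meta> tags anywhere in the page."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        # Called for every start tag, regardless of whether it is in <head> or <body>.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties.append((prop, content))


def collect_og_properties(html):
    """Return all og:* properties found in the document, in document order."""
    parser = OGMetaCollector()
    parser.feed(html)
    parser.close()
    return parser.properties


# Invented sample: the og:* tags sit in <body>, as described in this issue.
SAMPLE_HTML = """
<html>
  <head><title>Example Channel</title></head>
  <body>
    <meta property="og:title" content="Example Channel">
    <meta property="og:type" content="profile">
  </body>
</html>
"""

if __name__ == "__main__":
    for prop, content in collect_og_properties(SAMPLE_HTML):
        print(prop, "=", content)
```

A head-only lookup would return nothing for this sample, while the whole-document scan returns both properties. Whether extruct should search the entire document or only fall back to the body when the head has no og:* tags is a design choice for the maintainers.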