grangier / python-goose

Html Content / Article Extractor, web scraping lib in Python
Apache License 2.0
3.98k stars 787 forks

og:image is not parsed correctly if e.g. og:image:width exists on page #250

Open vonholst opened 8 years ago

vonholst commented 8 years ago

og:image is parsed correctly at first, but if the page has further og:image attributes, e.g. og:image:width, the later value replaces the image attribute.

vonholst commented 8 years ago

I would suggest the following modification: opengraphdict.update({u"_".join(attr.split(":")[1:]): value})

This will yield image, image_width, image_height, ...
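
A minimal, self-contained sketch of the difference, not taken from the goose source: META_TAGS, collect_first_segment and collect_joined are hypothetical names, and the first function only assumes that the current code keys each value on the segment right after "og:", which would explain the overwrite described above. The second function joins all remaining segments with an underscore.

```python
# Hypothetical (property, content) pairs in the order they appear on a page.
META_TAGS = [
    ("og:title", "Example"),
    ("og:image", "http://example.com/a.png"),
    ("og:image:width", "640"),
    ("og:image:height", "480"),
]


def collect_first_segment(tags):
    """Key on the first segment after 'og:' only (assumed current behavior)."""
    result = {}
    for attr, value in tags:
        if attr.startswith("og:"):
            # "og:image:width" -> "image", so it overwrites the og:image value.
            result[attr.split(":")[1]] = value
    return result


def collect_joined(tags):
    """Key on every segment after 'og:', joined with underscores."""
    result = {}
    for attr, value in tags:
        if attr.startswith("og:"):
            # "og:image:width" -> "image_width", so og:image keeps its own key.
            result["_".join(attr.split(":")[1:])] = value
    return result


broken = collect_first_segment(META_TAGS)
fixed = collect_joined(META_TAGS)

print(broken["image"])  # 480 (the image URL was overwritten)
print(fixed["image"])   # http://example.com/a.png
print(sorted(fixed))    # ['image', 'image_height', 'image_width', 'title']
```

With the joined keys, og:image keeps its URL and the structured properties land under their own names.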