Matching Order in LxmlMicrodataExtractor._extract_property_value

scrapinghub / extruct

Extract embedded metadata from HTML markup

BSD 3-Clause "New" or "Revised" License


I noticed that the matching order in _extract_property_value seems to be inconsistent with https://www.w3.org/TR/microdata/#values. That document lists "If the element has a content attribute" as the 2nd matching case. In LxmlMicrodataExtractor._extract_property_value, however, the content check is second to last in the matching order.

Should this case

    elif node.get("content"):
        return node.get("content")

in w3cmicrodata.py be moved ahead of the meta tag check at line 186?
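To make the question concrete, here is a minimal sketch of the matching order as I read it in the W3C document. It is not extruct's code. The names property_value_spec_order and VALUE_ATTR are hypothetical. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without dependencies (Element.get and Element.tag are used the same way in both), and it skips URL resolution and real nested-item extraction.

```python
import xml.etree.ElementTree as ET

# Attribute that carries the value for each element type, following the
# list at https://www.w3.org/TR/microdata/#values (illustrative table).
VALUE_ATTR = {
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "data": "value", "meter": "value",
}


def property_value_spec_order(node):
    """Hypothetical helper: pick a property value in the W3C matching order."""
    # Case 1: the element is itself an item. A real extractor would
    # recurse into it; this sketch only returns a placeholder.
    if node.get("itemscope") is not None:
        return {"nested_item": node.tag}
    # Case 2: a content attribute takes precedence over every
    # element-specific attribute below.
    if node.get("content") is not None:
        return node.get("content")
    # time: use the datetime attribute if present, else fall through to text.
    if node.tag == "time" and node.get("datetime") is not None:
        return node.get("datetime")
    # Element-specific attribute (src, href, data, value). URLs are
    # returned as written; resolving them is omitted here.
    attr = VALUE_ATTR.get(node.tag)
    if attr is not None:
        return node.get(attr, "")
    # Otherwise: the text content of the element.
    return "".join(node.itertext())


# An <a> with both href and content: under this order, content wins.
link = ET.fromstring(
    '<a itemprop="url" href="/p/1" content="https://example.com/p/1">link</a>'
)
print(property_value_spec_order(link))
```

With the content check placed after the element-specific branches, the same element would yield the href value instead, which is the difference I am asking about.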

Thanks a lot! Kelvin

Matching Order in LxmlMicrodataExtractor._extract_property_value #160