scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License

Added twitter card functionality #196

Open blackhat-7 opened 2 years ago

blackhat-7 commented 2 years ago

I have added Twitter Card support: extruct now extracts the namespace and properties of Twitter Card meta tags. I have also added 3 test cases.

This feature was requested in issue #179.

For example the following now works:

>>> extruct.extract('<!doctype html><html><head><meta name="twitter:card" content="summary">')
{'microdata': [],
 'json-ld': [],
 'opengraph': [],
 'microformat': [],
 'rdfa': [],
 'dublincore': [{'namespaces': {}, 'elements': [], 'terms': []}],
 'twittercard': [{'namespace': {'twitter': 'https://dev.twitter.com/cards#'},
                  'properties': [('twitter:card', 'summary')]}]}
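
For readers who want to see the idea without reading the diff, here is a minimal standalone sketch of how `twitter:*` meta tags could be collected into the output shape shown above. It is not the code from this PR: it uses only the standard library's `html.parser`, and the names `TwitterCardParser`, `extract_twitter_card`, and `TWITTER_NS` are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser

# Namespace mapping copied from the example output above.
TWITTER_NS = {'twitter': 'https://dev.twitter.com/cards#'}


class TwitterCardParser(HTMLParser):
    """Collect (name, content) pairs from <meta name="twitter:..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so fall back to ''.
        name = attr_map.get('name') or ''
        content = attr_map.get('content')
        if name.startswith('twitter:') and content is not None:
            self.properties.append((name, content))


def extract_twitter_card(html):
    """Return a list shaped like the 'twittercard' entry in the example."""
    parser = TwitterCardParser()
    parser.feed(html)
    parser.close()
    if not parser.properties:
        return []
    return [{'namespace': TWITTER_NS, 'properties': parser.properties}]
```

Called on the same input as the example, `extract_twitter_card('<!doctype html><html><head><meta name="twitter:card" content="summary">')` returns a single entry with the `twitter` namespace and `[('twitter:card', 'summary')]` as its properties. On a page with no `twitter:*` meta tags it returns an empty list.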

stephentgrammer commented 2 years ago

Would love to see this get into the next version tag! @blackhat-7, you might want to update the README to mention Twitter Card usage.