inhumantsar / slurp

Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
https://inhumantsar.github.io/slurp/
MIT License

Add support for `dc`, `prism`, and `citation_*` metadata #26

Open inhumantsar opened 1 month ago

inhumantsar commented 1 month ago

Larger publications use these tag families to expose a variety of metadata, such as titles, dates, descriptions, and authors. For example:

Nature

    <meta name="dc.title" content="Worldwide divergence of values"/>
    ...
    <meta name="dc.date" content="2024-04-09"/>
    ...
    <meta name="dc.description" content="Social scientists have long debated the ..."/>
    ...
    <meta name="prism.publicationDate" content="2024-04-09"/>
    ...
    <meta name="citation_author" content="Jackson, Joshua Conrad"/>
    <meta name="citation_author_institution" content="Booth School of Business, University of Chicago, Chicago, USA"/>
    <meta name="citation_author" content="Medvedev, Danila"/>
    <meta name="citation_author_institution" content="Booth School of Business, University of Chicago, Chicago, USA"/>

Our World in Data

    <meta name="citation_publication_date" content="2024/02/19">
    <meta name="citation_author" content="Max Roser">

See also https://github.com/inhumantsar/slurp/issues/25
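
To make the request concrete, here is a rough sketch of how tags like the ones above could be collected. It is only an illustration in Python using the standard library's `html.parser`, not how slurp or Readability actually does it. The names `PREFIXES`, `MetaCollector`, and `collect_metadata` are made up for this example. Repeated names such as `citation_author` are kept as lists so that multiple authors are not overwritten.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical prefix list, based on the tag families named in this issue.
PREFIXES = ("dc.", "prism.", "citation_")


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> tags whose name matches PREFIXES."""

    def __init__(self):
        super().__init__()
        # name -> list of content values, in document order
        self.found = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags like <meta ... />, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip()
        content = attr.get("content")
        # Match prefixes case-insensitively, but keep the original name as key.
        if content is not None and name.lower().startswith(PREFIXES):
            self.found[name].append(content)


def collect_metadata(html):
    """Return a dict mapping each matching meta name to its content values."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.found)


sample = """
<meta name="dc.title" content="Worldwide divergence of values"/>
<meta name="prism.publicationDate" content="2024-04-09"/>
<meta name="citation_author" content="Jackson, Joshua Conrad"/>
<meta name="citation_author" content="Medvedev, Danila"/>
<meta name="citation_publication_date" content="2024/02/19">
<meta name="viewport" content="width=device-width">
"""

meta = collect_metadata(sample)
```

Note that the two sites format dates differently (`2024-04-09` vs `2024/02/19`), so any real implementation would likely need to normalize them; this sketch leaves the values untouched.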

inhumantsar commented 1 month ago

Started adding support for these upstream: https://github.com/mozilla/readability/pull/871