inhumantsar / slurp

Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
https://inhumantsar.github.io/slurp/
MIT License
126 stars, 2 forks

GitHub Anchors #5

Open inhumantsar opened 3 months ago

inhumantsar commented 3 months ago

Gists (and likely GitHub Markdown previews, e.g. README.md) include heading anchors which use CSS to load an icon. Since the anchors carry no link text, they end up in the Markdown as empty links, like so:

## Some Heading

[](https://.../...#some-heading)

When parsed, the anchor link should either be removed entirely or be applied to the heading, like so: `## [Some Heading](#some-heading)`
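A minimal sketch of both options described above, written as a post-processing pass over the generated Markdown. This is not Slurp's actual implementation: the function name `fixHeadingAnchors`, the `AnchorMode` type, and the `example.com` URL are all hypothetical, and the sketch assumes LF line endings and that the empty anchor link sits on its own line directly after the heading (blank lines in between are allowed).

```typescript
// Hypothetical helper, not part of Slurp's codebase.
type AnchorMode = "remove" | "link-heading";

// Matches an ATX heading, any blank lines, then a line holding only an
// empty-text link whose URL ends in a #fragment.
const HEADING_THEN_ANCHOR =
  /^(#{1,6})[ \t]+(.+?)[ \t]*\n(?:[ \t]*\n)*[ \t]*\[\]\([^)\s]*#([^)\s]+)\)[ \t]*$/gm;

function fixHeadingAnchors(markdown: string, mode: AnchorMode = "remove"): string {
  return markdown.replace(
    HEADING_THEN_ANCHOR,
    (_match: string, hashes: string, text: string, fragment: string): string =>
      mode === "remove"
        ? `${hashes} ${text}` // drop the anchor link entirely
        : `${hashes} [${text}](#${fragment})` // fold the fragment into the heading
  );
}

// Usage with a made-up URL standing in for the elided one in the report.
const sample = "## Some Heading\n\n[](https://example.com/page#some-heading)\n\nBody text.";
console.log(fixHeadingAnchors(sample));
console.log(fixHeadingAnchors(sample, "link-heading"));
```

Working on the Markdown output keeps the sketch independent of how the page HTML is parsed. Stripping the anchor elements from the DOM before conversion would likely be more robust, but that depends on parser details not shown in this thread.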

Truncated commented 1 month ago

Here's a log from my attempts to slurp this README. I did it three times, one after the other using my clipboard, and only the last attempt worked. (shrug) https://github.com/Garsondee/pf2e-roll-manager/blob/master/README.md


Pictures of Source

First file ![image](https://github.com/inhumantsar/slurp/assets/14208325/75e30b3f-1783-409d-bdd0-792a7d21dc57)

Second file ![image](https://github.com/inhumantsar/slurp/assets/14208325/9eb6da68-5888-421f-a225-38dba67ce017)

Third file - success! ![image](https://github.com/inhumantsar/slurp/assets/14208325/55611cba-df4b-4eac-a5e0-e6fc92a54edd)

Preview versions

Nope! (the first two look like this) ![image](https://github.com/inhumantsar/slurp/assets/14208325/aa74cbe3-8325-4c04-8f0d-d5622f692aa7)

Yep! ![image](https://github.com/inhumantsar/slurp/assets/14208325/3c7adb76-b952-496e-b7fe-039aa9d9eb90)

Log

``` ##### 1717879654582 | DEBUG | attempting to parse prop metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "enabled": true, "custom": false, "_key": "twitter", "_idx": 7, "_format": "s|https://twitter.com/{s}", "id": "twitter", "metaFields": [ "twitter:creator", "twitter:site" ], "defaultIdx": 7, "defaultKey": "twitter", "description": "Twitter/X link for the author or site.", "defaultFormat": "s|https://twitter.com/{s}" } ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "twitter:creator" "meta[name=\"twitter:creator\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "twitter:site" "meta[name=\"twitter:site\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" { "0": {} } ``` ##### 1717879654582 | DEBUG | adding metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "prop": { "enabled": true, "custom": false, "_key": "twitter", "_idx": 7, "_format": "s|https://twitter.com/{s}", "id": "twitter", "metaFields": [ "twitter:creator", "twitter:site" ], "defaultIdx": 7, "defaultKey": "twitter", "description": "Twitter/X link for the author or site.", "defaultFormat": "s|https://twitter.com/{s}" }, "elements": { "0": {} }, "metaFields": {}, "querySelector": "meta[name=\"twitter:site\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" } ``` ##### 1717879654582 | DEBUG | attempting to parse prop metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "enabled": true, "custom": false, "_key": "tags", "_idx": 8, "_format": "S|{prefix}/{tag}", "id": "tags", "metaFields": [ "tags", "keywords", "article:tag", "parsely-tags", "news_keywords" ], "defaultIdx": 8, "defaultKey": "tags", "description": "Tags and keywords present in the page's metadata.", "defaultFormat": 
"S|{prefix}/{tag}" } ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "tags" "meta[name=\"tags\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "keywords" "meta[name=\"keywords\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "article:tag" "meta[name=\"article:tag\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "parsely-tags" "meta[name=\"parsely-tags\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "news_keywords" "meta[name=\"news_keywords\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | attempting to parse prop metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "enabled": false, "custom": false, "_key": "onion", "_idx": 9, "id": "onion", "metaFields": [ "onion-location" ], "defaultIdx": 9, "defaultKey": "onion", "description": "Link to a mirror of the content on Tor." 
} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "onion-location" "meta[name=\"onion-location\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | attempting to parse prop metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "enabled": true, "custom": false, "_key": "slurped", "_idx": 10, "_format": "d|YYYY-MM-DDTHH:mm", "id": "slurped", "defaultIdx": 10, "defaultKey": "slurped", "description": "Date/time that the page was accessed by Slurp.", "defaultFormat": "d|YYYY-MM-DDTHH:mm" } ``` ##### 1717879654582 | DEBUG | attempting to parse prop metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "enabled": true, "custom": false, "_key": "title", "_idx": 11, "id": "title", "metaFields": [ "og:title", "twitter:title" ], "defaultIdx": 11, "defaultKey": "title", "description": "Page title as seen in the browser, falling back to the title presented in metadata." } ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "og:title" "meta[name=\"og:title\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" {} ``` ##### 1717879654582 | DEBUG | found prop elements - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` "twitter:title" "meta[name=\"twitter:title\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" { "0": {} } ``` ##### 1717879654582 | DEBUG | adding metadata - Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)` ``` { "prop": { "enabled": true, "custom": false, "_key": "title", "_idx": 11, "id": "title", "metaFields": [ "og:title", "twitter:title" ], "defaultIdx": 11, "defaultKey": "title", "description": "Page title as seen in the browser, falling back to the title presented in metadata." 
}, "elements": { "0": {} }, "metaFields": {}, "querySelector": "meta[name=\"twitter:title\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]" } ``` ##### 1717879654586 | DEBUG | formatting string - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "tmpl": "https://twitter.com/{s}", "value": { "s": "@github" } } ``` ##### 1717879654586 | DEBUG | match found - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "match": "{s}", "name": "s", "value": "@github", "iHasName": true } ``` ##### 1717879654586 | DEBUG | stringifying yaml... - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "Source": "https://github.com/Garsondee/pf2e-roll-manager/blob/master/README.md", "byline": "Garsondee", "site": "GitHub", "excerpt": "A module for making group rolls easier in PF2E. Contribute to Garsondee/pf2e-roll-manager development by creating an account on GitHub.", "twitter": "https://twitter.com/@github", "slurped": "2024-06-08T20:47:34.586Z", "title": "pf2e-roll-manager/README.md at master · Garsondee/pf2e-roll-manager" } { "Source": 0, "byline": 1, "site": 2, "date": 3, "updated": 4, "type": 5, "excerpt": 6, "twitter": 7, "tags": 8, "slurped": 10, "title": 11 } ``` ##### 1717879654586 | DEBUG | yaml sort - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "ak": "byline", "aidx": 1, "bk": "Source", "bidx": 0 } ``` ##### 1717879654586 | DEBUG | yaml sort - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "ak": "site", "aidx": 2, "bk": "byline", "bidx": 1 } ``` ##### 1717879654586 | DEBUG | yaml sort - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "ak": "excerpt", "aidx": 6, "bk": "site", "bidx": 2 } ``` ##### 1717879654586 | DEBUG | yaml sort - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "ak": "twitter", "aidx": 7, "bk": "excerpt", "bidx": 6 } ``` ##### 1717879654586 | DEBUG | yaml sort - Caller: `SlurpPlugin.slurp (plugin:slurp:12514:18)` ``` { "ak": "slurped", "aidx": 10, "bk": "twitter", 
"bidx": 7 } ``` ```