inhumantsar / slurp

Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
https://inhumantsar.github.io/slurp/
MIT License

Slurping triggered Templater... #24

Open Truncated opened 2 months ago

Truncated commented 2 months ago

This was fun. :)

I tried slurping this: https://forum.obsidian.md/t/create-file-after-choosing-a-folder/34311

Slurp generally can't target the forums effectively, but this thread was a single post, so I wanted to see if I could save the snippet.

I got a page, and was immediately prompted with the folder selection list from the Templater example. My Templater configuration does not have a configured folder that overlaps the Slurp directory, fwiw.

Page in slurped target folder:

TITLE: Create file after choosing a folder
---
link: https://forum.obsidian.md/t/create-file-after-choosing-a-folder/34311
site: Obsidian Forum
date: 2022-03-19T16:00
excerpt: This is a templater script for creating a file after choosing a
  folder  (make sure to put in your template folder and invoke it through open
  insert templater modal command)
slurped: 2024-05-09T16:43:53.634Z
title: Create file after choosing a folder
---

![Obsidian Forum](data:image/svg;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7)

Loading

Log: this test was controlled and prepped, but since the configuration has no option to clear the log or insert a break, I copied the log before and after the slurp and used Beyond Compare to isolate the new entries.

1715273033628 | DEBUG | attempting to parse prop metadata
{
  "enabled": true,
  "custom": false,
  "_key": "link",
  "_idx": 0,
  "id": "link",
  "metaFields": [
    "url",
    "og:url",
    "parsely-link",
    "twitter:url"
  ],
  "defaultIdx": 0,
  "defaultKey": "link",
  "description": "Page URL provided or a permalink discovered in metadata."
}
1715273033628 | DEBUG | found prop elements
"url"
"meta[name=\"url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"og:url"
"meta[name=\"og:url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"parsely-link"
"meta[name=\"parsely-link\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"twitter:url"
"meta[name=\"twitter:url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{
  "0": {}
}
1715273033628 | DEBUG | adding metadata
{
  "prop": {
    "enabled": true,
    "custom": false,
    "_key": "link",
    "_idx": 0,
    "id": "link",
    "metaFields": [
      "url",
      "og:url",
      "parsely-link",
      "twitter:url"
    ],
    "defaultIdx": 0,
    "defaultKey": "link",
    "description": "Page URL provided or a permalink discovered in metadata."
  },
  "elements": {
    "0": {}
  },
  "metaFields": {},
  "querySelector": "meta[name=\"twitter:url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
}
1715273033628 | DEBUG | attempting to parse prop metadata
{
  "enabled": true,
  "custom": false,
  "_key": "byline",
  "_idx": 1,
  "id": "byline",
  "metaFields": [
    "author",
    "article:author",
    "parsely-author",
    "cXenseParse:author"
  ],
  "defaultIdx": 1,
  "defaultKey": "byline",
  "description": "Name of the primary author or the first author detected."
}
1715273033628 | DEBUG | found prop elements
"author"
"meta[name=\"author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"article:author"
"meta[name=\"article:author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"parsely-author"
"meta[name=\"parsely-author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"cXenseParse:author"
"meta[name=\"cXenseParse:author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | attempting to parse prop metadata
{
  "enabled": true,
  "custom": false,
  "_key": "site",
  "_idx": 2,
  "id": "siteName",
  "metaFields": [
    "og:site_name",
    "page.content.source",
    "application-name",
    "apple-mobile-web-app-title",
    "twitter:site"
  ],
  "defaultIdx": 2,
  "defaultKey": "site",
  "description": "Website or publication name."
}
1715273033628 | DEBUG | found prop elements
"og:site_name"
"meta[name=\"og:site_name\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"page.content.source"
"meta[name=\"page.content.source\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"application-name"
"meta[name=\"application-name\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"apple-mobile-web-app-title"
"meta[name=\"apple-mobile-web-app-title\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"twitter:site"
"meta[name=\"twitter:site\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | attempting to parse prop metadata
{
  "enabled": true,
  "custom": false,
  "_key": "date",
  "_idx": 3,
  "_format": "d|YYYY-MM-DDTHH:mm",
  "id": "publishedTime",
  "metaFields": [
    "article:published_time",
    "parsely-pub-date",
    "datePublished",
    "article.published"
  ],
  "defaultIdx": 3,
  "defaultKey": "date",
  "description": "Date/time that the page was initially published.",
  "defaultFormat": "d|YYYY-MM-DDTHH:mm"
}
1715273033628 | DEBUG | found prop elements
"article:published_time"
"meta[name=\"article:published_time\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"parsely-pub-date"
"meta[name=\"parsely-pub-date\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"datePublished"
"meta[name=\"datePublished\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | found prop elements
"article.published"
"meta[name=\"article.published\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1715273033628 | DEBUG | attempting to parse prop metadata
{
  "enabled": true,
  "custom": false,
  "_key": "updated",
  "_idx": 4,
  "_format": "d|YYYY-MM-DDTHH:mm",
  "id": "modifiedTime",
  "metaFields": [
    "article:modified_time",
    "dateModified",
    "dateLastPubbed"
  ],
  "defaultIdx": 4,
  "defaultKey": "updated",
  "description": "Date/time that the page was last modified, if available.",
  "defaultFormat": "d|YYYY-MM-DDTHH:mm"
}

inhumantsar commented 2 months ago

Pages are sanitized on their way in, but the sanitizer isn't looking for Obsidian scripts like that.

I will look into stripping those <% bits out, but tbh I'm surprised that Obsidian's own sanitization doesn't handle this.

Truncated commented 2 months ago

I may be misunderstanding what you mean here, but I would greatly prefer not stripping anything out: the code was specifically what I wanted to record. Are you talking about some kind of swap process, where you replace the <% with other characters before the page is written and then find/replace them back once they are on the page, so that the end result is still the original content but it side-steps the Templater eval process?

This specific parsing is from Templater, not Obsidian, so I wouldn't expect sanitization built-in.

It's probably something that would require integration effort from both Slurp and Templater. I didn't see an API or integration section in their documentation, unfortunately, but I haven't checked their open or closed issues to see if this has come up.

inhumantsar commented 2 months ago

Yeah, I would sanitize those characters, ideally just by prefixing them with \, before the page is written, to prevent them from being evaluated.
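
For illustration, the escaping step described above could look roughly like the sketch below. This is not Slurp's actual code: `escapeTemplaterTags` is a hypothetical helper name, and the sketch only shows the string transform. Whether Templater really skips a backslash-prefixed `<%` is an assumption that would need to be verified against Templater itself.

```typescript
// Hypothetical helper (not part of Slurp): prefix every Templater-style
// opening tag "<%" with a backslash before the note is written to disk.
// The negative lookbehind leaves already-escaped tags alone, so running
// the function twice does not add a second backslash.
function escapeTemplaterTags(markdown: string): string {
  return markdown.replace(/(?<!\\)<%/g, "\\<%");
}

// Example: the snippet text is kept in full, only the opening tag changes.
const slurped = "Title: <% tp.file.title %>";
const safe = escapeTemplaterTags(slurped);
console.log(safe);
```

Because nothing is removed, the recorded snippet stays readable in the note, which matches the request above to keep the code rather than strip it.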