inhumantsar / slurp

Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
https://inhumantsar.github.io/slurp/
MIT License
127 stars, 2 forks

Unsanitized filename #20

Closed (chrisgrieser closed 1 month ago)

chrisgrieser commented 2 months ago

It appears the file name is not properly sanitized when slurping. While characters like / do not seem to appear for me, I noticed that # and various special characters like the middle dot are kept in the file name when, for example, slurping a GitHub issue.

A # in a filename breaks internal links in Obsidian, because the # is interpreted as an attempt to link a heading, and thus not considered part of the filename.
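To illustrate the failure mode, here is a minimal sketch of how a wikilink target might be split at the first #. This is a simplified model for illustration only, not Obsidian's actual parser; `splitWikilink` and `LinkTarget` are hypothetical names.

```typescript
// Simplified model of wikilink resolution; NOT Obsidian's real implementation.
interface LinkTarget {
  file: string;           // the part treated as the note's filename
  heading: string | null; // the part treated as a heading reference, if any
}

// Hypothetical helper: splits "[[File#Heading|alias]]" into file and heading.
function splitWikilink(link: string): LinkTarget {
  // Strip the surrounding [[ ]].
  const stripped = link.replace(/^\[\[/, "").replace(/\]\]$/, "");

  // Drop an optional |alias suffix.
  const pipeAt = stripped.indexOf("|");
  const inner = pipeAt === -1 ? stripped : stripped.slice(0, pipeAt);

  // Everything after the first # is read as a heading, not as part of the filename.
  const hashAt = inner.indexOf("#");
  if (hashAt === -1) {
    return { file: inner, heading: null };
  }
  return { file: inner.slice(0, hashAt), heading: inner.slice(hashAt + 1) };
}

// A note slurped from this issue would be named "Unsanitized filename #20".
const target = splitWikilink("[[Unsanitized filename #20]]");
// target.file is "Unsanitized filename " and target.heading is "20",
// so the link no longer points at the note that was actually saved.
```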

inhumantsar commented 2 months ago

the cleanTitle function focuses on characters that are strictly disallowed by Obsidian and the filesystem.

I'm hesitant to add # to that since it's not an overly common character to find in a title, when it is present it seems likely to be fairly critical to the title's meaning (unlike " and : which are easy to replace without altering their meaning), and # is already overloaded in Obsidian between tags and headings.

that said, i'm planning to add some slurp-time options as the next big feature, e.g. adding/removing tags and overriding the save location. i could add the title there as well and have it display a warning if # is in the title, something like "Looks like this title has a hash mark in it, if you don't change that then you won't be able to create internal links to this page."

seems like that might be a good middle ground vs deciding for people what it should be replaced with.
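The warning idea above could look roughly like the sketch below. `hashWarning` is a hypothetical helper, not part of Slurp; how the message would actually be surfaced in the plugin's UI is left out.

```typescript
// Hypothetical helper, not Slurp's actual code.
// Returns a warning message when the title contains "#", otherwise null.
function hashWarning(title: string): string | null {
  if (!/#/.test(title)) {
    return null;
  }
  return (
    "Looks like this title has a hash mark in it, if you don't change that " +
    "then you won't be able to create internal links to this page."
  );
}

// The title is left untouched; the user decides whether to edit it.
const warning = hashWarning("Unsanitized filename #20");
const noWarning = hashWarning("A perfectly ordinary title");
```

This keeps the decision with the user, at the cost of an extra step at slurp time.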

chrisgrieser commented 2 months ago

> I'm hesitant to add # to that since it's not an overly common character to find in a title, when it is present it seems likely to be fairly critical to the title's meaning (unlike " and : which are easy to replace without altering their meaning)

Well, that's exactly what the title property I previously suggested is for – to preserve title information not suited for filenames.

Strictly speaking, the slash / is usually even more meaningful than the # when it comes to a title, but obviously it's removed from the filename as well.

But yeah, simply a setting for the user to decide for themselves would be fine as well, though I think characters like # should be removed by default, since they can leave users confused about why their links are suddenly broken (it took me a while to figure this out as well, for example).
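A minimal sketch of that combination: strip # from the filename by default, let a setting turn that off, and keep the untouched title in a separate property. All names here (`toNote`, `SanitizeOptions`, `stripHash`) are hypothetical, and the character set is an assumption rather than the list Slurp's cleanTitle actually uses.

```typescript
interface SanitizeOptions {
  stripHash: boolean; // hypothetical user setting; defaults to true
}

interface SlurpedNote {
  filename: string; // safe for the filesystem and for internal links
  title: string;    // original title, preserved for a title property
}

// Hypothetical helper, not Slurp's actual cleanTitle.
function toNote(
  title: string,
  options: SanitizeOptions = { stripHash: true }
): SlurpedNote {
  // Assumed set of characters commonly rejected in filenames; replaced with "-".
  let filename = title.replace(/[\\\/:*?"<>|]/g, "-");

  // Remove # unless the user has opted out, since it breaks internal links.
  if (options.stripHash) {
    filename = filename.replace(/#/g, "");
  }

  // Collapse any whitespace runs left behind and trim the ends.
  filename = filename.replace(/\s+/g, " ").trim();

  return { filename, title };
}

const byDefault = toNote("Unsanitized filename #20");
const optedOut = toNote("a/b #1", { stripHash: false });
```

With the defaults, the filename loses the # while `title` still carries the full original text, so nothing meaningful is lost.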

inhumantsar commented 2 months ago

> Well, that's exactly what the title property I previously suggested is for

that's a fair point. either way, i'll look into this soonish

inhumantsar commented 1 month ago

should be fixed in 0.1.12!