xwmx / nb

CLI and local web plain text note‑taking, bookmarking, and archiving with linking, tagging, filtering, search, Git versioning & syncing, Pandoc conversion, + more, in a single portable script.
https://xwmx.github.io/nb
GNU Affero General Public License v3.0
6.72k stars, 190 forks

Suggestion: call out the fact that adding bookmarks without pandoc adds entire source code of web pages #225

Open ejheil opened 1 year ago

ejheil commented 1 year ago

One thing that surprised me while looking into this: once I started bookmarking pages, it became hard to get useful results out of search, because a huge amount of HTML was saved with each bookmark. All of that became searchable and showed up in search results.

Which is fine, that's intended behavior, but for somebody who just wants to bookmark some URLs it's a bit surprising. It's also, honestly, not that useful: nobody wants to read HTML source code or have it show up in search results. Modern HTML is not readable, and the source of a web page includes vast amounts of cruft.

Now, if you've got pandoc and readability-cli installed, this suddenly becomes very cool, because actual readable text gets saved. THAT's neat. But I had bookmarked a bunch of pages before I realized that, and I had to go back and re-add them after installing pandoc/readability-cli so they weren't tag-ridden, unreadable garbage.

So some kind of warning, either in the docs or in the tool's own output, that bookmarking without pandoc gets you, well, something you probably really don't want, would be helpful.

Another neat option would be to default to NOT saving the text of web pages when pandoc isn't installed, and to save it by default only when it is.

xwmx commented 1 year ago

@ejheil Thanks for the informative thoughts and insights. So far there is just a --no-download option that prevents the download altogether, and therefore doesn't extract the title or other info. I'll aim to incorporate your recommendations and improve this.
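
In the meantime, the "only save page text when pandoc is installed" idea can be approximated outside of nb with a small shell wrapper. This is a rough sketch, not part of nb: the function names `bookmark_flags` and `safe_bookmark` are made up for illustration, and it assumes the `--no-download` option mentioned above is accepted by `nb bookmark` in your version (check `nb help bookmark` to confirm).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, NOT part of nb. It assumes `nb bookmark <url>`
# accepts the --no-download option mentioned in this thread.

# Print the extra flag to use when bookmarking:
# nothing if pandoc is on PATH, --no-download (plus a warning) otherwise.
bookmark_flags() {
  if command -v pandoc >/dev/null 2>&1; then
    return 0
  fi
  printf '%s\n' "warning: pandoc not found; bookmarking without page content" >&2
  printf '%s\n' "--no-download"
}

# Bookmark a URL, skipping the page download when pandoc is missing.
safe_bookmark() {
  url="$1"
  flags="$(bookmark_flags)"
  if [ -n "$flags" ]; then
    nb bookmark "$url" "$flags"
  else
    nb bookmark "$url"
  fi
}
```

After sourcing the file, `safe_bookmark https://example.com` would stand in for `nb bookmark https://example.com`. The trade-off is the one noted above: with `--no-download` nothing is fetched, so the title and other info are not extracted either.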