hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0
6.48k stars, 235 forks

Is there a way to extract the code snippets as well? #610

Open ulises-castro opened 3 weeks ago

ulises-castro commented 3 weeks ago

Describe the feature you'd like

It would be great if Hoarder could save articles along with their code snippets; at the moment it seems they are not included.

Maybe a flag to include or exclude code snippets would be nice.

image

Describe the benefits this would bring to existing Hoarder users

You could take notes and review code implementations later, e.g. when you are reviewing an API and want to get back to the position where you left off.

Can the goal of this request already be achieved via other means?

I'm not sure yet about this.

Have you searched for an existing open/closed issue?

Additional context

No response

kamtschatka commented 3 weeks ago

please provide a sample where you got this

ulises-castro commented 3 weeks ago

> please provide a sample where you got this

What do you mean?

I took that screenshot from the original article. I think we could use a code highlighter to display the code and extract it with the crawler.

kamtschatka commented 3 weeks ago

A url

ulises-castro commented 3 weeks ago

> A url

https://realpython.com/python-microservices-grpc/#asyncio-and-grpc

kamtschatka commented 2 weeks ago

I had a look at this. We are using DOMPurify, which already strips those code blocks. It is possible to change the code like this:

  const purifiedHTML = purify.sanitize(htmlContent, {ADD_TAGS: ["pre", "code", "span"]});

With that change the code block is actually retained, but we then pass the result to mozilla/readability, which ignores the code block. I don't see any way to configure this, so fixing it would definitely be a bigger rework.
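
To make the allowlist change above more concrete, here is a minimal toy sketch of how adding tags to an allowlist changes what survives sanitization. It is not DOMPurify's implementation and not Hoarder's code: `sanitizeTags`, `BASE_TAGS`, and the `addTags` option are hypothetical names that only mirror the spirit of `ADD_TAGS`, and the regex-based tag matching is an assumption suitable for illustration only.

```typescript
// Toy allowlist sanitizer. NOT DOMPurify's implementation: it only illustrates
// what adding tags to an allowlist changes. All names here are hypothetical.

// Assumed default allowlist for this toy; it deliberately omits pre/code/span.
const BASE_TAGS: readonly string[] = ["p", "a", "h1", "h2", "ul", "li"];

interface SanitizeOptions {
  // Extra tags to keep, in the spirit of the ADD_TAGS option shown above.
  addTags?: string[];
}

function sanitizeTags(html: string, options: SanitizeOptions = {}): string {
  const allowed: string[] = BASE_TAGS.concat(options.addTags ?? []);
  // Match opening and closing tags and capture the tag name.
  // Tags not on the allowlist are dropped; their text content is kept.
  return html.replace(
    /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g,
    (tag: string, name: string) =>
      allowed.indexOf(name.toLowerCase()) !== -1 ? tag : "",
  );
}

const article = '<p>Run:</p><pre><code class="python">print("hi")</code></pre>';

// Without the extra tags, the <pre>/<code> wrappers are removed.
const strict = sanitizeTags(article);

// With the extra tags, the code block markup is retained unchanged.
const relaxed = sanitizeTags(article, { addTags: ["pre", "code", "span"] });

console.log(strict);
console.log(relaxed);
```

Even when the sanitizer keeps the markup, as in `relaxed`, a later processing step can still drop it, which is the mozilla/readability problem described above.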