laurent22 / joplin

Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
https://joplinapp.org

Text inside <pre> tags is processed for markdown special characters #5272

Open phirestalker opened 3 years ago

phirestalker commented 3 years ago

I was going through the arduous process of manually transferring my notes from OneNote. Some of them I had clipped using their poor excuse for a web clipper, so I was just going to the link and using the Joplin clipper instead of copy-pasting. Everything was going wonderfully until I tried to clip a web page (linked below) that had multiple code snippets. The code snippets were not detected as code, and as a result the processor escaped all of the underscore characters in the function names, as well as other Markdown special characters.

I inspected the code for that area of the page and found that they put a span tag inside of the pre tag. I am not sure if this is what caused the problem or not.

Oh, and it missed some bolded items as well. I'm thinking it is definitely the site being poorly coded.

https://medium.com/deepquestai/train-object-detection-ai-with-6-lines-of-code-6d087063f6ff

Environment

Joplin 2.1.9 (prod, darwin)

Sync Version: 2
Profile Version: 39
Keychain Supported: Yes

Revision: 882d663

Platform: macOS
OS specifics: Big Sur 11.5.1

Steps to reproduce

  1. Use the web clipper to clip the simplified page of the URL above
  2. See the mangled code blocks in the preview pane

Describe what you expected to happen

I was hoping it would preserve the code blocks as preformatted and not require any manual editing.

laurent22 commented 3 years ago

The code snippets in this page can't really be detected as such. How are they currently imported? I guess the block should at least be indented?
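For illustration, the indentation fallback suggested above could look roughly like the sketch below. It is written in Python purely as an example and is not Joplin's clipper code; the function name `indent_as_code_block` is hypothetical. It relies on the Markdown rule that lines indented by four spaces, following a blank line, are rendered as a code block, so underscores inside them need no escaping.

```python
def indent_as_code_block(text: str, indent: str = "    ") -> str:
    """Indent every line so Markdown treats the text as an indented code block.

    Hypothetical helper for illustration only; not part of Joplin.
    """
    # Expand tabs first so the original tab formatting survives as spaces.
    lines = text.expandtabs(4).splitlines()
    # Blank lines are left empty to avoid trailing whitespace.
    return "\n".join(indent + line if line.strip() else "" for line in lines)


if __name__ == "__main__":
    # Made-up sample snippet containing underscores and a tab.
    print(indent_as_code_block("def train_model():\n\treturn 1"))
```

The caller would still need to put a blank line before the indented block, otherwise Markdown treats it as a continuation of the preceding paragraph.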

phirestalker commented 3 years ago

When I switch to the markdown pane and manually mark the >> area as code, the tab formatting springs right back.

[Screenshots: 2021-08-06 at 4.18.53 AM, 4.21.35 AM, 4.22.16 AM]

They are all in pre tags. Is it because of them also being inside span tags? Is there any way to ignore all other tags inside of pre tags? Is that not desirable for some reason?
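As a rough sketch of what "ignore all other tags inside of pre tags" could mean in practice, the Python example below collects only the text content of each `<pre>` element and wraps it in a fenced block, so nested `<span>` tags never reach the Markdown escaping step. This is an assumption-laden illustration, not how Joplin's HTML-to-Markdown converter works; the names `PreTextExtractor` and `pre_blocks_to_markdown` are hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


class PreTextExtractor(HTMLParser):
    """Collect the raw text of each top-level <pre>, ignoring nested tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # one string per top-level <pre> element
        self._depth = 0    # how many <pre> tags are currently open
        self._parts = []   # text fragments of the <pre> being read

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            if self._depth == 0:
                self._parts = []
            self._depth += 1
        elif tag == "br" and self._depth > 0:
            # Some sites use <br> for line breaks inside <pre>.
            self._parts.append("\n")
        # Any other tag (span, strong, ...) is simply skipped.

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))

    def handle_data(self, data):
        if self._depth > 0:
            self._parts.append(data)


def pre_blocks_to_markdown(html):
    """Return one fenced Markdown code block per <pre> found in the HTML."""
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return [FENCE + "\n" + block.strip("\n") + "\n" + FENCE for block in parser.blocks]


if __name__ == "__main__":
    sample = '<pre><span id="x">trainer.set_data_directory("data")</span></pre>'
    for block in pre_blocks_to_markdown(sample):
        print(block)
```

Because the text inside the fence is emitted verbatim, underscores and other Markdown special characters need no escaping. The trade-off is that any intentional inline formatting inside the `<pre>` (for example bold text) is lost.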