laurent22 / joplin

Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
https://joplinapp.org
Other
45.46k stars 4.94k forks source link

Desktop: when non latin languages applied, command palette displays incorrect names #9870

Closed graphit0 closed 7 months ago

graphit0 commented 8 months ago

Operating system

Linux

Joplin version

2.14.12

Desktop version info

Joplin 2.14.12 (prod, linux)

Client ID: 358e8a1fa2c24626bcba79c5ac4955e3 Sync Version: 3 Profile Version: 45 Keychain Supported: No

Revision: 2881993

Bidirectional Links: 0.1.2 Calendar: 1.0.0-rc2 CodeMirror 6 snippets: 0.0.4 Combine notes: 1.2.2 Conflict Resolution: 1.2.3 Convert Text To New Note: 1.5.1 Enhancement: 1.2.1 Kanban: 1.0.7 Note list (Preview): 0.4.0 Outline: 1.5.13 Quick HTML tags: 0.2.0 Quick Links: 1.3.0 Send snippet to different note: 1.0.0 Simple Backup: 1.3.6 Templates: 2.4.0

Current behaviour

Description

when Joplin is set to non latin languages, command palette displays broken names for commands

screenshot

image

steps

  1. Change language to non-latin layout for example Bulgarian
  2. Navigate to command palette via Ctrl+shift+:
  3. Scroll to the bottom
  4. Observe broken names of commands

cast

https://github.com/laurent22/joplin/assets/44114323/2fbacc70-055d-43e4-b4fe-7e57489aa664

Notes

Expected behaviour

All commands to be displayed in user set language

pedr commented 8 months ago

I was looking at the problem and it seems to happen because packages/app-desktop/plugins/GotoAnything.tsx component calls the surroundKeywords with { escapeHtml: true } which is transforming in entities any character that is possible to transform.

https://github.com/laurent22/joplin/blob/627b8307399efe03f6b0dfe1a30e6544fe59494a/packages/lib/string-utils.ts#L235-L256

I don't understand the reason why is implemented like that, I would think that we would only want to escape reserved characters: https://developer.mozilla.org/en-US/docs/Glossary/Entity#reserved_characters

@laurent22 @personalizedrefrigerator do you think there is a reason why this is happening? Do you think it would be ok to make the change to use the escapeHtml function?

laurent22 commented 8 months ago

Isn't it because we display it in HTML so it needs to be escaped?

pedr commented 8 months ago

Isn't it because we display it in HTML so it needs to be escaped?

I understand that chars like < > need to be escaped for security reasons, but characters like é ó ã ç can be displayed as they are, as far as I know. I'm going to research a little bit about this.

Changing to escapeHtml would fix the reported issue but the problem could still happen if any of the escape characters appeared in the search result (unlikely, but something to keep in might).

laurent22 commented 8 months ago

but characters like é ó ã ç can be displayed as they are, as far as I know.

Yes but even then it doesn't really matter. &agrave; in HTML content would be displayed as à. The fact that we see html entities in there means something is escaped twice or some other problem.

pedr commented 8 months ago

but characters like é ó ã ç can be displayed as they are, as far as I know.

Yes but even then it doesn't really matter. &agrave; in HTML content would be displayed as à. The fact that we see html entities in there means something is escaped twice or some other problem.

Sorry, maybe I should have explained the issue better.

The reason why the user see the characters like this is because how the surroundKeywords works.

We are escaping the HTML string (each command phrase that is displayed in the command pallet) before executing the regex to include the prefix and suffix. The problem is that when we escape a character like é to &agrave; it breaks the function, see images below.

Example The next image is "fixed" because of another issue that I fixed in the code, it works for content that doesn't have any keyword hightlights. ![Screenshot from 2024-02-09 15-27-24](https://github.com/laurent22/joplin/assets/5051088/d12be3fd-1a39-41d8-bec6-b4937d3dd621) After search ![image](https://github.com/laurent22/joplin/assets/5051088/25c69d18-10d4-42c2-9786-428308ba2048) The HTML that renders the first item ![image](https://github.com/laurent22/joplin/assets/5051088/551f37e5-7e0d-411b-a80d-d7d35e177ba1)

The first thought would be to escape the character after the regex, but then we would escape also the prefix and suffix values that should not be escaped.

I think the best solution would be to escapeHTML when doing the regex reclamanet and around the prefix/suffix values, but I'm not sure if there is a straight way of doing this. I'm looking into it to see if I found a solution with this approach.

Writing it down again I realized that just using the escapeHtml function would also break the tags search:

Image ![image](https://github.com/laurent22/joplin/assets/5051088/7bcd949a-e909-46f4-8b93-8942c6561af1)
pedr commented 8 months ago

Ok, I managed to fix this problem for tags and commands, but I found out that note titles are stripped of special characters in the command palette.

image

I think this is an artifact from the SearchEngine code, should I look into this or create another issue?

pedr commented 8 months ago

I created another issue for this https://github.com/laurent22/joplin/issues/9919