thesamim / TickTickSync

GNU General Public License v3.0

Tags with non-ascii characters are split #171

Closed bluebirch closed 1 month ago

bluebirch commented 1 month ago

Describe the bug

Tags with non-ASCII characters are split, and the non-ASCII part of the tag is treated as part of the description. In Obsidian, the task is correct, for example:

- [ ] Something to buy #inköp

But on TickTick, this becomes:

- [ ] Something to buy  öp #ink

To Reproduce

Steps to reproduce the behavior:

  1. Create a tag with a non-ascii character on TickTick.
  2. Sync.
  3. Sync.

Expected behavior

Tags with non-ASCII characters stay intact at both ends (Obsidian and TickTick).

thesamim commented 1 month ago

@bluebirch : Please provide a sample set of tags with emojis. I'd never considered having emojis in tags....

bluebirch commented 1 month ago

It's not about emojis but characters in the Swedish alphabet like å, ä and ö. By non-ASCII I mean anything beyond the first 128 characters (codes 0 to 127). Old-school 7-bit ASCII. 🙂

Example tags I use: #inköp, #ärenden and #Vällingby. None of them work.

thesamim commented 1 month ago

Ah! I don't suppose you know what the equivalent regex would be for those characters? If you don't, I'll do some research.

Never mind, I figured it out: (?<=\s)#[\w\d\u00c4-\u00f6\u4e00-\u9fff\u0600-\u06ff\uac00-\ud7af-_/]+ Will implement ASAP.
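A minimal sketch of how the pattern above behaves on the tags from this issue, runnable in Node.js or a browser console. The `g` flag and the helper name `extractTags` are assumptions added for illustration; the plugin's actual code is not shown in this thread.

```javascript
// The tag pattern quoted above. The trailing `g` flag is an assumption,
// added so that matchAll() can return every tag on a line.
const TAG_PATTERN = /(?<=\s)#[\w\d\u00c4-\u00f6\u4e00-\u9fff\u0600-\u06ff\uac00-\ud7af-_/]+/g;

// Hypothetical helper: collect every tag found in one task line.
function extractTags(line) {
  return Array.from(line.matchAll(TAG_PATTERN), (m) => m[0]);
}

console.log(extractTags("- [ ] Something to buy #inköp"));     // → ["#inköp"]
console.log(extractTags("- [ ] Errands #ärenden #Vällingby")); // → ["#ärenden", "#Vällingby"]
```

The lookbehind `(?<=\s)` means a tag is only recognized when it follows whitespace, so a `#` at the very start of a line would not match with this pattern.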

thesamim commented 1 month ago

@bluebirch: I think I fixed it. Could you please test with the attached?

To use this:

  1. Go to your .obsidian\plugins\tickticksync directory
  2. Back that directory up
  3. Unzip the attached into the directory.

Let me know what happens.

TickTickSync-ForTestOnly-1036.zip

Thanks.

bluebirch commented 1 month ago

> Ah! ~I don't suppose you know what the equivalent regex would be for those characters? If you don't, I'll do some research.~
>
> Never mind, I figured it out: (?<=\s)#[\w\d\u00c4-\u00f6\u4e00-\u9fff\u0600-\u06ff\uac00-\ud7af-_/]+ Will implement ASAP.

I don't know JavaScript, but I thought \w would catch any alphanumeric characters, even UTF-8. It does in Perl.
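For comparison, a short sketch of how JavaScript differs from Perl here. In JavaScript, `\w` matches only `[A-Za-z0-9_]`, with or without the `u` flag. Unicode property escapes such as `\p{L}` (any letter) and `\p{N}` (any number) are available when the `u` flag is set. The pattern `UNICODE_TAG` and the helper `findTags` below are hypothetical alternatives for illustration, not what the plugin uses; they would also cover letters such as ü (U+00FC), which lies outside the `\u00c4-\u00f6` range.

```javascript
// \w is ASCII-only in JavaScript, regardless of the u flag.
console.log(/^\w+$/.test("inköp"));  // → false
console.log(/^\w+$/u.test("inköp")); // → false

// Assumption: an illustrative alternative pattern, not the plugin's.
// \p{L} = any Unicode letter, \p{N} = any Unicode number (requires the u flag).
const UNICODE_TAG = /(?<=\s)#[\p{L}\p{N}_\/-]+/gu;

// Hypothetical helper: collect every tag found in one task line.
function findTags(line) {
  return Array.from(line.matchAll(UNICODE_TAG), (m) => m[0]);
}

console.log(findTags("- [ ] Groceries #inköp #müsli #Vällingby"));
// → ["#inköp", "#müsli", "#Vällingby"]
```

The trade-off is that property escapes need a reasonably recent JavaScript engine, whereas explicit ranges work everywhere but have to list each script by hand.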

bluebirch commented 1 month ago

> @bluebirch: I think I fixed it. Could you please test with the attached?

Works.

bluebirch commented 1 month ago

But tags are converted to lowercase. I'd like character case to be preserved.

thesamim commented 1 month ago

> But tags are converted to lowercase. I'd like character case to be preserved.

This was a limitation of the API. It would not recognize mixed case tags properly. I will check if they have updated this. If they have I will adjust the functionality accordingly.

thesamim commented 1 month ago

> > But tags are converted to lowercase. I'd like character case to be preserved.
>
> This was a limitation of the API. It would not recognize mixed case tags properly. I will check if they have updated this. If they have I will adjust the functionality accordingly.

@bluebirch : Verified: API still does not handle mixed case tags. Will do some more testing and release latest tag changes.