kylechui / nvim-surround

Add/change/delete surrounding delimiter pairs with ease. Written with :heart: in Lua.
MIT License

Multibyte (Unicode/CJK) characters are incorrectly surrounded #131

Closed ouuan closed 2 years ago

ouuan commented 2 years ago

To reproduce

。。。。

With the cursor at the start of the line, press ys3l)

Expected behavior

(。。。)。

Actual behavior

screenshot

Only the first byte of the last selected character ends up inside the surrounding pair, so that character is split into two invalid UTF-8 byte sequences.
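To make the symptom concrete, here is a minimal sketch in Python (not the plugin's Lua, and not its actual code). It assumes the closing delimiter is placed one byte past the start of the last selected character, which would match the behavior described above; `surround` and `is_valid_utf8` are hypothetical helper names.

```python
def surround(raw: bytes, end: int) -> bytes:
    """Wrap raw[:end] in parentheses; `end` is a 0-based byte offset."""
    return b"(" + raw[:end] + b")" + raw[end:]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "。。。。"
raw = text.encode("utf-8")                  # 12 bytes: 3 per character
last_start = len(text[:2].encode("utf-8"))  # byte offset where the 3rd character starts

# Assumed buggy behavior: treat the last selected character as one byte wide.
buggy = surround(raw, last_start + 1)

# Assumed fix: step over the full encoded length of the last character.
fixed = surround(raw, last_start + len(text[2].encode("utf-8")))

print(is_valid_utf8(buggy))   # False
print(fixed.decode("utf-8"))  # (。。。)。
```

The buggy result cannot be decoded because the closing parenthesis lands in the middle of a three-byte sequence.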

ouuan commented 2 years ago

Sorry, I pressed Enter in the single-line input in the issue template. I'll edit the issue body now.

I think it would be better if some inputs could use multiline input instead.

UPD: opened #132 for this.

kylechui commented 2 years ago

Thanks for the bug report + PR! I'm going to be a bit busy today, but this fix seems relatively small (famous last words) and I'll try to get it up later today.

kylechui commented 2 years ago

I wrote a small hack for this (not published) that relies on knowledge of how UTF-8 bytes are encoded, but it seems less than ideal as an actual fix. I'll look around at how similar plugins handle the situation, and give another update tomorrow. Thanks for your patience while I look into this.

kylechui commented 2 years ago

@ouuan I've published a small patch on branch v2.0.0, but there are still a few remaining issues regarding visual block mode and how the number of bytes in a UTF-8 character doesn't necessarily reflect how much space it takes up in the buffer (e.g. your period is 3 bytes but visually takes up 2 places).
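As a rough illustration of that mismatch, the Python sketch below compares the UTF-8 byte length of a character with an approximate display width. It assumes that East Asian "Wide" and "Fullwidth" characters take two cells and everything else takes one, which ignores tabs, combining characters, and ambiguous-width settings; `utf8_len` and `display_cells` are hypothetical names.

```python
import unicodedata


def utf8_len(ch: str) -> int:
    """Number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def display_cells(ch: str) -> int:
    """Approximate cell width: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


for ch in ("a", "。"):
    print(repr(ch), utf8_len(ch), display_cells(ch))
```

For the ideographic full stop this reports 3 bytes but 2 cells, so neither number can stand in for the other when computing positions in visual block mode.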

smjonas commented 2 years ago

@kylechui Just a small note: take a look at :h strdisplaywidth (never used this myself but it seems useful for the implementation)

kylechui commented 2 years ago

Thanks for the heads up; I'll see if I can utilize this. I think the reason vim-surround is able to handle it so easily is because it just deletes and pastes text, instead of using byte indices to see where to add the delimiter pair.

kylechui commented 2 years ago

@smjonas Do you know if there are any functions that return the byte index into a string if you give the char index? For example, if you give 。。。 with char index 3 then it would return byte index 7 (1-based; each of the periods is 3 bytes long, so the third one starts at byte 7). If not, I can just write a helper function to do it; was just curious since you seem to know a lot about these "esoteric" functions haha
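For reference, the kind of helper being described could look like the sketch below. It is written in Python rather than Lua, uses 1-based indices to match the example in the question, and `byte_index` is a hypothetical name, not a function from Vim or the plugin.

```python
def byte_index(s: str, char_index: int) -> int:
    """Return the 1-based byte index at which the 1-based
    `char_index`-th character of `s` starts in its UTF-8 encoding."""
    if not 1 <= char_index <= len(s):
        raise IndexError("char_index out of range")
    # Bytes used by all preceding characters, plus one for 1-based indexing.
    return len(s[: char_index - 1].encode("utf-8")) + 1


print(byte_index("。。。", 3))  # 7
print(byte_index("a。b", 3))   # 5
```

Encoding the prefix on every call is linear in the prefix length, which should be fine for single buffer lines.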

smjonas commented 2 years ago

No haha I mostly know about API functions, this is just a specific one that I somehow remembered :smile: I found byteidx, that might be it (:Telescope helpdocs ftw :D).

kylechui commented 2 years ago

Looks like that's exactly it, thanks for the help (again)!

Edit: Also found a vim.str_byteindex, which also looks promising :eyes:

kylechui commented 2 years ago

@ouuan Just pushed a new commit that should handle everything "as intended". Thanks again @smjonas for showing me all those functions. I also added a few more test cases that seem to capture all the edge cases I could think of, although I'm sure some bilingual users will find a few more exceptions. I'm going to try to clean up the code now.

kylechui commented 2 years ago

From what I can tell this seems to be done; if there are any issues then please re-open or create a new issue detailing the exact issues that you're having. Thanks!