preservim / tagbar

Vim plugin that displays tags in a window, ordered by scope
https://preservim.github.io/tagbar
Other
6.12k stars 486 forks source link

fixed: CJK characters decoding error problem #745

Closed skywind3000 closed 3 years ago

skywind3000 commented 3 years ago

PR 733 changed popen output format from binary to text but forgot to initialize errors to ignore, which may crash when there are unicode decoding errors:

图片

and:

图片

When filenames contain strange CJK characters, python's decode may fail. A errors = "ignore" can simply fix this.

skywind3000 commented 3 years ago

@alerque , how to request a review ? Could you help to review this change ?

alerque commented 3 years ago

Thanks for the contribution. This is slightly outside my expertise in Python, but the fix looks reasonable and I'll take your word for it being the right fix. Just one possible nit pick on the coding...

Shane-XB-Qian commented 3 years ago

pls check let g:tagbar_systemenc = 'gbk' or sth, or perhaps better try to fix the why/failure of encoding vs (not to) 'ignore' ...................... i think!

skywind3000 commented 3 years ago

let g:tagbar_systemenc = 'gbk' won't help, the binary output is decoded from binary to text in python after PR733.

python's subprocess.Popen module can detect default system encoding and use gbk for decoding.

See the error message above:

gbk codec can't decode byte 0xae in position 680

Popen has already detected the system encoding as 'gbk' correctly, the issue is not caused by wrong codec name.

So, let g:tagbar_systemenc os similar things will not help.

skywind3000 commented 3 years ago

Python's gbk decoder uses very strict rules and there may be slight differece with windows gbk/gb2312 codec.

Python raise an exception when unicode decoding errors encountered. It is not an error, but a chance to allow users to decide what to do with the exception characters (simply ignore or replace it with a ? or just interrupt the process).

Most of time, set errors="ignore" is a practical way and is enough for most applications.

Shane-XB-Qian commented 3 years ago

it's not regarding to codec name, but 'why', perhaps his/its ori src code was wrong (codec) too.. so i think perhaps better to fix the why.. // and try let g:tagbar_python = 3 or let g:tagbar_python = 2

skywind3000 commented 3 years ago

No sense to argue with the codec, filename field in the ctags database is neally unused in tagbar.

So, this PR is enough for this (prevent crash). You can still create your new patch if you don't agree with this.

(Useful hint: If you wish to change this, remember to revert this change and PR 733, keep subprocess.Popen produce binary data and you need to find a new way to fix issue 629. )