ycm-core / ycmd

A code-completion & code-comprehension server
https://ycm-core.github.io/ycmd/
GNU General Public License v3.0
1.69k stars, 764 forks

[READY] Update unicode support to version 15.1 #1719

Closed bstaletic closed 9 months ago

bstaletic commented 10 months ago

This should bring ycmd to support unicode 15.1.

What is in this pull request

Brand new stuff

Unicode 15.1 changed the grapheme cluster boundary rules by adding GB9c. The new rule depends on the Indic_Conjunct_Break (InCB) property from Unicode Standard Annex #44. The data itself lives in DerivedCoreProperties.txt.

That has led me to add a new data member to our gigantic code_points object.
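As a rough illustration of what GB9c asks for (not the code in this pull request), the sketch below parses InCB lines in the DerivedCoreProperties.txt line format and checks the rule `Consonant [Extend Linker]* Linker [Extend Linker]* × Consonant`. The SAMPLE excerpt is a hand-picked handful of lines rather than the real file, and the names parse_incb and gb9c_forbids_break are hypothetical.

```python
# Hand-picked excerpt in the DerivedCoreProperties.txt line format.
# Assumption: this is illustrative only; the real file is far larger.
SAMPLE = """\
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0915..0939    ; InCB; Consonant # Lo  [37] DEVANAGARI LETTER KA..DEVANAGARI LETTER HA
093C          ; InCB; Extend # Mn       DEVANAGARI SIGN NUKTA
094D          ; InCB; Linker # Mn       DEVANAGARI SIGN VIRAMA
200D          ; InCB; Extend # Cf       ZERO WIDTH JOINER
"""


def parse_incb(text):
    """Return {code point: InCB value} for every 'InCB' line in text."""
    table = {}
    for line in text.splitlines():
        data = line.split('#', 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(';')]
        # InCB lines have three fields: range ; InCB ; value
        if len(fields) != 3 or fields[1] != 'InCB':
            continue
        first, _, last = fields[0].partition('..')
        for code_point in range(int(first, 16), int(last or first, 16) + 1):
            table[code_point] = fields[2]
    return table


def gb9c_forbids_break(incb, left, right):
    """True if GB9c forbids a break between text `left` and character `right`."""
    if incb.get(ord(right)) != 'Consonant':
        return False
    seen_linker = False
    # Walk backwards over the run of Extend/Linker characters.
    for char in reversed(left):
        value = incb.get(ord(char))
        if value == 'Linker':
            seen_linker = True
        elif value != 'Extend':
            # The run must contain a Linker and be preceded by a Consonant.
            return seen_linker and value == 'Consonant'
    return False
```

With this table, `gb9c_forbids_break(parse_incb(SAMPLE), '\u0915\u094D', '\u0937')` is true: KA, VIRAMA and SSA stay in one cluster, whereas KA followed directly by SSA does not trigger the rule.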

Bug fixes

urllib.request.urlopen returns a response whose read() (and similar) return bytes, which breaks our trusty Download( url ) helper. Yes, this was a very old bug.
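A minimal sketch of the kind of fix this implies, assuming the data files are UTF-8 and that callers want a list of text lines. The names decode_lines and download are hypothetical and only mirror the idea of the helper; this is not the exact code from update_unicode.py.

```python
import urllib.request


def decode_lines(raw):
    """Turn the bytes returned by response.read() into a list of str lines."""
    # Assumption: the Unicode data files are UTF-8 encoded.
    return raw.decode('utf-8').splitlines()


def download(url):
    """Fetch url and return its contents as decoded text lines."""
    with urllib.request.urlopen(url) as response:
        # read() yields bytes, so decode before doing any str processing.
        return decode_lines(response.read())
```

Keeping the decoding in its own function makes it possible to test the bytes-to-text step without touching the network.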

Playing hide the data

The old link for the unversioned emoji-data.txt no longer works, so I used the versioned one. I have since found the unversioned one: https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt



Fixes https://github.com/ycm-core/ycmd/issues/1718

bstaletic commented 10 months ago

@DonKult I would not mind your review as well, if you are up for it.

codecov[bot] commented 10 months ago

Codecov Report

Merging #1719 (36a9461) into master (0607eed) will increase coverage by 0.03%. The diff coverage is 98.05%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1719      +/-   ##
==========================================
+ Coverage   95.41%   95.45%   +0.03%
==========================================
  Files          83       83
  Lines        8131     8136       +5
  Branches      165      163       -2
==========================================
+ Hits         7758     7766       +8
+ Misses        322      320       -2
+ Partials       51       50       -1
```

DonKult commented 10 months ago

> @DonKult I would not mind your review as well, if you are up for it.

Sadly, I can't provide much more than emotional support, as I know next to nothing about the internals of UTF-8, nor do I use, need, or know any script that would rely on the finer details of word splitting. I just remembered seeing an issue about this elsewhere in Debian.

I did test this branch though (with the vim plugin), and it updates unicode, builds & runs (and tests) just fine in the previous as well as the last force-pushed version, so as far as that counts as a review, I can approve the changes.

(I do apply additional changes though, to remove the *.inc files & drop other third-party embeds/downloads in favor of system-provided things, which hides the Download issue in update_unicode.py for me, for example.)

bstaletic commented 10 months ago

It is good to know that things are not broken on your side.

> (I do apply additional changes though, to remove the *.inc files & drop other third-party embeds/downloads in favor of system-provided things, which hides the Download issue in update_unicode.py for me, for example.)

I'm curious. May I see the patches?

DonKult commented 10 months ago

> > (I do apply additional changes though, to remove the *.inc files & drop other third-party embeds/downloads in favor of system-provided things, which hides the Download issue in update_unicode.py for me, for example.)
>
> I'm curious. May I see the patches?

Sure, not a secret, just not usually a favorite topic upstream. I haven't pushed unfuzzied/updated version(s) yet, but you can see them either listed in Debian's patch tracker for what is currently in the archive, or in the VCS of the ycmd packaging bits for possibly unreleased things (some of them don't make much sense without the vim-plugin context): patches, vcs. Some of them I might/should clean up and officially propose some day…, there is just always another dumpster fire to deal with first. At least, that is what I claim as an excuse.

mergify[bot] commented 9 months ago

Thanks for sending a PR!
