nitlang / nit

Nit language
http://nitlanguage.org
Apache License 2.0

Wrapper of ICU's UTF-16 encoded strings and conversion #2773

Closed. kugelbltz closed 5 years ago

kugelbltz commented 5 years ago

u16_string module

This module is meant to ease the use of the complex string operations provided by the ICU library. It provides a wrapper for ICU's string structure, UChar *, as well as conversion functions to and from String and CString.

Remarks

lbajolet commented 5 years ago

Note: I have one question: does ICU's UChar* require 0-terminated strings? If not, we could support strings that contain a null byte (this is legal in Nit) when starting from any Text subclass.

kugelbltz commented 5 years ago

@R4PaSs No, they do not have to be 0-terminated in ICU. But since I am working through the Nit FFI, I still have to convert strings into char * in some way to use the library.

kugelbltz commented 5 years ago

First, thank you @R4PaSs and @Morriar for your feedback. I have modified the module to take your suggestions into account. The U16String class is now a subclass of Text, so I redefined the chars function, which uses the char_at_offset function of UCharString. The latter returns a UTF-32 character (UChar32 in ICU), which makes a separate U16Char class unnecessary. I also decided to drop the []= function, as it is not needed for the modules to come.
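To illustrate why reading a character at an offset can return a UTF-32 value directly, here is a minimal C sketch of decoding one code point from a UTF-16 buffer. It is not the module's char_at_offset and does not call ICU; the name code_point_at and its signature are assumptions made for this example, and unpaired surrogates are simply returned unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the module's char_at_offset, not an ICU call):
 * decode the code point starting at code-unit index i of a UTF-16 buffer
 * holding len code units. An unpaired surrogate is returned as-is. */
uint32_t code_point_at(const uint16_t *s, size_t len, size_t i)
{
    assert(i < len);
    uint16_t hi = s[i];
    /* A high surrogate followed by a low surrogate encodes one
     * supplementary code point (U+10000 and above). */
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < len) {
        uint16_t lo = s[i + 1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            return 0x10000u
                 + (((uint32_t)hi - 0xD800u) << 10)
                 + ((uint32_t)lo - 0xDC00u);
        }
    }
    /* Any other unit, including 0, is a code point on its own. */
    return hi;
}
```

Since the result is already a full code point, callers never need a type for a single 16-bit unit, which is consistent with dropping a dedicated U16Char class.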

kugelbltz commented 5 years ago

I have figured out how to deal with embedded \0 characters in strings and tried to clear up some confusion around the capacity and code_units attributes. There are 3 new private classes: U16StringCharView, U16StringCharReverseIterator and U16StringCharIterator, which are meant to be used by the U16String.chars function. They are essentially copies of the corresponding classes in the flat module, as I thought that was the right way to do it.
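As a rough illustration of how an explicit length allows embedded \0 units, the C sketch below keeps a capacity and a code_units count next to the buffer. The struct layout and both function names are assumptions for this example only; they borrow the attribute names from this thread but are not the module's actual definition and do not use ICU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, for illustration only. */
typedef struct {
    uint16_t *units;    /* UTF-16 code units, not 0-terminated */
    size_t capacity;    /* allocated slots, in 16-bit units */
    size_t code_units;  /* slots actually in use */
} u16_buf;

/* Append n code units, embedded 0 units included, growing the buffer
 * when needed. Returns 0 on success, -1 if allocation fails. */
int u16_buf_append(u16_buf *b, const uint16_t *src, size_t n)
{
    if (b->code_units + n > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 8;
        while (cap < b->code_units + n)
            cap *= 2;
        uint16_t *p = realloc(b->units, cap * sizeof *p);
        if (p == NULL)
            return -1;
        b->units = p;
        b->capacity = cap;
    }
    if (n > 0)
        memcpy(b->units + b->code_units, src, n * sizeof *src);
    b->code_units += n;
    return 0;
}

/* Length as a 0-terminated consumer would see it: stops at the first
 * 0 unit, or at max if there is none. */
size_t u16_len_until_nul(const uint16_t *s, size_t max)
{
    size_t i = 0;
    while (i < max && s[i] != 0)
        i++;
    return i;
}
```

Appending the three units 'a', 0, 'b' leaves code_units at 3, whereas a scan for the terminator stops after the first unit. That difference is why the count has to travel with the buffer instead of being recomputed from its contents.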

kugelbltz commented 5 years ago

@Morriar @R4PaSs Do you think the latest version is okay?