Open teambob opened 9 years ago
Commented by andrewpunch on 2004-06-10 10:50 UTC Logged In: YES user_id=928005
Another quick possibility is to use UTF-32 (little endian) encoding. This allows access to all the characters in a document without loss of information.
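As a quick illustration of the point above, this sketch (Python, with a hypothetical sample string) shows that UTF-32 little-endian round-trips arbitrary Unicode text without loss:

```python
# Hypothetical sample mixing Latin and CJK characters.
text = "Grüße, 世界"

# UTF-32-LE uses a fixed 4 bytes per code point, little endian, no BOM.
data = text.encode("utf-32-le")
assert len(data) == 4 * len(text)

# Decoding recovers the original text exactly -- no information is lost.
assert data.decode("utf-32-le") == text
```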
The technical specification is here: http://www.unicode.org/faq/specifications-jda.html
Commented by andrewpunch on 2004-10-11 06:43 UTC Logged In: YES user_id=928005
Determination: The text writer will output UTF-8 by default in the next version. This is byte-compatible with ASCII for English characters while preserving all other characters.
ASCII, UTF-16/32 and other mappings will be available as options in later versions.
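A minimal sketch (Python, hypothetical strings) of why UTF-8 is a safe default: pure-ASCII text encodes to identical bytes, while non-ASCII characters are kept rather than dropped.

```python
# For ASCII-only text, UTF-8 output is byte-identical to ASCII output.
ascii_text = "plain English"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters survive: 'é' becomes the two-byte sequence C3 A9.
mixed = "café"
data = mixed.encode("utf-8")
assert data == b"caf\xc3\xa9"
assert data.decode("utf-8") == mixed  # lossless round trip
```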
Updated by andrewpunch on 2004-10-11 06:43 UTC
Updated by andrewpunch on 2004-10-11 06:45 UTC
Commented by andrewpunch on 2005-04-11 12:41 UTC Logged In: YES user_id=928005
This is scheduled for inclusion in 3.2.0 as UTF-8 output for "text" files.
Reported by andrewpunch on 2004-06-09 03:25 UTC Non-ASCII characters are thrown away when writing to a text file.
There is no way around this while we write to an ASCII file.
There are some other options for file formats:
From a design perspective this could be achieved by creating maps from a single Unicode character to one or more bytes.
There could be a map for:
The map need not be static; it may be dynamic. For example, the ASCII map may allow through all characters with a Unicode code point below 0x0080.
There must be a process for handling a Unicode character that is not mappable using the current map.
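The map design above could be sketched as follows (Python; the function names and the "?" fallback are hypothetical, not part of the original proposal). Each map decides per character whether it can emit bytes, and an explicit fallback policy handles characters the current map cannot represent:

```python
def ascii_map(ch):
    """Dynamic ASCII map: passes through code points below 0x0080."""
    cp = ord(ch)
    return bytes([cp]) if cp < 0x0080 else None  # None = not mappable


def write_text(text, char_map, on_unmappable=lambda ch: b"?"):
    """Map each character through char_map, invoking the fallback
    policy for characters the current map cannot represent."""
    out = bytearray()
    for ch in text:
        mapped = char_map(ch)
        out += mapped if mapped is not None else on_unmappable(ch)
    return bytes(out)


# 'ï' (U+00EF) is above 0x0080, so the ASCII map rejects it
# and the fallback substitutes '?'.
assert write_text("naïve", ascii_map) == b"na?ve"
```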
Created on behalf of David at Nutmeg.