meebey / leveldb-sharp

C# LevelDB binding
https://www.meebey.net/projects/leveldb-sharp/
BSD 3-Clause "New" or "Revised" License

Character conversion issues #11

Open jeff-longino opened 10 years ago

jeff-longino commented 10 years ago

I have found a problem when storing data that contains non-ASCII characters, for example "Décor" with the accented é.

The issue seems to be rooted in a misstep when calculating the string length that is passed from managed C# to unmanaged LevelDB.

The call that computes the expected length uses the UTF-8 encoding, but the actual marshaled call passes the string in the current ANSI code page. In the ANSI code page (on a US system) the é takes only a single byte, but in UTF-8 it takes 2 bytes.
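
To make the byte arithmetic concrete, here is a small language-neutral sketch in Python (not the binding's C# code). It assumes Windows-1252 (`cp1252`) as a stand-in for "the current ANSI code page" on a US system; the variable names are purely illustrative.

```python
# Illustration only: cp1252 is an assumed stand-in for the ANSI code page.
text = "D\u00e9cor"  # "Décor" with a precomposed é (U+00E9)

utf8_len = len(text.encode("utf-8"))   # length as computed with UTF-8
ansi_len = len(text.encode("cp1252"))  # bytes actually marshaled as ANSI

print(utf8_len, ansi_len)  # 6 5
```

The one-byte difference corresponds to the single é in the input.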

The end result is that the record is saved with extra garbage at the end: one extra garbage character for each special character in the input.
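
The effect can be modeled roughly as follows. This is a hypothetical simulation in Python, not the library's actual code path: `simulate_stored_value` and the `trailing` bytes are made up for illustration, and in a real process the extra bytes would be whatever happens to follow the marshaled string in memory.

```python
def simulate_stored_value(text: str, trailing: bytes = b"\x00\x7f\x13") -> bytes:
    """Hypothetical model of the over-read described above."""
    ansi = text.encode("cp1252")             # what is actually in the buffer (assumed code page)
    claimed_len = len(text.encode("utf-8"))  # length reported to the native side
    memory = ansi + trailing                 # buffer followed by unrelated memory
    return memory[:claimed_len]              # native side reads claimed_len bytes


text = "D\u00e9cor"  # "Décor"
stored = simulate_stored_value(text)
extra = len(stored) - len(text.encode("cp1252"))
print(stored, extra)
```

In this model the stored value ends with one byte that was never part of the input, matching the one-garbage-character-per-special-character pattern.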