Code to Reproduce
using System;
using LiteDB;

using (var db = new LiteDatabase("test.litedb"))
{
    var collection = db.GetCollection("test");
    var document = new BsonDocument()
    {
        { "UnpairedSurrogate", "\uD800" },
    };
    // The value is intact before it is stored.
    Console.WriteLine("Before insert: " + (document["UnpairedSurrogate"].AsString == "\uD800")); // Prints True: OK
    // Insert returns the new document's id, which is used to read the document back.
    var inserted = collection.FindById(collection.Insert(document));
    Console.WriteLine("After insert: " + (inserted["UnpairedSurrogate"].AsString == "\uD800")); // Prints False: bad, should be True
}
Expected behavior
One of the following is expected:
Preserve the unpaired surrogate through serialization.
Throw an exception in implicit operator BsonValue(string value), since the value is not supported by BSON.
Throw an exception in ILiteCollection.Insert, since the value is not supported by LiteDB.
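For the two "throw an exception" options, the check itself is small. The following is a minimal sketch of how an unpaired surrogate could be detected before a string is stored; `HasUnpairedSurrogate` is a hypothetical helper name, not an existing LiteDB API.

```csharp
using System;

// Hypothetical helper (not part of LiteDB): returns true if the string
// contains a high surrogate without a following low surrogate, or a low
// surrogate without a preceding high surrogate.
static bool HasUnpairedSurrogate(string s)
{
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsHighSurrogate(s[i]))
        {
            if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++; // well-formed pair: skip its low half
                continue;
            }
            return true; // high surrogate with no partner
        }
        if (char.IsLowSurrogate(s[i]))
        {
            return true; // low surrogate with no preceding high surrogate
        }
    }
    return false;
}

Console.WriteLine(HasUnpairedSurrogate("\uD800"));        // True
Console.WriteLine(HasUnpairedSurrogate("\uD83D\uDE00"));  // False (a valid pair)
Console.WriteLine(HasUnpairedSurrogate("plain text"));    // False
```

Until the behavior changes, a caller could run a check like this before Insert to avoid silent data loss.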
Additional context
An unpaired surrogate is valid in a Windows path name, so I think preserving it is the best option.
However, an unpaired surrogate cannot be represented in well-formed UTF-8, so I think it is also reasonable not to support it.
To preserve unpaired surrogates in a backward-compatible way (and a forward-compatible way, as long as the string is valid UTF-8), you could use WTF-8, an extension of UTF-8 used by Rust and Go to preserve unpaired surrogates in Windows paths.
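To illustrate the idea, here is a rough sketch of a WTF-8 encoder. `EncodeWtf8` is a hypothetical helper written for this report; it does not exist in LiteDB or in the .NET base class library, and a matching decoder would also be needed. Well-formed input produces the same bytes as UTF-8, while an unpaired surrogate is kept as a three-byte sequence instead of being replaced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a LiteDB or .NET API): encodes a UTF-16 string as WTF-8.
static byte[] EncodeWtf8(string s)
{
    var bytes = new List<byte>(s.Length * 3);
    for (int i = 0; i < s.Length; i++)
    {
        int cp = s[i];
        // A well-formed surrogate pair becomes one supplementary code point, as in UTF-8.
        if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
        {
            cp = char.ConvertToUtf32(s[i], s[i + 1]);
            i++;
        }

        if (cp < 0x80)
        {
            bytes.Add((byte)cp);
        }
        else if (cp < 0x800)
        {
            bytes.Add((byte)(0xC0 | (cp >> 6)));
            bytes.Add((byte)(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            // Unpaired surrogates (U+D800..U+DFFF) land here and keep their value.
            bytes.Add((byte)(0xE0 | (cp >> 12)));
            bytes.Add((byte)(0x80 | ((cp >> 6) & 0x3F)));
            bytes.Add((byte)(0x80 | (cp & 0x3F)));
        }
        else
        {
            bytes.Add((byte)(0xF0 | (cp >> 18)));
            bytes.Add((byte)(0x80 | ((cp >> 12) & 0x3F)));
            bytes.Add((byte)(0x80 | ((cp >> 6) & 0x3F)));
            bytes.Add((byte)(0x80 | (cp & 0x3F)));
        }
    }
    return bytes.ToArray();
}

Console.WriteLine(BitConverter.ToString(EncodeWtf8("\uD800")));         // ED-A0-80
Console.WriteLine(BitConverter.ToString(EncodeWtf8("a\uD83D\uDE00")));  // 61-F0-9F-98-80
```

Because valid strings encode to the same bytes as before, data already stored as UTF-8 would remain readable.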
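LiteDB silently replaces an unpaired surrogate with U+FFFD because it uses Encoding.UTF8, which is new UTF8Encoding(true, false). The second argument is throwOnInvalidBytes, so this encoder substitutes the replacement character instead of throwing. I think LiteDB should use new UTF8Encoding(false, true).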
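The difference between the two encoders can be seen without LiteDB. The snippet below is a small standalone sketch that uses only System.Text; the variable names are illustrative.

```csharp
using System;
using System.Text;

string lone = "\uD800"; // a high surrogate with no low surrogate after it

// Encoding.UTF8 uses a replacement fallback: the lone surrogate is
// encoded as U+FFFD (bytes EF BF BD), so the original value is lost.
byte[] lossy = Encoding.UTF8.GetBytes(lone);
string roundTripped = Encoding.UTF8.GetString(lossy);
Console.WriteLine(BitConverter.ToString(lossy));  // EF-BF-BD
Console.WriteLine(roundTripped == "\uFFFD");      // True
Console.WriteLine(roundTripped == lone);          // False

// A strict encoder (no BOM, throwOnInvalidBytes: true) rejects the input instead.
var strict = new UTF8Encoding(false, true);
bool threw = false;
try
{
    strict.GetBytes(lone);
}
catch (EncoderFallbackException)
{
    threw = true;
}
Console.WriteLine(threw);                         // True
```

With the strict encoder, the failure would surface as an exception at write time rather than as silently altered data.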
Version
Which LiteDB version/OS/.NET framework version are you using? (REQUIRED)
Describe the bug
LiteDB silently replaces an unpaired surrogate with U+FFFD.