neuecc / Utf8Json

Definitely Fastest and Zero Allocation JSON Serializer for C# (.NET, .NET Core, Unity, Xamarin).
MIT License

Use `JsonUtf8Encoding : Encoding` #17

Open neuecc opened 6 years ago

neuecc commented 6 years ago

Escaping string characters hurts the performance of JSON serialization. The escape cost can be reduced by creating a custom UTF-8 Encoding that includes JSON escaping/unescaping. To invoke the internal FastAllocateString, it is necessary to inherit from Encoding.

public class JsonUtf8Encoding : Encoding
{
    #region decode(for reader)

    // (Encoding.GetString) -> GetCharCount -> (FastAllocateString) -> GetChars

    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        // Returns the char count of the unescaped (.+) group in "(.+)".
        if (bytes[index] != '\"') throw new InvalidOperationException();

        throw new NotImplementedException();
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        throw new NotImplementedException();
    }

    #endregion

    #region encode(for writer)

    // Should GetByteCount be used instead? GetMaxByteCount may be too large.

    public override int GetMaxByteCount(int charCount)
    {
        return Encoding.UTF8.GetMaxByteCount(charCount) * 2; // worst case, escaped.
    }

    public override unsafe int GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        int byteCount = bytes.Length - byteIndex;

        fixed (char* pChars = s)
        fixed (byte* pBytes = bytes)
        {
            return GetBytes(pChars + charIndex, charCount, pBytes + byteIndex, byteCount);
        }
    }

    public override unsafe int GetBytes(char* chars, int charCount, byte* bytes, int byteCount)
    {
        throw new NotImplementedException();
    }

    #endregion

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        throw new NotSupportedException();
    }

    public override int GetByteCount(char[] chars, int index, int count)
    {
        throw new NotSupportedException();
    }

    public override int GetMaxCharCount(int byteCount)
    {
        throw new NotSupportedException();
    }
}
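
As a rough illustration of the writer half, here is a minimal sketch of escaping and UTF-8 encoding in a single pass over the string. The class `JsonUtf8EscapeSketch` and its methods `WriteQuoted` and `ToJson` are hypothetical names made up for this example, not part of Utf8Json, and the code uses safe array indexing rather than the `unsafe` pointer overload from the skeleton. It assumes the caller sizes the buffer for the worst case of 6 bytes per char plus 2 quotes.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not Utf8Json code: JSON-escape and UTF-8 encode in one pass.
public static class JsonUtf8EscapeSketch
{
    private const string Hex = "0123456789abcdef";

    // Writes the quoted JSON string literal for s into dest at offset.
    // Returns the number of bytes written. dest must have room for
    // s.Length * 6 + 2 bytes (assumed worst case: every char becomes \u00XX).
    public static int WriteQuoted(string s, byte[] dest, int offset)
    {
        int start = offset;
        dest[offset++] = (byte)'"';
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c < 0x80)
            {
                char esc = '\0';
                switch (c)
                {
                    case '"': esc = '"'; break;
                    case '\\': esc = '\\'; break;
                    case '\b': esc = 'b'; break;
                    case '\f': esc = 'f'; break;
                    case '\n': esc = 'n'; break;
                    case '\r': esc = 'r'; break;
                    case '\t': esc = 't'; break;
                }
                if (esc != '\0')
                {
                    // Two-byte short escape such as \n or \".
                    dest[offset++] = (byte)'\\';
                    dest[offset++] = (byte)esc;
                }
                else if (c < 0x20)
                {
                    // Remaining control characters use the \u00XX form.
                    dest[offset++] = (byte)'\\';
                    dest[offset++] = (byte)'u';
                    dest[offset++] = (byte)'0';
                    dest[offset++] = (byte)'0';
                    dest[offset++] = (byte)Hex[c >> 4];
                    dest[offset++] = (byte)Hex[c & 0xF];
                }
                else
                {
                    dest[offset++] = (byte)c; // plain ASCII
                }
            }
            else if (c < 0x800)
            {
                dest[offset++] = (byte)(0xC0 | (c >> 6));
                dest[offset++] = (byte)(0x80 | (c & 0x3F));
            }
            else if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // Surrogate pair: combine into one code point, emit 4 bytes.
                int cp = char.ConvertToUtf32(c, s[++i]);
                dest[offset++] = (byte)(0xF0 | (cp >> 18));
                dest[offset++] = (byte)(0x80 | ((cp >> 12) & 0x3F));
                dest[offset++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                dest[offset++] = (byte)(0x80 | (cp & 0x3F));
            }
            else
            {
                // Lone surrogates are replaced with U+FFFD (a choice made for this sketch).
                if (char.IsSurrogate(c)) c = '\uFFFD';
                dest[offset++] = (byte)(0xE0 | (c >> 12));
                dest[offset++] = (byte)(0x80 | ((c >> 6) & 0x3F));
                dest[offset++] = (byte)(0x80 | (c & 0x3F));
            }
        }
        dest[offset++] = (byte)'"';
        return offset - start;
    }

    // Convenience wrapper for experimenting: returns the encoded literal as a string.
    public static string ToJson(string s)
    {
        var buf = new byte[s.Length * 6 + 2];
        int n = WriteQuoted(s, buf, 0);
        return Encoding.UTF8.GetString(buf, 0, n);
    }
}
```

The point of the single loop is that each char is inspected once, instead of once for escaping and again for UTF-8 encoding. Whether this actually beats the existing writer would need benchmarking.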

Also, it is necessary to implement efficient UTF-8 encoding/decoding. I found this article: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ If there are any other good examples, please let me know.
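
On the reader side, the `GetCharCount` step in the skeleton has to report how many UTF-16 chars the unescaped string will occupy before anything is decoded. A minimal sketch of that counting pass follows. `JsonUtf8CountSketch.CountUnescapedChars` is a hypothetical name for this example, and the code is a plain branch-based scan, not the DFA decoder from the linked article. It assumes the input is well-formed UTF-8 and does not validate continuation bytes or the hex digits of `\uXXXX`.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not Utf8Json code: count the UTF-16 chars of a JSON
// string literal's unescaped content without decoding it.
public static class JsonUtf8CountSketch
{
    // bytes[index] must be the opening quote. Scans up to count bytes and
    // returns the char count of the content between the quotes.
    public static int CountUnescapedChars(byte[] bytes, int index, int count)
    {
        if (count < 2 || bytes[index] != (byte)'"')
            throw new InvalidOperationException("expected opening quote");

        int end = index + count;
        int chars = 0;
        int i = index + 1;
        while (i < end)
        {
            byte b = bytes[i];
            if (b == (byte)'"') return chars; // closing quote reached

            if (b == (byte)'\\')
            {
                if (i + 1 >= end) break;
                // \uXXXX spans 6 bytes, every other escape spans 2;
                // each one unescapes to a single UTF-16 char.
                i += bytes[i + 1] == (byte)'u' ? 6 : 2;
                chars += 1;
            }
            else if (b < 0x80) { i += 1; chars += 1; }          // ASCII
            else if ((b & 0xE0) == 0xC0) { i += 2; chars += 1; } // 2-byte sequence
            else if ((b & 0xF0) == 0xE0) { i += 3; chars += 1; } // 3-byte sequence
            else if ((b & 0xF8) == 0xF0) { i += 4; chars += 2; } // 4-byte sequence, surrogate pair
            else throw new InvalidOperationException("invalid UTF-8 lead byte");
        }
        throw new InvalidOperationException("unterminated string");
    }
}
```

A production version would also need to validate the input, which is where a table-driven decoder like the one in the article could replace the lead-byte branches.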

neuecc commented 6 years ago

@itn3000 is working on fast UTF-8 <-> UTF-16 conversion utilities. https://github.com/itn3000/unicode-convert-utilities

@ufcpp is building a custom UTF-8 decoder. https://github.com/ufcpp/Utf8Utils

NStack is a new Go-like encoding system. https://github.com/migueldeicaza/NStack

System.Text.Utf8String is a new Span-based primitive. https://github.com/dotnet/corefxlab/tree/master/src/System.Text.Utf8String/System/Text

Tornhoof commented 6 years ago

Regarding UTF-8, see http://nullprogram.com/blog/2017/10/06/ and https://news.ycombinator.com/item?id=15423674, plus the related discussion in https://github.com/dotnet/corefxlab/issues/1831

penguinawesome commented 4 years ago

Hi @neuecc, we badly need your help. Do you have an idea or a workaround for our issue? https://github.com/neuecc/Utf8Json/issues/224