neuecc / Utf8Json

Definitely Fastest and Zero Allocation JSON Serializer for C# (.NET, .NET Core, Unity, Xamarin).
MIT License

Question: SnakeCase, Dictionary Keys, Custom Resolver. #38

Open niemyjski opened 6 years ago

niemyjski commented 6 years ago

We are looking into replacing the default serializer of Foundatio (https://github.com/FoundatioFx/Foundatio) as well as Exceptionless (https://github.com/exceptionless/Exceptionless). For Foundatio it's pretty easy to do, but for Exceptionless there are a few questions that need to be resolved, and one of them is snake casing. I see that snake casing is supported and that you can create your own snake case resolver. However, here are the questions that I had:

SnakeCase

  1. We have a client that sends us a metric ton of events, and it currently uses a custom version of snake case (created before JSON.NET had one). We can't change this and have to support it.
     a. How much overhead do you typically see when using a naming convention vs. the standard naming? I didn't see any benchmarks.
     b. How close is the snake case implementation to the JSON.NET snake case strategy? I can see cases where you have different systems and want them to work together.
     c. I didn't see anything in the readme or in the code I looked at, but is it possible to exclude dictionary keys from naming conventions? JSON.NET has this built in, so it doesn't morph dictionary keys.

Custom Converters.

  2. We have a simple event model, and we throw all our known models into a dictionary with a known key (e.g., @error, @request). Currently, with JSON.NET, a custom converter lets us specify the type to deserialize a model to based on its key. Is anything like this possible? I didn't see anything like it when I was looking around.

Unknown Data Types

  3. What happens when an unknown data type can't be converted? Does it stay in some JSON type model like a JToken?

Dynamic Contract Resolver

  4. We have a dynamic contract resolver (https://github.com/exceptionless/Exceptionless/blob/master/src/Exceptionless.Core/Serialization/ElasticDynamicTypeContractResolver.cs) where we basically inspect the type being deserialized, and if it's not from our library, we use the default serializer behavior instead of our naming conventions, etc. I need to find out if this is possible (at least in the short term, until we move to NEST 6.0 (Elasticsearch), which will be using your library as well ;)).
neuecc commented 6 years ago

Thank you, I'm happy that you are interested.

1.a.

The performance is the same. The dynamic code generator encodes the property names once, up front, and caches them:

// Standard
stringByteKeys = new byte[][]
{
    JsonWriter.GetEncodedPropertyNameWithBeginObject("Age"), // {"Age":
    JsonWriter.GetEncodedPropertyNameWithPrefixValueSeparator("FirstName"), // ,"FirstName":
    JsonWriter.GetEncodedPropertyNameWithPrefixValueSeparator("LastName") // ,"LastName":
};

// Snakecase
stringByteKeys = new byte[][]
{
    JsonWriter.GetEncodedPropertyNameWithBeginObject("age"), // {"age":
    JsonWriter.GetEncodedPropertyNameWithPrefixValueSeparator("first_name"), // ,"first_name":
    JsonWriter.GetEncodedPropertyNameWithPrefixValueSeparator("last_name") // ,"last_name":
};
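From the caller's side, the naming convention is chosen by passing a resolver. This is a minimal sketch assuming the standard Utf8Json resolvers (StandardResolver.Default and StandardResolver.SnakeCase); the Person model is hypothetical. Both calls run the same generated formatter shape and differ only in the cached property-name bytes shown above.

```csharp
using System;
using Utf8Json;
using Utf8Json.Resolvers;

// Hypothetical model matching the property names in the snippet above.
public class Person
{
    public int Age { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var person = new Person { Age = 30, FirstName = "John", LastName = "Doe" };

        // Standard names: "Age", "FirstName", "LastName"
        Console.WriteLine(JsonSerializer.ToJsonString(person, StandardResolver.Default));

        // Snake case names: "age", "first_name", "last_name"
        Console.WriteLine(JsonSerializer.ToJsonString(person, StandardResolver.SnakeCase));
    }
}
```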

1.b. I can't guarantee that all cases behave the same.

Utf8Json logic https://github.com/neuecc/Utf8Json/blob/8bf9a57809b0dcdfda0b75767ec96d8e5300f38b/src/Utf8Json/Internal/StringMutator.cs#L34

JSON.NET logic https://github.com/JamesNK/Newtonsoft.Json/blob/122afba9908832bd5ac207164ee6c303bfd65cf1/Src/Newtonsoft.Json/Utilities/StringUtils.cs#L208

1.c.

Yes, same as JSON.NET: Utf8Json does not apply the naming convention to dictionary keys.
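A small sketch of that behavior, assuming StandardResolver.SnakeCase. The EventModel type and its keys are hypothetical: the member names are snake-cased, while the keys of the Data dictionary are written exactly as stored.

```csharp
using System;
using System.Collections.Generic;
using Utf8Json;
using Utf8Json.Resolvers;

// Hypothetical model: a typed member plus a free-form data bag.
public class EventModel
{
    public string EventType { get; set; }
    public Dictionary<string, object> Data { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var ev = new EventModel
        {
            EventType = "error",
            Data = new Dictionary<string, object> { { "MyCustomKey", 1 }, { "@error", "boom" } }
        };

        // Members become "event_type" and "data";
        // the dictionary keys "MyCustomKey" and "@error" are not mutated.
        Console.WriteLine(JsonSerializer.ToJsonString(ev, StandardResolver.SnakeCase));
    }
}
```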

2.

Please show me the source code and the JSON.NET feature you use. If I can look at the details, I may be able to suggest an alternative.

3.

JSON.NET deserializes to JToken when the type is object. Utf8Json deserializes to bool, double, string, IDictionary<string, object>, or List<object>. https://github.com/neuecc/Utf8Json#dynamic-deserialization
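This sketch shows what deserializing to object produces, following the list of types given above. The JSON payload is made up for illustration. Casting the root to IDictionary<string, object> avoids depending on the exact concrete dictionary type.

```csharp
using System;
using System.Collections.Generic;
using Utf8Json;

public static class Program
{
    public static void Main()
    {
        var json = "{\"name\":\"foo\",\"count\":3,\"ok\":true,\"tags\":[\"a\",\"b\"],\"nested\":{\"x\":null}}";

        // No JToken tree: the result is built from plain BCL types.
        var root = (IDictionary<string, object>)JsonSerializer.Deserialize<object>(json);

        var name = (string)root["name"];                          // string
        var count = (double)root["count"];                        // every number comes back as double
        var ok = (bool)root["ok"];                                // bool
        var tags = (List<object>)root["tags"];                    // arrays become List<object>
        var nested = (IDictionary<string, object>)root["nested"]; // objects become dictionaries

        Console.WriteLine($"{name} {count} {ok} {tags.Count} {nested["x"] == null}");
    }
}
```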

4.

For example, something like this?

public class ElasticDynamicTypeJsonFormatterResolver : IJsonFormatterResolver
{
    public static readonly ElasticDynamicTypeJsonFormatterResolver Instance = new ElasticDynamicTypeJsonFormatterResolver();

    readonly HashSet<Assembly> assemblies = new HashSet<Assembly>();

    ElasticDynamicTypeJsonFormatterResolver()
    {
        assemblies.Add(typeof(ElasticsearchDefaultSerializer).Assembly);
        assemblies.Add(typeof(ElasticContractResolver).Assembly);
    }

    public IJsonFormatter<T> GetFormatter<T>()
    {
        return Cache<T>.formatter;
    }

    class Cache<T>
    {
        public static readonly IJsonFormatter<T> formatter;

        static Cache()
        {
            var type = typeof(T);

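            if (ElasticDynamicTypeJsonFormatterResolver.Instance.assemblies.Contains(type.Assembly))
            {
                // a static constructor cannot return a value, so assign the cached field
                formatter = Utf8Json.Resolvers.StandardResolver.SnakeCase.GetFormatter<T>();
            }
            else
            {
                // other types: plug in your own resolver here
                // (StandardResolver.Default is only a placeholder fallback)
                formatter = Utf8Json.Resolvers.StandardResolver.Default.GetFormatter<T>();
            }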
        }
    }
}
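A resolver like this is passed to the serializer per call, or installed globally with JsonSerializer.SetDefaultResolver. The sketch below is a self-contained variant of the same assembly-based routing, assuming it follows the rule described in question 4 (types from our own assembly get snake case, everything else gets the default behavior) and using the resolver's own assembly in place of the NEST assemblies. AssemblyRoutingResolver and OurModel are hypothetical names.

```csharp
using System;
using System.Reflection;
using Utf8Json;
using Utf8Json.Resolvers;

// Hypothetical model that lives in "our" assembly.
public class OurModel
{
    public string FirstName { get; set; }
}

// Types from our own assembly get snake_case; all other types get the default behavior.
public sealed class AssemblyRoutingResolver : IJsonFormatterResolver
{
    public static readonly AssemblyRoutingResolver Instance = new AssemblyRoutingResolver();

    // Assumption: our models are compiled into the same assembly as this resolver.
    static readonly Assembly OwnAssembly = typeof(AssemblyRoutingResolver).Assembly;

    AssemblyRoutingResolver() { }

    public IJsonFormatter<T> GetFormatter<T>() => Cache<T>.Formatter;

    // The routing decision runs once per T and is cached in a static generic field.
    static class Cache<T>
    {
        public static readonly IJsonFormatter<T> Formatter =
            typeof(T).Assembly == OwnAssembly
                ? StandardResolver.SnakeCase.GetFormatter<T>()
                : StandardResolver.Default.GetFormatter<T>();
    }
}

public static class Program
{
    public static void Main()
    {
        var json = JsonSerializer.ToJsonString(new OurModel { FirstName = "Ada" }, AssemblyRoutingResolver.Instance);
        Console.WriteLine(json); // {"first_name":"Ada"}
    }
}
```

Because nested member formatters are looked up through the resolver passed to the call, types such as List<OurModel> (from the BCL assembly) still use snake case for their OurModel elements.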
niemyjski commented 6 years ago

1.b. I don't have a test to confirm this, but I think JSON.NET checks that there are no spaces and that there are never two _ in a row. It might be good to check for this too. I also wonder whether it could be even faster with a lookup table (I'm not sure whether they have optimized the lowercase-invariant conversion; see https://www.dotnetperls.com/char-lowercase-optimization). It might also be useful to have a strategy that just takes a func, so you could easily define your own.
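The "strategy that just takes a func" idea can be sketched as a plain Func<string, string>. This is a hand-written converter, not Utf8Json's StringMutator or JSON.NET's StringUtils. It treats spaces and underscores as separators and never emits two _ in a row. NamingStrategies and ToSnakeCase are hypothetical names. As far as this thread shows, Utf8Json does not accept such a func directly, so plugging it in would need a custom resolver.

```csharp
using System;
using System.Text;

public static class NamingStrategies
{
    // A naming strategy is just a function from a CLR member name to a JSON name.
    public static readonly Func<string, string> SnakeCase = ToSnakeCase;

    // "FirstName" -> "first_name", "HTTPRequest" -> "http_request",
    // "Foo__Bar" / "Foo Bar" -> "foo_bar". Leading separators are dropped.
    public static string ToSnakeCase(string name)
    {
        if (string.IsNullOrEmpty(name)) return name;
        var sb = new StringBuilder(name.Length + 8);
        for (var i = 0; i < name.Length; i++)
        {
            var c = name[i];
            if (c == ' ' || c == '_')
            {
                AppendSeparator(sb); // collapses runs of spaces/underscores into one '_'
                continue;
            }
            if (char.IsUpper(c))
            {
                var prev = i > 0 ? name[i - 1] : '\0';
                var nextIsLower = i + 1 < name.Length && char.IsLower(name[i + 1]);
                // Word boundary: "aB"/"1B", or the last capital of an acronym ("PRequest").
                if (char.IsLower(prev) || char.IsDigit(prev) || (char.IsUpper(prev) && nextIsLower))
                    AppendSeparator(sb);
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    static void AppendSeparator(StringBuilder sb)
    {
        if (sb.Length > 0 && sb[sb.Length - 1] != '_') sb.Append('_');
    }
}

public static class Program
{
    public static void Main()
    {
        Func<string, string> strategy = NamingStrategies.SnakeCase; // swap in any other func
        Console.WriteLine(strategy("FirstName"));   // first_name
        Console.WriteLine(strategy("HTTPRequest")); // http_request
        Console.WriteLine(strategy("Foo__Bar"));    // foo_bar
    }
}
```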

  2. This is kind of nasty (https://github.com/exceptionless/Exceptionless/blob/master/src/Exceptionless.Core/Serialization/DataObjectConverter.cs#L84), and I wish there were a better way (https://github.com/exceptionless/Exceptionless/blob/e2944f92f10474247611323aacbfac59eda74b11/src/Exceptionless.Core/Extensions/JsonExtensions.cs#L224).

  4. Yes, thank you very much :)

Thanks for your responses, they are greatly appreciated.

neuecc commented 6 years ago

For 1: thank you, I should check for two _ in a row. Performance is not a concern here, because the converted names are always cached on first use.

OK, for 2 you can probably write a custom formatter and resolver.

public class DataObjectFormatter<T> : IJsonFormatter<T>
    where T : IData, new()
{
    public DataObjectFormatter(IEnumerable<KeyValuePair<string, Type>> knownDataTypes = null)
    {
        // build knownDataTypes...
    }

    public void Serialize(ref JsonWriter writer, T value, IJsonFormatterResolver formatterResolver)
    {
        // write serialize logic...
    }

    public T Deserialize(ref JsonReader reader, IJsonFormatterResolver formatterResolver)
    {
        // read deserialize logic...
        throw new System.NotImplementedException();
    }
}

public class DataObjectResolver : IJsonFormatterResolver
{
    public static readonly IJsonFormatterResolver Instance = new DataObjectResolver();

    DataObjectResolver()
    {

    }

    public IJsonFormatter<T> GetFormatter<T>()
    {
        return Cache<T>.formatter;
    }

    static class Cache<T>
    {
        public static readonly IJsonFormatter<T> formatter;

        static Cache()
        {
            var t = typeof(T);
            if (t == typeof(Organization))
            {
                formatter = (IJsonFormatter<T>)(object)new DataObjectFormatter<Organization>();
            }
            else if (t == typeof(Project))
            {
                formatter = (IJsonFormatter<T>)(object)new DataObjectFormatter<Project>();
            }
            // else if.....
            // other types: formatter stays null, so a CompositeResolver falls through to its next resolver
        }
    }
}
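To make the "known key to known type" part concrete, here is a sketch of one possible formatter for the data dictionary itself. Values under registered keys (e.g. "@error") are deserialized to their CLR type, and everything else falls back to Utf8Json's dynamic object model. The Event and ErrorInfo models and KnownDataFormatter are hypothetical, and the wiring assumes CompositeResolver.Create(formatters, resolvers) with StandardResolver.SnakeCase as the fallback.

```csharp
using System;
using System.Collections.Generic;
using Utf8Json;
using Utf8Json.Resolvers;

// Hypothetical stand-ins for an event and one of its known data types.
public class Event { public string Message { get; set; } public Dictionary<string, object> Data { get; set; } }
public class ErrorInfo { public string Message { get; set; } public string Type { get; set; } }

public sealed class KnownDataFormatter : IJsonFormatter<Dictionary<string, object>>
{
    readonly IReadOnlyDictionary<string, Type> knownTypes;

    public KnownDataFormatter(IReadOnlyDictionary<string, Type> knownTypes) => this.knownTypes = knownTypes;

    public void Serialize(ref JsonWriter writer, Dictionary<string, object> value, IJsonFormatterResolver formatterResolver)
    {
        if (value == null) { writer.WriteNull(); return; }
        var objectFormatter = formatterResolver.GetFormatterWithVerify<object>();
        writer.WriteBeginObject();
        var first = true;
        foreach (var kv in value)
        {
            if (!first) writer.WriteValueSeparator();
            first = false;
            writer.WritePropertyName(kv.Key);                                   // keys written unchanged
            objectFormatter.Serialize(ref writer, kv.Value, formatterResolver); // uses the runtime type
        }
        writer.WriteEndObject();
    }

    public Dictionary<string, object> Deserialize(ref JsonReader reader, IJsonFormatterResolver formatterResolver)
    {
        if (reader.ReadIsNull()) return null;
        var result = new Dictionary<string, object>();
        var objectFormatter = formatterResolver.GetFormatterWithVerify<object>();
        reader.ReadIsBeginObjectWithVerify();
        var count = 0;
        while (!reader.ReadIsEndObjectWithSkipValueSeparator(ref count))
        {
            var key = reader.ReadPropertyName();
            result[key] = knownTypes.TryGetValue(key, out var type)
                ? JsonSerializer.NonGeneric.Deserialize(type, ref reader, formatterResolver) // typed by key
                : objectFormatter.Deserialize(ref reader, formatterResolver);                // dynamic fallback
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var known = new Dictionary<string, Type> { ["@error"] = typeof(ErrorInfo) };
        var resolver = CompositeResolver.Create(
            new IJsonFormatter[] { new KnownDataFormatter(known) },
            new IJsonFormatterResolver[] { StandardResolver.SnakeCase });

        var json = "{\"message\":\"boom\",\"data\":{\"@error\":{\"message\":\"oops\",\"type\":\"System.Exception\"},\"custom\":42}}";
        var ev = JsonSerializer.Deserialize<Event>(json, resolver);

        var error = (ErrorInfo)ev.Data["@error"]; // known key: ErrorInfo
        var custom = ev.Data["custom"];           // unknown key: double
        Console.WriteLine(error.Type + " " + custom);
        Console.WriteLine(JsonSerializer.ToJsonString(ev, resolver)); // round trip
    }
}
```

One design caveat: a composite resolver picks this formatter for every Dictionary<string, object> it encounters. In a real model, giving the data bag its own dedicated type would keep the special handling scoped to it.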
niemyjski commented 6 years ago
  1. How big is the cache? We have dynamic user-submitted values, and I'm worried that could cause a lot of memory consumption.
  2. Thanks for that. I'm assuming there are helpers to serialize and deserialize? Is there a good base class I can derive from that lets me just say how to serialize and deserialize this type? (I really only care about how a key in the dictionary is deserialized to a known type.)
neuecc commented 6 years ago

1. The cache is only for JSON property names (it does not include dictionary keys). Property names are fixed per type, so there are not many of them.

2. Yesterday I built an API client that includes some custom formatters/resolvers.

Response A:

{
  "teams": [
    {
      "name": "docs",
      "privacy": "open",
      "description": "esa.io official documents",
      "icon": "https://img.esa.io/uploads/production/teams/105/icon/thumb_m_0537ab827c4b0c18b60af6cdd94f239c.png",
      "url": "https://docs.esa.io/"
    }
  ],
  "prev_page": null,
  "next_page": null,
  "total_count": 1,
  "page": 1,
  "per_page": 20,
  "max_per_page": 100
}

Response B:

{
  "posts": [
    {
      "number": 1,
      "name": "hi!",
      "full_name": "日報/2015/05/09/hi! #api #dev",
      "wip": true,
      "body_md": "# Getting Started",
      "body_html": "<h1 id=\"1-0-0\" name=\"1-0-0\">\n<a class=\"anchor\" href=\"#1-0-0\"><i class=\"fa fa-link\"></i><span class=\"hidden\" data-text=\"Getting Started\"> &gt; Getting Started</span></a>Getting Started</h1>\n",
      "created_at": "2015-05-09T11:54:50+09:00",
      "message": "Add Getting Started section",
      "url": "https://docs.esa.io/posts/1",
      "updated_at": "2015-05-09T11:54:51+09:00",
      "tags": [
        "api",
        "dev"
      ],
      "category": "日報/2015/05/09",
      "revision_number": 1,
      "created_by": {
        "name": "Atsuo Fukaya",
        "screen_name": "fukayatsu",
        "icon": "http://img.esa.io/uploads/production/users/1/icon/thumb_m_402685a258cf2a33c1d6c13a89adec92.png"
      },
      "updated_by": {
        "name": "Atsuo Fukaya",
        "screen_name": "fukayatsu",
        "icon": "http://img.esa.io/uploads/production/users/1/icon/thumb_m_402685a258cf2a33c1d6c13a89adec92.png"
      }
    }
  ],
  "prev_page": null,
  "next_page": null,
  "total_count": 1,
  "page": 1,
  "per_page": 20,
  "max_per_page": 100
}

Both are wrapped in the same paging response; the only difference is "teams": [] vs. "posts": [].

So I built a generic pagination abstraction: https://github.com/neuecc/EsaClient/blob/master/EsaClient/Responses/Pagination.cs

public class Pagination<T>
{
    // items does not correspond to the property name in the JSON
    public T[] items { get; set; }

    public int? prev_page { get; set; }
    public int? next_page { get; set; }
    public int total_count { get; set; }
    public int page { get; set; }
    public int per_page { get; set; }
    public int max_per_page { get; set; }
}

items is not the real JSON property name; the property name has to change depending on T.

So I built a custom Pagination<T> formatter (https://github.com/neuecc/EsaClient/blob/master/EsaClient/Json/PaginationItemFormatter.cs) and its resolver (https://github.com/neuecc/EsaClient/blob/master/EsaClient/Json/EsaJsonFormatterResolver.cs).
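Below is a hypothetical sketch of the same idea, not the code in the linked PaginationItemFormatter.cs. The formatter receives the JSON name of the item array ("teams", "posts", ...) per T, and the resolver registration decides which name goes with which T. Pagination<T> is trimmed to two paging fields for brevity, Team is a made-up item type, and the wiring assumes CompositeResolver.Create(formatters, resolvers).

```csharp
using System;
using Utf8Json;
using Utf8Json.Resolvers;

// Trimmed version of the Pagination<T> above.
public class Pagination<T> { public T[] items { get; set; } public int? next_page { get; set; } public int total_count { get; set; } }
public class Team { public string name { get; set; } }

public sealed class PaginationFormatter<T> : IJsonFormatter<Pagination<T>>
{
    readonly string itemsPropertyName; // e.g. "teams" for Team, "posts" for Post

    public PaginationFormatter(string itemsPropertyName) => this.itemsPropertyName = itemsPropertyName;

    public void Serialize(ref JsonWriter writer, Pagination<T> value, IJsonFormatterResolver formatterResolver)
    {
        if (value == null) { writer.WriteNull(); return; }
        writer.WriteBeginObject();
        writer.WritePropertyName(itemsPropertyName);
        formatterResolver.GetFormatterWithVerify<T[]>().Serialize(ref writer, value.items, formatterResolver);
        writer.WriteValueSeparator();
        writer.WritePropertyName("next_page");
        if (value.next_page.HasValue) writer.WriteInt32(value.next_page.Value); else writer.WriteNull();
        writer.WriteValueSeparator();
        writer.WritePropertyName("total_count");
        writer.WriteInt32(value.total_count);
        writer.WriteEndObject();
    }

    public Pagination<T> Deserialize(ref JsonReader reader, IJsonFormatterResolver formatterResolver)
    {
        if (reader.ReadIsNull()) return null;
        var result = new Pagination<T>();
        reader.ReadIsBeginObjectWithVerify();
        var count = 0;
        while (!reader.ReadIsEndObjectWithSkipValueSeparator(ref count))
        {
            var name = reader.ReadPropertyName();
            if (name == itemsPropertyName)
                result.items = formatterResolver.GetFormatterWithVerify<T[]>().Deserialize(ref reader, formatterResolver);
            else if (name == "next_page")
                result.next_page = reader.ReadIsNull() ? (int?)null : reader.ReadInt32();
            else if (name == "total_count")
                result.total_count = reader.ReadInt32();
            else
                reader.ReadNextBlock(); // skip fields this sketch does not model
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var resolver = CompositeResolver.Create(
            new IJsonFormatter[] { new PaginationFormatter<Team>("teams") },
            new IJsonFormatterResolver[] { StandardResolver.Default });

        var json = "{\"teams\":[{\"name\":\"docs\"}],\"prev_page\":null,\"next_page\":null,\"total_count\":1}";
        var page = JsonSerializer.Deserialize<Pagination<Team>>(json, resolver);
        Console.WriteLine(page.items[0].name + " " + page.total_count);
        Console.WriteLine(JsonSerializer.ToJsonString(page, resolver));
    }
}
```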

Is this example close to your use case?

niemyjski commented 6 years ago

Thanks for those, I'm going to start working on this very soon. I did notice that I'm having serious interoperability issues with JSON.NET, to the point that I don't know if I can solve them right away. Here are the issues that pertain to #2:

For example, JSON.NET deserializes to a JToken when it doesn't know the object type, and the ToString() of that JToken is the raw JSON. However, MessagePack can't serialize JTokens, or anything without a public constructor. So Utf8Json and MessagePack don't play nicely with NEST, which currently only supports JSON.NET. One of the biggest problems right now is that when we come across a known or unknown dictionary type, it's a dictionary with a deep structure. Do you have any built-in helpers to convert these to an object, or is that something I'll have to build?

Finally, I know you don't cache dictionary keys, but users can submit millions of different random objects to us in a data bag, so I'm still a bit worried about the memoized cache growing without bound.

neuecc commented 6 years ago

Code that uses JToken depends on JSON.NET anyway, so it's natural that it has to be modified. I don't quite understand what the problem is.

unknown dictionary type it's a dictionary with a deep structure

What type do you mean? For example, Dictionary<string, Dictionary<string, Dictionary<string, Foo>>>?

neuecc commented 6 years ago

For example, the .NET API client for bitflyer (the top Bitcoin exchange in Japan), https://github.com/kiyoaki/bitflyer-api-dotnet-client, replaced JSON.NET with Utf8Json: https://github.com/kiyoaki/bitflyer-api-dotnet-client/commit/3d824d5b9afc678031e01d95be6577d7f89f1322

niemyjski commented 6 years ago

Anonymous dictionary types that are not known are deserialized into a dictionary or list. I was wondering if there is a utility method to quickly convert that dictionary to a known type without doing Deserialize(Serialize(dict)). I think we may need to create a formatter for this, because the NEST client returns content with JTokens (in dictionaries), and we need to be able to round-trip it, which we currently cannot.

neuecc commented 6 years ago

If you are talking about https://github.com/exceptionless/Exceptionless/blob/master/src/Exceptionless.Core/Serialization/DataObjectConverter.cs#L84, it does the following:

  1. Create T
  2. Read the input as a JToken (dynamic)
  3. Set the values on T
  4. Return T
// L41. JSON.NET
var json = JObject.Load(reader);

// Utf8Json
var json = (Dictionary<string, object>)JsonSerializer.Deserialize<object>(ref reader);
foreach (var kv in json)
{
    var propertyName = kv.Key;
    var value = kv.Value; // Dictionary<string, object>, List<object>, string, double, bool or null (instead of a JObject)
}

However, I would not go through the object model (Dictionary<string, object>); I would process the input directly with JsonReader.
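Here is a sketch of that single-pass approach, assuming a hypothetical Project model shaped like the Exceptionless data objects: known members plus an overflow Data bag. Known snake_case members are read straight from the JsonReader. Only unknown members are materialized through the dynamic object formatter, into Data, so no intermediate tree is built for the whole object. ProjectFormatter is a hypothetical name, and the wiring assumes CompositeResolver.Create(formatters, resolvers).

```csharp
using System;
using System.Collections.Generic;
using Utf8Json;
using Utf8Json.Resolvers;

// Hypothetical model: known members plus an overflow bag for everything else.
public class Project
{
    public string Name { get; set; }
    public int EventCount { get; set; }
    public Dictionary<string, object> Data { get; } = new Dictionary<string, object>();
}

public sealed class ProjectFormatter : IJsonFormatter<Project>
{
    public Project Deserialize(ref JsonReader reader, IJsonFormatterResolver formatterResolver)
    {
        if (reader.ReadIsNull()) return null;
        var project = new Project();                                  // 1. create T
        var objectFormatter = formatterResolver.GetFormatterWithVerify<object>();
        reader.ReadIsBeginObjectWithVerify();
        var count = 0;
        while (!reader.ReadIsEndObjectWithSkipValueSeparator(ref count))
        {
            switch (reader.ReadPropertyName())                       // 2./3. read and set in one pass
            {
                case "name": project.Name = reader.ReadString(); break;
                case "event_count": project.EventCount = reader.ReadInt32(); break;
                case var other: project.Data[other] = objectFormatter.Deserialize(ref reader, formatterResolver); break;
            }
        }
        return project;                                               // 4. return T
    }

    public void Serialize(ref JsonWriter writer, Project value, IJsonFormatterResolver formatterResolver)
    {
        if (value == null) { writer.WriteNull(); return; }
        var objectFormatter = formatterResolver.GetFormatterWithVerify<object>();
        writer.WriteBeginObject();
        writer.WritePropertyName("name");
        writer.WriteString(value.Name);
        writer.WriteValueSeparator();
        writer.WritePropertyName("event_count");
        writer.WriteInt32(value.EventCount);
        foreach (var kv in value.Data)                                // overflow members round-trip as-is
        {
            writer.WriteValueSeparator();
            writer.WritePropertyName(kv.Key);
            objectFormatter.Serialize(ref writer, kv.Value, formatterResolver);
        }
        writer.WriteEndObject();
    }
}

public static class Program
{
    public static void Main()
    {
        var resolver = CompositeResolver.Create(
            new IJsonFormatter[] { new ProjectFormatter() },
            new IJsonFormatterResolver[] { StandardResolver.Default });

        var p = JsonSerializer.Deserialize<Project>("{\"name\":\"web\",\"event_count\":5,\"custom\":{\"a\":1}}", resolver);
        Console.WriteLine(p.Name + " " + p.EventCount + " " + p.Data.Count);
        Console.WriteLine(JsonSerializer.ToJsonString(p, resolver));
    }
}
```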