mfenniak / rethinkdb-net

A C# / .NET client driver for RethinkDB.
Other
247 stars 37 forks source link

Serialization and DataMemberContract #149

Closed bchavez closed 10 years ago

bchavez commented 10 years ago

Hi there,

First, thank you so much for all your work on this C# RethinkDB driver.

You write very clean code. I was able to examine the source code and figure it out. Your code was wonderfully easy to comprehend. Thank you for doing such a great job!

Now, some issues/suggestions/solutions/feedback:

Serialization

Conceptually, in my view, the restriction to explicitly mark every [DataMember] and every [DataContract] for each aggregate/domain object and every field is quite cumbersome. I understand it's a WCF concept, indeed, even the MVC crew in its early days required to explicitly mark every attribute, data contract, and HTTP action.

Eventually, MVC & WebAPI teams adopted the convention that everything was "on / opt-in" by default without having to explicitly mark every data contract, HTTP action, etc. When a developer explicitly sets up Attributes it usually conveys the notion of an "explicit override".

JSON.NET (Newtonsoft.Json) serializer, follows the same convention:

By default (without any attribute decorations), an object serialized by JsonConvert.Serialize() all public members are selected for serialization.

This becomes really important when you don't have mutable access to object model definitions that exist inside the .NET framework. For example, suppose you had the following object model for ASP.NET's new User Identity framework:

    [DataContract}
    public class Account : Document<string>, IUser
    {
        public Account()
        {
            this.Logins = new List<UserLoginInfo>();
        }

        [DataMember]
        public string PasswordHash { get; set; }

        [DataMember]
        public string UserName { get; set; }

        [DataMember]
        public string SecurityStamp { get; set; }

        [DataMember]
        public List<UserLoginInfo> Logins { get; set; }
    }

    //Microsoft.AspNet.Identity.Core.dll
    public sealed class UserLoginInfo
    {
        public string LoginProvider { get; set; }
        public string ProviderKey { get; set; }
    }

Using [DataContract] is great, up until the point where you need to serialize User.Logins property which is of type UserLoginInfo. UserLoginInfo resides inside Microsoft.AspNet.Identity.Core.dll and it is not decorated with [DataContract]. Worse yet, UserLoginInfo is sealed. This leaves me in a corner. I want to use RethinkDB serialization, but I can't decorate UserLoginInfo with the required [DataContract].

This is why I think following the convention that all public properties are "opt-in" by default makes more sense. [DataMember] should be thought of as supplementing the serialization process, not a requirement for serialization. IMHO.

Also, I'm not really a fan of decorating everything with [DataContract] and [DataMember]; since it only adds extra code to my domain models.

JSON.NET (Newtonsoft.Json) Datum Conversion

So, realizing this limitation, I set out to write my own custom serializer that leveraged the popular Newtonsoft.Json serializer. Basically, I wrote my own custom DatumReader and DatumWriter for Json.NET. In essence, instead of reading and writing JSON text, the Json.NET serializer reads and writes "datums" for RethinkDB.

The primary benefit, is I get to use all the serialization facilities, capabilities and customization that come with Json.NET http://james.newtonking.com/json.

Summary

I'd be more than happy to contribute my DatumReader and DatumWriter code to the RethinkDB C# driver project, but there are a few points:

Cons:

Pros:

Perhaps the Json.NET serializer code could live in a separate assembly for those who want it. I don't know, but I thought maybe you might be interested.

Thanks, Brian

karlgrz commented 10 years ago

@bchavez,

Let me be the first to say thank you for this great write up!

I agree with a lot of this. Performance is a concern, but the trade off is worth it, I think.

I would love to see this merged in, and I like your idea of putting it into a separate assembly, since the JSON.net dependency might not be desired for every application.

Please submit a pull request, I would really like to see this functionality myself :-)

mfenniak commented 10 years ago

Hi Brian,

I'd be really happy to see an alternative to the DataContractDatumConverter. As it adds a new dependency on the Newtonsoft.Json library, I think it would be a good idea to isolate this dependency into new assembly that can be optionally used by consumers. As Karl suggests, creating a pull-request would be the best approach to getting this work incorporated. We'll take a look at your implementation, make a few code review comments, and then merge it in with thanks. :-)

I am a little confused by one thing in your write-up. You're referencing DatumReader and DatumWriter, but this project doesn't have anything named that. We have a single interface, IDatumConverter. Does your incorporation of Newtonsoft.Json change that in some way?

I've filed an issue, #150, to allow configuration-based selection of a connection's IDatumConverterFactory. This isn't directly related to you adding the Newtonsoft-based implementation, but it comes to mind as something that would be helpful if we have a couple different approaches to the serialization in the future.

bchavez commented 10 years ago

Hi,

Thanks for the feedback. I'll prepare my code. Please give me a few days or so ... Currently, I'm cleaning up a few things and writing some unit tests to go along with it.

@mfenniak Sorry for the confusion. The DatumReader and DatumWriter are concepts pulled from Newtonsoft.Json. *Reader and *Writers are analogous to Bson/JsonReader and Bson/JsonWriter. Essentially, I've just extended the capabilities of Newtonsoft.Json to understand how to read and write Datum object graphs from the protobuf spec.

I don't think my implementation of DatumReader and DatumWriter changes IDatumConverter. Conceptually, I think, my NewtonsoftDatumConverter is a composition that includes both DatumReader and DatumWriter, so I don't think it changes how I plug into IDatumConverter. However, I've only briefly analyzed the driver's codebase and determined it was "possible" to unplug the [DataContract] code and replace it with mine. My guess is, I'll somehow remove the DataContractDatumConverter and replace it with mine when the connection is being built.

There are parts of my DatumReader and DatumWriter that utilize the driver's primitive DatumConverters. For example, for those types that need further specialized normalization like DateTimes and DateTimeOffsets ($reql_type$:TIME); I simply call the driver's native implementation to get the Datum structure when Newtonsoft tries to resolve a DateTime|Offset. My main concern was targeting the serialization aspects of [DataContract] and leveraging the flexibility of Newtonsoft.Json serializer engine.

I hope that clarifies things!

Thanks, Brian

bchavez commented 10 years ago

Hello.

I have a quick question about TimeSpan. How should TimeSpan types be handled by the driver? Is there a native $resql_type$ for TimeSpans types?

Using only the native driver, objects with TimeSpan fields cause the driver to throw:

System.NotSupportedException : Datum converter is not available for type System.TimeSpan

I see some native DateTime conversions are made to long ticks ...

I've programmed my Newtonsoft datum converter to convert TimeSpans to/from long ticks. Please let me know if this is okay.

Thanks, Brian

karlgrz commented 10 years ago

Ticks should be fine. Take a look at https://github.com/mfenniak/rethinkdb-net/blob/master/rethinkdb-net/DatumConverters/DateTimeOffsetDatumConverterFactory.cs#L94 which does something similar.

mfenniak commented 10 years ago

Hm, ticks is alright. I think that floating-point seconds would be better, as that's the same type used for the epoch_time field in the RethinkDB TIME type.

bchavez commented 10 years ago

Hi,

Thank you so much for the responsive help.

@mfenniak Great. So, something like this would be okay then:

var x = new TimeSpan(days: 1, hours: 2, minutes: 3, seconds: 4, milliseconds: 5);
x.TotalSeconds.Dump();
// 93784.005 floating-point seconds?

var y = TimeSpan.FromSeconds(x);
y.Dump();
//1.02:03:04.0050000

Ultimately, I was worried about of some kind of ReQL similar to:

object.Date + object.Timespan

and having RethinkDB maintain the semantic consistency of adding a Date and a TimeSpan.

mfenniak commented 10 years ago

Yup, .TotalSeconds and .FromSeconds seem appropriate to me. As for adding a Date and a TimeSpan, the ReSQL documentation does mention that "time.add(number) -> time", which suggests that we could add an expression mapping to support Date + TimeSpan (I haven't tested what the "number" is though; probably seconds too?). I'd consider adding that a separate issue though that we can look at once your patch is brought in; I'd like to keep your JSON.net patch as minimal as possible so that it's easier to review.

mfenniak commented 10 years ago

Closing w/ the merging of PR #151. :-)