realm / realm-cpp

Realm C++
Apache License 2.0

Design: Reflection / Object Mapping #1

Open simonask opened 8 years ago

simonask commented 8 years ago

There are a couple of considerations that need to be taken into account, some of which apply to other bindings as well, and some of which are unique to the C++ binding.

The problem we need to solve is this: How do we give users type-safe and convenient access to objects stored in Realm?

To answer that question, there are several subproblems that need to be answered:

These decisions should be informed by the imagined use cases for Realm C++, which are slightly more diverse given the nature of the apps in which people choose to make use of C++. Realm C++ might have to coexist with existing, diverging implementations of things like reflection (for example, game engines generally have this in some form), and users of Realm C++ are very likely to already have strong opinions about how their memory should be managed.

In the following, I am mostly concerned with the user experience that we provide, and not so much about what our internal reflection APIs would look like. For the relevant scenarios, presume an API that looks similar to C++'s own type_info set of functions, but with added information about struct fields.

Reflection Proposal 1: Macro-based structs

A nearly sufficient implementation of this already exists in Core in src/realm/table_macros.hpp.

Pros:

However, this approach has several severe drawbacks:

Example:

REALM_TABLE_4(MyTable,
    my_integer, int,
    my_string, std::string,
    my_link, Link<MyOtherTable>,
    ...
);

Reflection Proposal 2: Qt-like Meta Object Compiler

This would go in a different direction, where users would be able to define their classes as usual, and then add a minimal amount of annotation. We would then ship a tool that parses the header files defining these objects and outputs some C++ code providing the guts of the reflection machinery.

A base class realm::Object containing the row accessor is presumed.

Pros:

Cons:

Example:

class MyObject: realm::Object {
public:
    int m_integer REALM_PROPERTY("my_integer" REALM_PRIMARY_KEY);
    std::string my_string REALM_PROPERTY(REALM_DEFAULT_VALUE("foo"));
    realm::Link<MyOtherObject> m_link REALM_PROPERTY();
};

Reflection Proposal 3: Pure Template Based

It is possible to achieve something fairly elegant using only standard C++ primitives (no macros), but it does require a small amount of boilerplate. The upside is that it makes it easy to map existing classes and objects to Realm.

Pros:

Cons:

Example:

class MyObject: realm::Object {
    int m_integer;
    std::string m_string;
    realm::Link<MyOtherObject> m_link;

    static void reflect(realm::Reflector<MyObject>& r)
    {
        r.property(&MyObject::m_integer, "my_integer").primary_key();
        r.property(&MyObject::m_string, "my_string").default_value("foo");
        r.property(&MyObject::m_link, "my_link");
    } 
};
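To make the idea concrete, here is a minimal sketch of how a static reflect() function could feed a reflector. Everything in the sketch namespace (PropertyInfo, this Reflector, schema_of) is a hypothetical stand-in and not an existing Realm API; the base class, default values, and links are left out to keep it short.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical description of one mapped property.
struct PropertyInfo {
    std::string name;
    bool is_primary_key = false;

    PropertyInfo& primary_key() { is_primary_key = true; return *this; }
};

// Hypothetical stand-in for realm::Reflector<T>: collects what reflect() reports.
template <class T>
class Reflector {
public:
    // The member pointer only fixes the C++ type here; a real binding would
    // also keep it in order to read and write the field.
    template <class Member>
    PropertyInfo& property(Member T::*, const std::string& name) {
        PropertyInfo info;
        info.name = name;
        m_properties.push_back(info);
        return m_properties.back(); // valid until the next property() call
    }

    const std::vector<PropertyInfo>& properties() const { return m_properties; }

private:
    std::vector<PropertyInfo> m_properties;
};

class MyObject {
    int m_integer = 0;
    std::string m_string;

public:
    // reflect() is a member, so it may name the private fields.
    static void reflect(Reflector<MyObject>& r) {
        r.property(&MyObject::m_integer, "my_integer").primary_key();
        r.property(&MyObject::m_string, "my_string");
    }
};

// Builds the schema without needing an instance of T.
template <class T>
std::vector<PropertyInfo> schema_of() {
    Reflector<T> r;
    T::reflect(r);
    return r.properties();
}

} // namespace sketch
```

Calling sketch::schema_of<sketch::MyObject>() yields the two property descriptions in declaration order. Because reflect() is static, the schema can be enumerated before any object exists.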

Proposal: Achieving Zero-Copy Semantics

In general, the problem is that we can't reliably intercept get/set operations on fields in an unobtrusive way, since C++ has nothing like the "properties" found in other languages. We could introduce a special Property<T> type that can be used to solve this.

The class Property<T> would be defined as such:

template<class T>
struct Property {
    size_t offset_in_object;

    operator T() const;
    Property<T>& operator=(T&&);
};

When getting and setting the value of the property, we would use the offset_in_object to find the beginning of the encapsulating object ((realm::Object*)((char*)this - offset_in_object)), which would allow us to find both the (dynamic) type of the encapsulating object as well as which property is being accessed, so that we can issue the appropriate function calls into Core.

The realm::Link property implicitly already has the same functionality, so doesn't need to be wrapped in realm::Property.

(This technique would also work in the MOC proposal.)
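The offset trick can be sketched as follows. This is only an illustration under assumptions: the Object stand-in keeps values in a std::map instead of calling into Core, the offsets are computed by hand in the constructor, and the pointer arithmetic relies on the usual object layout of mainstream compilers rather than on anything the standard strictly guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

namespace sketch {

// Stand-in for realm::Object; the map plays the role of the Core row accessor
// (key: property offset, value: stored integer).
struct Object {
    std::map<std::size_t, std::int64_t> row;
};

template <class T>
struct Property {
    std::size_t offset_in_object;

    explicit Property(std::size_t offset) : offset_in_object(offset) {}

    // Walk back from this member to the start of the enclosing object.
    Object* owner() const {
        const char* self = reinterpret_cast<const char*>(this);
        return reinterpret_cast<Object*>(const_cast<char*>(self - offset_in_object));
    }

    // Reads go through the owner instead of a locally stored value.
    operator T() const {
        auto it = owner()->row.find(offset_in_object);
        return it == owner()->row.end() ? T() : static_cast<T>(it->second);
    }

    // Writes are forwarded to the owner as well.
    Property& operator=(T value) {
        owner()->row[offset_in_object] = static_cast<std::int64_t>(value);
        return *this;
    }
};

struct MyObject : Object {
    Property<std::int64_t> m_integer;
    Property<std::int64_t> m_other;

    MyObject() : m_integer(offset_of(&m_integer)), m_other(offset_of(&m_other)) {}

private:
    // Distance in bytes from the Object base subobject to the given member.
    std::size_t offset_of(const void* member) const {
        const Object* base = this;
        return static_cast<std::size_t>(static_cast<const char*>(member) -
                                        reinterpret_cast<const char*>(base));
    }
};

} // namespace sketch
```

Assigning to obj.m_integer or reading it back never touches a value stored in the property itself, which is the zero-copy behaviour the proposal is after. A real implementation would presumably obtain the offsets from the reflection data rather than from a hand-written constructor.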

Pros:

Cons:

Example:

class MyObject: realm::Object {
    realm::Property<int> m_integer;
    realm::Property<std::string> m_string;
    realm::Link<MyOtherObject> m_link;

    static void reflect(realm::Reflector<MyObject>& r)
    {
        r.property(&MyObject::m_integer, "my_integer").primary_key();
        r.property(&MyObject::m_string, "my_string").default_value("foo");
        r.property(&MyObject::m_link, "my_link");
    } 
};

Proposal: Multiple Inheritance

I propose that we do not support it. :-)

astigsen commented 8 years ago

When it comes to the macro based approach, I am not sure the drawbacks are entirely correct:

Table/row metaphor (rather than "object" oriented).

That is just leftovers of the old terminology. There should be no reason that it could not be made as fully "object" oriented as any other method.

No high-level features (such as what's provided by Object Store).

I am also pretty sure that we could find ways to support features like links, primary keys and other annotations within the framework of macros.

For me the drawbacks are more the unfamiliar syntax (and lack of auto-completion to help you remember it), and the horrible experience of stepping into the macros when debugging (but the same could potentially be said about template based approaches).

astigsen commented 8 years ago

Another huge drawback with the macro based approach is that it does not allow users to add their own methods to the objects, forcing them into an Anemic Domain Model :-(

simonask commented 8 years ago

@astigsen Good points — I will admit that I'm subconsciously almost writing off the macro approach at this point, and perhaps I'm therefore not doing justice to its merits. It still seems to me that it's hard to achieve an "object-oriented" model where users can do things they might want (such as define methods on Realm objects, as you mention).

The Java binding has suffered from the "Anemic Domain Model" for a long while (great name by the way!), and it has been a huge source of frustration for their users, so that should be taken into consideration as well.

kristiandupont commented 8 years ago

I think that the reason I dislike macros is a very simple feeling of "now, my code would not be processable by a tool". Of course, a very sophisticated tool would still be able to process it, but knowing that I am not relying on the pure syntax of the language is what gives me that feeling. Given that, the Property<T> approach looks most promising to me.

Regarding anemic models, I have to say I was quite astonished to see Fowler discouraging the pattern, because I feel that there is a trend towards it, at least in enterprise .NET but also more generally as it looks more like what you would do in a functional language. Rich objects tend to be hard to test and in general look like little programs with global variables. I wonder if he would still agree with his assertions today. I realize, though, that a lot of people work this way and expect to be allowed to do so, so blocking them is a problem.

I wonder if C++ developers are so opinionated about their architecture that they would be happier with a library rather than a framework -- a bunch of functions that they could call from whatever architecture they already have in place, rather than being forced to inherit from something when they already have all of their classes derive from some Actor class or whatever? I know that it would be less magic than the other bindings, but then, C++ developers are a special breed anyway...

AndyDentFree commented 8 years ago

I tackled all these issues in OOFILE with similar debates (the arguments were identical 20 years ago). I wrote up a bit of a comparison on our wiki, which is possibly worth looking at, at least the model bit.

My solution to the property issue was to use my own persistent base classes (eg: dbInt) so that I could use operator overloading. That's not as obnoxious to people as you might think if they are very lightweight classes seen as just an API to the storage - people don't necessarily expect real integers when they are persistent.

From observation of other products, yes, I think having a parsing tool like Qt can be a nightmare for maintenance. In particular, parsing binary formats is a massive overhead (I think Poet did this and was always annoying people by being behind compilers).

I think @kristiandupont makes a good argument about people maybe being happier with a library than a framework. One of the things which worked really well for some of the diverse OOFILE users was exactly that - you could use generic calls to get, set, search etc. without using the convenience of the class declarations which gave you type-checked operations.

tgoyne commented 8 years ago

Some various low-boilerplate options that don't require reflection:

// tuple-style with unnamed fields
class MyObject : public realm::Object<int, float> {
};

MyObject obj = ...;
get<0>(obj) = 5;
float value = get<float>(obj); // would be a compile error if there are multiple float properties

// named tuple using externally-defined property types
REALM_PROPERTY(int, foo);
REALM_PROPERTY(float, bar);

class MyObject : public realm::Object<foo, bar> {
};

MyObject obj = ...;
obj[foo()] = 5;
float value = obj[bar()];

// named tuple using constexpr string hashing
class MyObject : public realm::Object<prop(int, "foo"), prop("bar") = 1.5f> {
};

MyObject obj = ...;
int value = obj[prop("foo")];

// macro to generate the properties rather than the entire class
class MyObject : public realm::Object {
    REALM_PROPERTIES(
      int, foo,
      float, bar)
};

MyObject obj = ...;
int value = obj.get_foo();

simonask commented 8 years ago

Great to see some new suggestions!

@tgoyne As far as I can tell, your second suggestion can also support a statically checked interface, along these lines:

REALM_PROPERTY(int, foo);

class MyObject: public realm::Object<foo> {
};

MyObject obj = ...;
get<foo>(obj) = 5;

I also think something like aliasing a field (assigning different field names in code and in Realm) will be a very common thing for people to do in C++, because most companies use some kind of prefix or suffix for data members that they will not want to put inside the Realm, particularly if they're syncing it to other bindings. I haven't really seen any constexpr string handling that worked across all major compilers without very awkward syntax, but my knowledge could be outdated - do you have any info on this? I suppose at least C++14 is required.
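For reference, a C++14 constexpr string hash is short to write; the awkward part is the call-site syntax. The sketch below uses FNV-1a as the hash and hypothetical names (fnv1a, Key, MyObject::get); it is one possible shape, not what any of the proposals above actually specify.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// 32-bit FNV-1a, evaluated at compile time (needs C++14 relaxed constexpr).
constexpr std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u; // FNV offset basis
    for (; *s != '\0'; ++s) {
        h ^= static_cast<std::uint8_t>(*s);
        h *= 16777619u; // FNV prime; unsigned arithmetic wraps by definition
    }
    return h;
}

static_assert(fnv1a("") == 2166136261u, "empty string hashes to the offset basis");
static_assert(fnv1a("foo") != fnv1a("bar"), "distinct names need distinct hashes");

// Turns a hash into a distinct type so it can select an overload.
template <std::uint32_t Hash>
struct Key {};

struct MyObject {
    int foo = 0;
    float bar = 1.5f;

    // One overload per property; an unknown name fails to compile, and two
    // names with the same hash would clash as duplicate overloads.
    int& get(Key<fnv1a("foo")>) { return foo; }
    float& get(Key<fnv1a("bar")>) { return bar; }
};

} // namespace sketch
```

Usage looks like obj.get(sketch::Key<sketch::fnv1a("foo")>{}) = 5; which works but shows why a terser spelling would need a macro or a user-defined literal.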

tgoyne commented 8 years ago

All of the ideas I threw out would give a fully statically typed interface. In the case of the second, there'd be a different overload of operator[] for each of the properties, and the actual passed-in value wouldn't be used for anything.
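A minimal sketch of that second idea, with the REALM_PROPERTY macro replaced by hand-written tag structs and a single operator[] template standing in for the per-property overloads. Slot and this Object template are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {

// What REALM_PROPERTY(int, foo) might expand to: an empty tag naming a type.
struct foo { using type = int; };
struct bar { using type = float; };

// One storage slot per tag; a real binding would read from Realm instead.
template <class Tag>
struct Slot {
    typename Tag::type value{};
};

template <class... Tags>
struct Object : Slot<Tags>... {
    // Only the type of the argument matters; its value is never used.
    // Indexing with a tag that is not in Tags... fails to compile.
    template <class Tag>
    typename Tag::type& operator[](Tag) {
        return static_cast<Slot<Tag>&>(*this).value;
    }
};

struct MyObject : Object<foo, bar> {};

// The result type is fixed at compile time by the tag.
static_assert(std::is_same<decltype(std::declval<MyObject&>()[foo()]), int&>::value,
              "foo maps to int");
static_assert(std::is_same<decltype(std::declval<MyObject&>()[bar()]), float&>::value,
              "bar maps to float");

} // namespace sketch
```

With this, obj[foo()] = 5; and float value = obj[bar()]; both type-check statically, and the tag objects are empty so passing them around costs nothing.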

AndyDentFree commented 8 years ago

Just to clarify how the parent-property approach that I use in OOFILE works, and how the pattern can be used here: class declarations look like simple class declarations with our own types instead of native ones, and you then don't have to use property syntax to access them. This uses pure compile-time logic that works on any compiler with very conservative C++ (even though we can mandate at least C++11).

The secret is to use (pre-thread) registration by constructor - at the time of constructing a rInt we are inside the constructor of a realm::Object so can connect the two to build up the schema.

// properties using special Realm classes
class MyObject : public realm::Object {
      rInt foo;
      rFloat bar;
};

MyObject obj = ...;
int value = obj.foo;

astigsen commented 8 years ago

The secret is to use (pre-thread) registration by constructor - at the time of constructing a rInt we are inside the constructor of a realm::Object so can connect the two to build up the schema.

That is actually an interesting technique to collect the information needed for introspection. It does seem very timing dependent though. Are we sure that it won't be possible to end up with some corrupted state?

Even if it works, you still need the property names. Maybe it would be an idea to combine it with a simple macro:

class MyObject : public realm::Object {
      REALM_PROPERTY(int, foo)
      REALM_PROPERTY(float, bar)
};

Looks ugly compared to the above method of using special property types, but might be hard to get that information otherwise?

AndyDentFree commented 8 years ago

If you are stashing the information in thread-local storage it cannot be timing dependent, because your property construction is guaranteed to occur immediately after that particular class's base realm::Object constructor. Base and member constructors can't be interleaved within a thread.
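As a rough illustration of registration by constructor (not OOFILE's actual code, which is not shown in this thread), the base constructor can publish itself in a thread-local pointer and each property constructor can then attach itself to it. The names t_current and schema are made up for the sketch, and the schema is rebuilt per instance for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

struct Object;

// The object whose members are currently being constructed on this thread.
// Nothing resets it afterwards; detecting "all members done" is left open here.
thread_local Object* t_current = nullptr;

struct Object {
    std::vector<std::string> schema; // property names, in declaration order

    // Runs before any member of the derived class is constructed.
    Object() { t_current = this; }
};

struct rInt {
    int value;

    // Registers itself with whichever Object was constructed last on this thread.
    rInt(const char* name, int initial = 0) : value(initial) {
        assert(t_current != nullptr);
        t_current->schema.push_back(name);
    }

    operator int() const { return value; }
    rInt& operator=(int v) { value = v; return *this; }
};

struct MyObject : Object {
    rInt foo{"Foo", 0};
    rInt bar{"Bar", 42};
};

} // namespace sketch
```

Constructing a sketch::MyObject leaves its schema holding "Foo" and "Bar", relying only on the rule that a base class is constructed before the members of the derived class.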

The current OOFILE doesn't use thread-local storage but this was never a problem in extremely broad use (19 different compilers, most countries, hundreds of thousands of end users).

You don't need property names ever if you never have arguments needing string names as you can manufacture column names. That's how OOFILE supports people generating dynamic schemas. However yes sometimes it's useful to pass in a name or other settings such as indexed attributes. (Also if you have an old-fashioned backing store you may need character field widths!).

Because these are now classes rather than raw ints you can easily have C++ init lists on them. I need to test what modern syntax would allow but I think something like this would work (qualified answer with too much C# for months):

// properties using special Realm classes
class MyObject : public realm::Object {
      rInt foo {"Foo", 0};
      rFloat bar {"Bar", 42.0};
};

simonask commented 8 years ago

Also keep in mind that we need to match column names in cross-binding usage scenarios (such as sync). :-)

This also means that things like the table name must be customizable in some way.

Putting the macro inside the class, like

class MyObject: public realm::Object {
    REALM_PROPERTY(int, foo);
};

doesn't leave us any opportunity to enumerate the properties of an object before having an actual instance of the object. Perhaps that might be good enough.

@AndyDentFree I'm curious what the exact technique is that you used to connect the field to the object in the member constructor. Do you let each constructor modify global/thread-local information? How do you know when all members have been enumerated?

simonask commented 8 years ago

Perhaps an approach like what V8 does can be employed, where adding a new property causes a new "type" to be generated, and each type contains a map of which new properties lead to which new types. This is particularly well suited for prototype-based languages like JavaScript.

To expand, the way this would work is that every time a realm::Object is instantiated it starts with the "unit type". When a property is detected because a realm::Property member is being constructed, the offset and type of that property is looked up in an internal mapping inside the current type info of the object to see if it already knows about a different type that corresponds exactly to its own members plus the one that is being added. If it doesn't exist, that new type is created. If it does exist, the object has its type changed. And so forth. (In addition to members, other distinguishing features like the table name and the primary key column would have to have the same effect.)

This would need to be completely thread-safe, unless we are fine with objects created on different threads having duplicate runtime type information.
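The transition lookup described above could look roughly like this. It is a single-threaded sketch with hypothetical names (TypeInfo, add_property); a property is identified by its offset plus a type name string, and table names or primary keys are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// One node per distinct sequence of properties seen so far.
struct TypeInfo {
    using Edge = std::pair<std::size_t, std::string>; // (offset, type name)

    std::size_t property_count = 0;
    std::map<Edge, std::unique_ptr<TypeInfo>> transitions;
};

// Follow the transition for this property, creating the target type on first use.
// Not thread-safe as written.
inline TypeInfo& add_property(TypeInfo& from, std::size_t offset,
                              const std::string& type_name) {
    std::unique_ptr<TypeInfo>& next = from.transitions[std::make_pair(offset, type_name)];
    if (!next) {
        next.reset(new TypeInfo);
        next->property_count = from.property_count + 1;
    }
    return *next;
}

} // namespace sketch
```

Starting from a default-constructed TypeInfo as the "unit type", two objects that add the same properties in the same order end up at the same node, so only the first instance pays for creating the type.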

finnschiermer commented 8 years ago

@AndyDentFree The technique of using thread local storage to have a pointer to the base class so that it can be obtained secretly by the constructor for a property may break down, when the initialization of that property can in itself imply initialization of further objects. Something I guess could be a reasonable pattern if the property was a form of link. Or did I misunderstand your approach?

AndyDentFree commented 8 years ago

@finnschiermer

may break down, when the initialization of that property can in itself imply initialization of further objects.

OOFILE doesn't support nesting objects which I think you're pointing out would be a failure of this paradigm. (I suspect the paradigm would still work with some kind of stack and level count but would be a lot more fragile.)

It manages relationships with special properties for managing links similar to Realm. That included ownership so we had cascading deletes with an opt-out mechanism.

simonask commented 8 years ago

I wrote up a small ~100 line proof-of-concept in C++14. Could you read through it and let me know if it more or less matches what you had in mind? (see line 121 for an example of what the object definition would look like to the user) @AndyDentFree

Gist here: https://gist.github.com/simonask/2f22c00437f1fd5161dc

The relevant part for people just watching:

class MyObject: public realm::Object {
    realm::Property<int64_t> m_integer = property("my_integer");
    realm::Property<std::string> m_string;
};

I have to say, it looks amazing and very "magical" -- with zero macros! I like that! However, it also imposes some runtime overhead on object creation, and I'm wondering how much we could do to eliminate that. My gist makes no attempt to reduce runtime overhead, and has one lookup in a std::map per property per object instance. Observing that 99.9% of all objects will follow identical type transitions, this can probably be optimized to be almost unnoticeable.

AndyDentFree commented 8 years ago

A few perspective thoughts rather than thinking about syntax:

  1. In all the other bindings we're modelling proxy objects and the same is true here - these are not POCO but are a combination of schema definition and data access. But, this is C++. Is there a different expectation here? People live with a managed environment having to carry extra overhead knowing there's a C++ core behind. Do they expect C++ to be leaner?
  2. I have some vague ideas about how to bypass runtime maps, partly because I'm thinking about them for C# but there are maybe more things we can do in C++ templates.
  3. OOFILE was designed before templates but was designed with a lot of consultation. It is collection-oriented with a flyweight pattern for several reasons. Avoiding per-object overhead on instantiation was a strong part of this. I think in the modern world of more functional programming maybe thinking in terms of collections of objects over individual objects is still a valuable perspective. Iteration is then a matter of which row is mapped to the accessor rather than many objects. This pattern scales very well.
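The collection-oriented, flyweight style from point 3 might look something like the following. This is a guess at the general shape and not OOFILE's design: Table is a toy column store and Accessor is a hypothetical cursor that is re-pointed at successive rows instead of creating one object per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Toy column storage standing in for a Realm table.
struct Table {
    std::vector<int> foo;
    std::vector<float> bar;
};

// A single lightweight accessor; iterating only changes which row it maps to.
class Accessor {
public:
    explicit Accessor(Table& table) : m_table(&table), m_row(0) {}

    bool valid() const { return m_row < m_table->foo.size(); }
    void next() { ++m_row; }

    int& foo() { return m_table->foo[m_row]; }
    float& bar() { return m_table->bar[m_row]; }

private:
    Table* m_table;
    std::size_t m_row;
};

// Visits every row with one accessor and no per-row allocation.
inline long sum_foo(Table& table) {
    long total = 0;
    for (Accessor row(table); row.valid(); row.next())
        total += row.foo();
    return total;
}

} // namespace sketch
```

The per-object setup cost discussed earlier disappears here because there is only ever one accessor, at the price of a less object-like API.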

note: I don't have time (very tight dotnet guideline) to be too distracted by this stuff so would rather think about it more and reply next week. Lack of more replies does not denote lack of interest! I want to go play with some template ideas when we have the RC featureset for dotnet delivered.

karagraysen commented 7 years ago

This is a very fascinating discussion! 💯 👩‍💻