[C-API] Expose a way for SDKs to set class/property keys when creating the schema

nirinchev commented 2 years ago

Describe your problem or use case

Currently we have an API to get collections by TableKey and properties by ColKey (exposed as realm_class_key_t and realm_property_key_t in the C API). The issue with this API is that those values are only available at runtime, so SDKs are forced to construct the schema, then query Core to map the TableKey/ColKey to our local data models. This is typically done by keeping dictionaries mapping Type->TableKey and string->ColKey, then in the property getters, we'll do something like:

class MyObject {
  static Map<String, int> properties = {};

  int foo {
    get {
      var colKey = properties['foo']
      return realmCore.getValue(this, colKey) as int;
    }
  }
}

A lot of that can be simplified (and in some cases optimized), if we were allowed to provide an "id" when constructing the schema. Then OS/Core would be responsible for mapping the id to the ObjectSchema, turning the code into something like:

class MyObject {
  int foo {
    get {
      // Here 1 is the generated id for this property and is a compile-time constant
      return realmCore.getValue(this, 1) as int;
    }
  }
}

I believe something like this may be achievable already for users of the C++ API as the persisted_properties and computed_properties vectors are populated based on the schema supplied by the SDK, so their indices should be in sync with whatever the SDK provided, but it only covers properties, not classes, and even then it's not available to the C API.

nirinchev commented 2 years ago

cc @nielsenko as this is something that came up in our conversations.

nicola-cab commented 2 years ago

I am not sure if I understand what you are requesting @nirinchev. I am not sure 1 or in general any index can/will be viewed as a compile time constant (at least not at core level), also what if there are changes to the schema, id=1 would not necessarily mean the same thing, or the same property. Especially because you don't have the whole OS schema exposed at SDK level. Are you looking to have a way to retrieve classes and properties based on a index rather than based on ColKey?

nirinchev commented 2 years ago

Yes, those could also be viewed as indexes. Essentially, it's a way for SDKs to deterministically assign an id/index to tables and columns, the same way they assign names, then use those ids everywhere where a TableKey/ColKey can be used. The ids themselves don't need to be persisted in the schema and can change every time the app is launched, but should be stable for the lifetime of the app.

nicola-cab commented 2 years ago

OK. Still no so sure if this is a good idea. Because I am not sure what would happen if a second app or process opens the same realm file and migrates the schema. Using a name, most probably, can get the client code out of trouble, since in the worst case scenario, that property won't longer be found. But using an index, can allow you to mix properties or classes, or retrieve something that has different data type than what the user expects.

tgoyne commented 2 years ago

I don't think SDKs setting the actual ColKey and TableKeys used in core is viable. It could work in very simple cases, but as soon as migrations, sync, or multiple processes are involved it would be extremely complicated.

The C API appears to already expose what you need to do this in the same way as is done with the C++ API. realm_get_property_keys() sets the keys in the same order as the properties that you passed in, so storing that array somewhere and then doing return realmCore.getValue(this, indexToColKey[1]) as int; is perfectly valid.

There's no equivalent for Tables, but relative to the cost of constructing a Results the lookup is pretty cheap. If you have a different public API where that's not true then it could be worth adding something.

nirinchev commented 2 years ago

Okay, I realize I didn't do a particularly good job at explaining the request. Let me try to clarify a few things and provide a code example. So, here I'm not talking about the SDK supplying fixed values for TableKey/ColKey or even touching the Core schema at all. I'm suggesting all of this to be handled at OS/C API. Also, originally I was talking about ids to avoid being prescriptive, but let's look at an example where we have indices instead.

// SDK creates the schema; pseudocode
final classes = realm_class_info_t[]; // a vector of realm_class_info_t
final properties = realm_property_info_t[][]; // a vector of vectors of realm_property_info_t

classes[0] = realm_class_info_t { name = "Person", ... };
properties[0][0] = realm_property_info_t { name = "firstName", ... };

// We create the OS schema providing the vectors above
final schemaPtr = realm_schema_new(classes, 1, properties);

If OS/Core can guarantee that the order of the classes/properties in the schema is preserved (which I believe it does), then we could write an API that roughly looks like this:

RLM_API bool realm_get_value_by_property_index(const realm_object_t* obj, size_t prop_index, realm_value_t* out_value)
{
    return wrap_err([&]() {
        obj->verify_attached();

        auto col_key = object.get_object_schema().persisted_properties[prop_index].column_key;

        auto o = obj->obj();

        auto val = o.get_any(col_key);
        auto converted = objkey_to_typed_link(val, col_key, *o.get_table());
        out_value = to_capi(converted);
    }
}

RLM_API realm_object_t* realm_get_object_by_class_index(const realm_t* realm, size_t class_index, realm_object_key_t obj_key)
{
    return wrap_err([&]() {
        auto& shared_realm = *realm;
        auto table_key = shared_realm->schema[class_index].table_key;
        auto table = shared_realm->read_group().get_table(table_key);
        auto obj = table->get_object(ObjKey(obj_key));
        auto object = Object{shared_realm, std::move(obj)};
        return new realm_object_t{std::move(object)};
    });
}

Because the SDK is in control of the order of the classes and class_properties vectors when calling realm_schema_new, it can lay them out at compile time, so that we know that the Person is the first element of the classes vector and Person.firstName is the first element in the first element of the class_properties vector. Then we can do something like:

class Realm {
  T getByObjectKey<T extends RealmObject>(ObjectKey key) {
    final classIndex = T.schema.classIndex;
    realmCore.getByClassIndex(this, classIndex, key);
  }
}

class Person implements RealmObject {
  static ObjectSchema schema {
    return ObjectSchema(name: 'Person', classIndex: 0, properties: [
      Property(name: 'firstName', ...)
    ];
  }

  String firstName => realmCore.getPropertyByIndex(this, 0) as String;
}

tgoyne commented 2 years ago

The order of the classes is not preserved; we sort by class name in Schema so that we can binary search it when looking up by name. I'm not sure when you'd ever need realm_get_object_by_class_index() though. How are you getting an object key without also having the table key? When following links or reading from a collection the thing you're reading from knows the target table (unless it's a Mixed field, in which case you have a TypedLink and obviously have a table key).

realm_get_value_by_property_index() is perfectly reasonable and is basically what Cocoa does.

nirinchev commented 2 years ago

Yeah, realm_get_object_by_class_index was just a random API I picked from realm.h to illustrate the point - I don't believe we're using it. In any case, if the object schema order is not preserved, then that approach will not work for object schemas - it's a fairly centralized codepath (at least in the dart/.NET SDKs), so we can just keep a Map<Type, TableKey> (or similar) and look up the table key from the type.

Still, get_value_by_property_index is going to be quite valuable as it will allow us to const-construct the schema for our models and remove a dictionary lookup on every property access.

blagoev commented 2 years ago

and remove a dictionary lookup on every property access Why is this a problem? What we are trying to optimize for. Have we identified this is really a performance bottleneck in any of the SDKs?

nicola-cab commented 2 years ago

I am still not convinced that we can do that safely for properties… If this PR goes in: https://github.com/realm/realm-core/pull/5784/files ... the unmatching persisting properties will be appended to the end of the array. The meaning of any index could change at runtime if there is migration or sync kicks in... @nirinchev and @tgoyne

nirinchev commented 2 years ago

@nicola-cab this should not be a problem - appending properties beyond the schema known by the SDK will not affect its ability to access the properties it knows about by index. Local migrations should also be handled correctly as the schema of the pre-migration Realm is fully dynamic from the SDKs perspective and is accessed via the dynamic API. The post-migration schema should order properties correctly as that's the schema the Realm instance ends with.

@blagoev this is more of an ergonomics improvement than a performance optimization. Right now, after you provide the schema to Core, you need to ask it again to be able to populate the table and property keys. Apart from this being extremely inefficiently designed in the C API, where you have to construct the schema 1 by 1, then ask for each property 1 by 1, resulting in hundreds of calls to Core just to open a Realm, it's still wasteful. We're essentially doing double bookkeeping here, where Core already stores a schema, but we store one as well to be able to lookup information we already have.

nicola-cab commented 2 years ago

OK @nirinchev, I can add what it is needed to the C API.

sync-by-unito[bot] commented 2 years ago

➤ Nikola Irinchev commented:

To be clear, this is not urgent by any means, I believe it's going to be a nice optimization but is definitely not blocking us at the moment.

nirinchev commented 2 years ago

Hm... perhaps the issue wasn't explicit enough or the example I gave was misleading, but the request is not only for "get property by index". For this to work, we need to be able to completely replace all API usage where realm_property_key_t is used with API that take in property indices instead. This is realm_set_value, realm_set_embedded, realm_get_linked_object, realm_object_changes_get_modified_properties, and others. Additionally, API that take realm_property_key_t as part of a more complex struct should also allow overloading - e.g. realm_results_add_notification_callback, realm_object_add_notification_callback, etc.

Essentially, we would like the SDK not to have the concept of realm_property_key_t. It may be fairly cumbersome to expose duplicate API so we may want to generalize that somehow - since realm_property_key_t is a typedef for int64_t, which can also represent an index, we could add a helper function like:

ColKey get_col_key(int64_t value, ObjectSchema& schema) {
  #if SDK_USES_PROPERTY_INDEX
  return schema[value].column_key;
  #else
  return ColKey(value);
  #endif
}

And then SDKs that want to opt in for property index usage can build with SDK_USES_PROPERTY_INDEX = true.

nicola-cab commented 2 years ago

OK, this was not clear to me. It is a much bigger undertake.

blagoev commented 2 years ago

I find this problematic

class MyObject {
  int foo {
    get {
      // Here 1 is the generated id for this property and is a compile-time constant
      return realmCore.getValue(this, 1) as int;
    }
  }
}

How can you guarantee that each time the generator will generate the exact same value for this property. The first example ((listed below) ) will always work regardless of the order of the properties while this one seems order dependent.

class MyObject {
  static Map<String, int> properties = {};

  int foo {
    get {
      var colKey = properties['foo']
      return realmCore.getValue(this, colKey) as int;
    }
  }
}

Meaning people can break their apps by just reordering properties and reading an old realm file.

nirinchev commented 2 years ago

The property order is not persisted in the schema, so it can change every time you restart the app. The schema is created and maintained at runtime for each Realm instance independently. So you can even run multiple apps accessing the same Realm file that have different schema orders.

blagoev commented 2 years ago

Then I guess the sample code is misleading. It should be

return realmCore.getValue(this, 'foo', 1) as int;

or maybe the name to id mapping is set beforehand.

nirinchev commented 2 years ago

The property-index mapping can be resolved at compile time. It can change as the models change or the code is regenerated, but we don't need the name.

Think of an example of our generated code for the schema:

final Object schema schema = [
  Property('foo', 'int'...)
]

We know that foo is at index 0 because it's the first element in the schema array. That will not change until the model is recompiled, so we can just use 0 everywhere where we would use foo.

blagoev commented 2 years ago

Yes indeed the mapping should be done beforehand. then it should work.

realm / realm-core

[C-API] Expose a way for SDKs to set class/property keys when creating the schema #5874

Describe your problem or use case