y-crdt / ydotnet

.NET bindings for yrs.
MIT License
29 stars 9 forks source link

Exception calling Get* on top level field without first defining it #85

Open akatakritos opened 5 months ago

akatakritos commented 5 months ago

Consider this unit test based on the example in the root README.md. Note that on remoteDoc I never call remoteDoc.Text("name") to define the field.

[Fact]
public void ReadingUndeclaredPropertyReturnsNull()
{
    Doc localDoc = new Doc();
    Text localText = localDoc.Text("name");

    Transaction localTransaction = localDoc.WriteTransaction();
    localText.Insert(localTransaction, 0, "Y-CRDT");
    localTransaction.Commit();

    // Set up the remote document.
    Doc remoteDoc = new Doc();
    // remoteDoc.Text("name"); 
    // ^^^^^^^^^^^^^^^^^^^^^^. NOTE: explicitly not declaring the name field

    // Get the remote document state vector.
    Transaction remoteTransaction = remoteDoc.WriteTransaction();
    byte[] remoteState = remoteTransaction.StateVectorV1();

    // Calculate the state diff between the local and the remote document.
    localTransaction = localDoc.ReadTransaction();
    byte[] stateDiff = localTransaction.StateDiffV1(remoteState);
    localTransaction.Commit();

    // Apply the state diff to synchronize the remote document with the local changes.
    TransactionUpdateResult result = remoteTransaction.ApplyV1(stateDiff);
    remoteTransaction.Commit();

   // at this point, remoteDoc should have the "name" field since it synced with localDoc

    using (remoteTransaction = remoteDoc.ReadTransaction())
    {
        Text? textProperty = remoteTransaction.GetText("name");
        // as a new user, I expected this to return the Text since the sync should have created
        // it.

        // Instead, the docs state it should be null: "Returns the Text at the Doc root level, identified by name, 
        // or null if no entry was defined under name before." However, it throws instead.
        Assert.Null(textProperty); 
    }
}

I had expected the call to remoteTransaction.GetText("name") to return the Text object since by virtue of syncing with localDocument the property does now exist. Barring that, I expected it to return null per the documentation of the GetText method: "Returns the Text at the Doc root level, identified by name, or null if no entry was defined under name before."

In reality, it threw an exception:

YDotNet.YDotNetException: Expected 'Text', got '0'.

YDotNet.YDotNetException
Expected 'Text', got '0'.
   at YDotNet.Document.Transactions.Transaction.GetWithKind(String name, BranchKind expectedKind)
   at YDotNet.Document.Transactions.Transaction.GetText(String name)
   at Ballpark.Tests.DocTests.ReadingUndeclaredPropertyReturnsNull() in /Users/mattburke/projects/ballpark/server/Ballpark.Tests/DocTests.cs:line 57
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

I would love to contribute an improvement if there's a direction you prefer to take!

SebastianStehle commented 5 months ago

I will have a look, but it looks like something went wrong with the transmit, because it could not find the proper type. Therefore 0

akatakritos commented 5 months ago

Thanks! I think it transmitted; if I define the name field with remoteDoc.Text("name") after applying updates (right before opening the last ReadTransaction in the test), the field and its value can be successfully retrieved. Its like you have to explicitly define fields even if they come in from a remote sync

SebastianStehle commented 5 months ago

Could you ask that in the ycrdt repo. I am actually not sure right now

akatakritos commented 5 months ago

Sure! I'll see if I can figure out a reproduction using straight rust. Thanks!

akatakritos commented 5 months ago

https://www.bartoszsypytkowski.com/yrs-architecture/

These types can be nested in each other according to their limitations. Unlike primitive values they can also be assigned to the document itself (using their unique names for retrieval) and obtained right from it. Shared objects created straight at the document level are called a root level types and have a very interesting characteristic - their actual type is only a suggestion on how to present their content

...

This structure also enables to reinterpret root-level types eg. you can read a Text as an Array (to change its materialized type to a list of characters) or XmlElement as a Map (and be able to read only its attributes). Just like with any other kind of weakly typed systems you should use these capabilities with caution.

My interpretation of this is that it's by design to get a 0 back from rust. Top level properties are flexibly typed, if you don't declare that it's an Array or whatever it doesn't know how to interpret it.

Maybe we should keep throwing here but perhaps special case the 0 and throw a message informing the user to declare the type before trying to read it?

Or add an enum element for Undeclared = 0? That might just be an implementation quirk though

Horusiath commented 5 months ago

@akatakritos from the update binary format perspective, the only information about root-level type passed is its name. The information about type of that structure is not passed as part of the update.

For root-level types, you're expected to create them after initialising the Document. The fact that this is not enforced behaviour is due to compatibility with Yjs API, which allows you to create them at any time for convenience, however it's advised to make it during Doc initialisation as well.

akatakritos commented 5 months ago

Thanks for the info! That's basically what I've come to understand. Maybe the only improvement this issue could represent is a slightly clearer exception message that guides the user towards declaring the field at Doc init time.

SebastianStehle commented 2 weeks ago

@LSViana What shall we do with this issue?

LSViana commented 2 weeks ago

@SebastianStehle I agree we could have a better exception message, and then we could apply that change and resolve the issue.

If we have a proper exception message, future users shouldn't have similar problems, and we won't need to change the code. What do you think?