y-crdt / ydotnet

.NET bindings for yrs.
MIT License
35 stars 9 forks source link

Simple setup #44

Closed SebastianStehle closed 10 months ago

SebastianStehle commented 11 months ago

This is cleaned PR now that contains the changes that are necessary to get YDotnet running on all platforms and to have native binaries. So here is what I did:

Binaries

  1. I created one project for each platform that contains the native binaries and a meta project that just references all other platforms if you create a multi platform server project (or client).
  2. The binaries are built once from my fork of y-crdt. The only difference to your fork is a single error message that I improved, because I was not able to find another bug.
  3. The binaries are built when a tag is created and then the normal built just pulls them from this separate workflow. Therefore the normal build should be fast.
  4. The binaries cover Linux, Mac, Windows, but no 32 bit systems and no ARM yet. I think it should be a good enough start.
  5. DllImport adds file extensions automatically based on the operation system (read https://learn.microsoft.com/en-us/dotnet/standard/native-interop/native-library-loading). Therefore I have removed the conditional building for the file name

Closes #15

Fixes to Output

The output type was very unexpected because to be close to y-crdt it was returning null when something was not as expected. All similar types (think JToken or JsonElement) throw exceptions and provide a property to check the type. Therefore I made the following changes:

  1. Introduced a new OutputInputType enum that reflects the tag integer from y-crdt.
  2. I exposed this enum so that you can check the type. Tag is not a well known term in the .NET world, so I changed it to Type. This improves performance because you don't need 15 calls to y-crdt to check your type. Also some methods like Null or Undefined don't have a check anymore.
  3. I throw an exception when the type does not match. Therefore it also simplifies all tests because you don't have the null checks anymore.

Btw: This makes #39 obsolete.

Bug in the Binding 1:

The method ymap_insert accepts a pointer to the input. We have passed the struct by value which was working under windows, but under mac and linux. I have no idea why and it took my hours to figure that out. But after changing it to a pointer it works fine now. See: https://github.com/y-crdt/y-crdt/blob/bb26eb73ab4ac6899ad3b6f29f9c9be281ef4a33/yffi/src/lib.rs#L1561

I think there could be room for improvement. With unsafe code we should be able to get the pointer to a struct and therefore avoid the extra memory allocation that is needed here to create a pointer.

Bug in the Binding 2

Under mac I have seen exceptions when working with the AfterTransactionEventNative struct. The event is passed as a pointer. Sometimes (See undo manager) it is correct, but in other cases you have to read the pointer to a struct manually. Not sure again what the problem is.

SebastianStehle commented 11 months ago

@LSViana Do you know when you have time to review that? I would really like to integrate yjs into our product and the .NET server is essential. So I am little bit in a hurry and have the time to move this forward. I also understand that you don't want to loose control and I don't want to maintain my own fork, because the world does not need 2 .NET yjs servers. So I am a little unsure how to move forward.

LSViana commented 11 months ago

Hi, @SebastianStehle!

I'll check your fork later today, I planned it for yesterday but I couldn't do it.

However, I believe it's unlikely I'll be able to review everything you added to this PR and this project doesn't have a NuGet package yet anyway, so I'd suggest you to keep using your own fork for the next few days just in case.

SebastianStehle commented 11 months ago

Thanks for the update. The point with the own fork is more that if I just continue with my plans, it causes several problems:

  1. It is more likely that you don't like them.
  2. It will be too much to review.
  3. It will be hard to make smaller, focused PRs.

So if I just continue for a few days it is very likely that there will be no merge at all.

LSViana commented 11 months ago

Hi, @SebastianStehle!

Don't worry that much about my previous comments. I'm the author of the bindings so far but this project won't run based on what I like, it should be the best for the community.

I was trying to make sure we'd not lose control because if I merge all your code today and I don't understand what it does and how it does, that would mean losing control of the library development in my opinion.

But, at the same time, I see now that you "cleaned up" the PR and it's now mostly changing files instead of adding new stuff (like the MongoDB and Redis-related code). This will definitely make things easier to review and integrate as soon as possible.

SebastianStehle commented 11 months ago

Yes, I want to make small changes for now and fix the core and then focus on the server projects. So what I would like to achieve:

  1. Improve nullability.
  2. Get rid of all warnings.
  3. Improve handling of unmanaged memory.

And when this is done, then I can provide new PRs for all the server stuff.

LSViana commented 11 months ago

Hi, @SebastianStehle!

Alright, let's do it. I'll review your changes later today after I finish my regular work & personal tasks, so you should probably expect my replies & review comments tomorrow.

LSViana commented 11 months ago

Hi, @SebastianStehle!

I am still going to check the code changes in this PR but I just finished reading your first comment and I have a few questions to ask beforehand:

The output type was very unexpected because to be close to y-crdt it was returning null when something was not as expected.

This is actually an expected behavior. This is inspired by how both Yjs and Yrs work.

In Yjs if you try to get a value from a Map, you'll receive any, then you have to do the type-checking (or casting) manually.

In Yrs, if you try to get a value from a Map, you'll receive a Value, then you also have to check for the actual underlying value before using it.

So, in short, I don't see a reason to throw an exception if the type being read doesn't match the underlying type. The approach of yffi of returning null pointers to indicate the type is wrong (view example) matches well with the C# nullable types. When using output cells, the clients of the library will fall into one of these two cases:

  1. They're storing data in an organized way so they always use the same type of data in the same map key (or array index);
  2. They're storing data in a less structured way so they need to check the actual type every time they read an output cell.

For the scenario [1], both the "throw exception" and "return null" approaches will work fine because the developer knows, when writing the code, that they'll always get a String or an int or whatever from a given key.

For the scenario [2], the "return null" approach works by relying on the developer to check the HasValue property of the nullable types. However, the "throw exception" approach has to be "circumvented" by using the Type property you added to make sure only the correct type is read. However, this ends up being the same as the "return null" approach because the developer will need to check for the actual type during runtime anyway.

What do you think? Is there any case that the "return null" approach was much worse to use than the "throw exception" approach at the point that it justifies this change? Or maybe you're suggesting this because you're used to methods that have a bool Try[Method](..., out T result) variant to do things "safely"?

Please, let me know what you think since this kind of change affects the ergonomics of the library. I'm looking for an approach that keeps it easy to use for most people.


The method ymap_insert accepts a pointer to the input.

Please, can you provide a runnable example of where this happens? I've been developing the library on both Windows and macOS but I never noticed such a bug.

Also, if we really have this bug and your change fixes it, we should probably have a unit test with a scenario that ensures it remains working so we don't accidentally break it in the future.


Under mac I have seen exceptions when working with the AfterTransactionEventNative struct.

Same as above, it'd be nice to have a runnable example to verify this.

For both cases, if you have the time, feel free to add unit tests to "mimic" the problematic scenarios (you can restore the old P/Invoke calls if needed).


Sorry for the long comment but I think it's a good starting point to the discussions here. I'll check the code changes the next time I have time available for this project.

LSViana commented 11 months ago

Additionally, this PR will probably be split into smaller pieces as we move since it's currently very wide and, for example, I can't even think of a proper name for it.

I'll take care of this task management part, though, so you don't need to worry about it.

SebastianStehle commented 11 months ago

Actually all the changes have something to do with getting the errors fixed:

I was never really happy with the null approach, because I am not used to it. My headless CMS Squidex is a lot about working with unmanaged data and so the libraries I use are very similar to Yint, for example:

All of them have some kind of json type that represents one of many values and all work the same. So it was confusing to work with nulls. I would say this is already a strong argument. but there are more:

  1. It is not the y-crdt way to deal with nulls. It is a problem of the yffi layer that you do not get an additional error. Because in y-crdt it seems to be tuple, so more like the TryXXX thing in .NET.
  2. Null could be a valid value for a array tag, but it is not. So you don't know whether null array exists or does not exist in y-crdt.
  3. Sometimes you just "know" that something is not null. Then you have 3 options:
    1. You are very defensive and you just ?. operator and so on => In an error you don't see it.
    2. You guard everything yourself and throw exceptions => You will see errors, then you have exactly my behavior.
    3. You do nothing => You get null reference exceptions.

Case 3 happened with the tests. I have added github actions and was running the tests under all operation systems and I got a log null reference exceptions for Linux: https://github.com/SebastianStehle/ydotnet/actions/runs/6430543358/job/17461698179. I had no idea what was going on. So after the change i have realized that something with the tag must be wrong: https://github.com/SebastianStehle/ydotnet/actions/runs/6432471124/job/17467429093


In the action above, you will notice the MacOS problem: https://github.com/SebastianStehle/ydotnet/actions/runs/6432471124/job/17467429093

The active test run was aborted. Reason: Test host process crashed : thread '<unnamed>' panicked at yffi/src/lib.rs:2506:17:
Unrecognized YVal value tag.
stack backtrace:
   0: std::panicking::begin_panic
   1: yrs::YInput::into
   2: <yrs::YInput as yrs::block::Prelim>::into_content

   3: yrs::transaction::TransactionMut::create_item
Passed!  - Failed:     0, Passed:    [27](https://github.com/SebastianStehle/ydotnet/actions/runs/6432471124/job/17467429603#step:6:28), Skipped:     0, Total:    27, Duration: 509 ms - YDotNet.Tests.Unit.dll (net7.0)
   4: yrs::types::map::Map::insert
   5: _ymap_insert

This changed after I revisited the pointers. It took my hours to figure this out, but the difference is something with const vs mutable pointers.

LSViana commented 11 months ago

Hi, @SebastianStehle!

I'll try to fix all other objective issues and try to merge smaller PRs and then we can keep discussing the API design for the Output cells later.

About the other issues, your changes look good to me and it's unfortunate we don't have an easy to reproduce scenario. I've been using macOS & Windows and all tests (except 3 which are broken on purpose until I can check some details with the Yrs team) were passing correctly & consistently.

I'll keep an eye out to see if I can detect similar issues in other places of the native bindings.


For today, I'm reviewing the code changes but I'm almost running out of time so I'll probably not be able to post comments or merge anything yet. I intend to take care of it tomorrow and during the weekend.

SebastianStehle commented 11 months ago

Hi,

About Tests

the tests do not tell the whole truth, because I noticed that it s very flaky. Sometime it depends on the garbage collector and rust. For example 2 things I noticed:

This code is buggy:

https://github.com/LSViana/ydotnet/blob/main/YDotNet/Document/Doc.cs#L363

    public EventSubscription ObserveUpdatesV2(Action<UpdateEvent> action)
    {
        var subscriptionId = DocChannel.ObserveUpdatesV2(
            Handle,
            nint.Zero,
            (_, length, data) => action(UpdateEventNative.From(length, data).ToUpdateEvent()));

        return new EventSubscription(subscriptionId);
    }

Because no managed object holds a reference to the delegate it will eventually be collected by the garbage collector. Then you get an exception that you cannot catch. The problem is that the tests are fast and do not allocate enough memory to trigger GC. So I only had this issue very sporadically and adding a few `GC.Collect´ will help to reproduce it.

The following code also creates issues:

// nested map
// doc
//    a
//       b {
//            C: 1
//       }
var doc = new Doc();
var a = doc.Map("A");
Map b;
using (var t = doc.WriteTransaction())
{
    a.Insert(t, "B", Input.Map(new Dictionary<string, Input>()));

    b = a.Get(t, "B").Map;
    b.Insert(t, "C", Input.Double(1));

    a.Remove(t, "B");
}

for (var j = 0; j < 10000; j++)
{
    using (var t = doc.WriteTransaction())
    {
        b.Insert(t, "C", Input.Double(1));
    }
}

Here b is actually deleted so you should not access it anymore, but the problem does not happen all the time. Usually after 50-100 iterations it will fail.

About PRs

Unfortunately I am not able to provide smaller PRs at the moment. There are several reasons for that but the background is that I need a yjs server for a commercial Open Source project and I have 40-60 hours per week if needed to work on that and obviously your available is lower. it is just a fact, not criticism. Some of the changes are dependent on each other so if I would make small chunks of work I would create a conflict hell on my side or I would have to wait for you. I can also not wait for your input, because it would restrict me too much.

Sometimes I also don't know yet, what the best approach is, for example I followed several approaches to improve the memory management and things became a little bit chaotic with that.

LSViana commented 11 months ago

Hi, @SebastianStehle!

About this:

Because no managed object holds a reference to the delegate it will eventually be collected by the garbage collector. Then you get an exception that you cannot catch. The problem is that the tests are fast and do not allocate enough memory to trigger GC. So I only had this issue very sporadically and adding a few `GC.Collect´ will help to reproduce it.

I noticed that bug previously and I fixed it already. I discovered it as part of the memory leak investigations (#34) and it's already fixed by now (just not in main yet).

This and this are the two commits that fixed the problem.


The following code also creates issues: (...)

Yeah, that's true. It's not ideal that this code doesn't fail immediately (because the Map isn't really disposed of yet on the Rust side) but this kind of problem is caused by a programming error. Also, if there's a place where a check should be performed about the existence/validity of the Map, it's probably in the Yrs layer.

Do you have any suggestions on how this should be fixed on the YDotNet layer?


Unfortunately I am not able to provide smaller PRs at the moment.

Don't worry, I wasn't expecting you'd do that. I intend to review and apply your changes to the main repository as soon as I can process them. What's taking me long this week is my main job and the fact I am sick and progressing slower.

To be clear, I do intend to push your changes into the repository but I'd really like to do that in a controlled, analytical way and that takes time. I thought it'd be okay for you but in case you're in a hurry to use the project, I was thinking you'd be able to use your own repository for that.

Is there anything else I can do to help you while I review & process your PRs & issues?

SebastianStehle commented 10 months ago

Do you have any suggestions on how this should be fixed on the YDotNet layer?

I think we can only fix it in .NET by reducing the validity of types to certain scopes like transactions or events. But I am also discussing it with the y-crdt team at the moment: https://github.com/y-crdt/y-crdt/issues/338. I actually had to write a custom test runner to catch these errors: https://github.com/SebastianStehle/ydotnet/blob/improve-output/Tests/YDotNet.Tests.Unit/Program.cs

Is there anything else I can do to help you while I review & process your PRs & issues?

I don't think so. My main branch is the improve-output branch at the moment. And I am really happy with that. Except the things that I cannot solve. But I found more bugs and made some improvements to the code that I think you will like. But I think the only option to merge it in is so go over the changes in a chat and decide whether we want to use the branch or not.

LSViana commented 10 months ago

Hi, @SebastianStehle!

As we discussed earlier today, I'm going to reject this PR since it's now replaced by #55.