Should inputs and outputs be tied to the transaction?

SebastianStehle commented 11 months ago

Some inputs and outputs allocate memory, which has to be freed again. Afaik this does not happen automatically. Therefore every input needs to be released again after the transaction has been completed.

This causes 2 issues IMHO:

It is not so easy to understand when an input need to be released. Because typically, when you add something to a map or array you don't want the items to be released before the container.
The output is even more complicated because you want to have the CLR value longer, but not the output itself.
It is just easy to forget in general.
Sometimes there is no container. Lets say you get a output.Collection or output.Array where you have to loop over the outputs manually and dispose them.

So in general there are not hat much uses cases ,where an input and output needs to live outside the transaction.

What I recommend is the following:

Maintain a list of all allocated resources within an transaction and then release them when the transaction is disposed.
Try to refactor the output so that it resolves the CRL value in the constructor and releases the actual output pointer immediately. I thin there is no need to keep that in memory and the use cases where you need an output without the underlaying value should be very rate.

SebastianStehle commented 11 months ago

I tried to improve memory allocations but it is actually super complicated. The problem is that every object seems to live in a scope and it actually depends on the method call and therefore Dispose does not really help.

lets take output as an example:

If you get an output from an object the output can actually be destroyed with youtput_destroy. So we could implement a dispose method and a finalizer to destroy the output when it is not needed anymore.
If the output is part on json array via youtput_read_json_array it cannot be destroyed because it is destroyed as part of youtput_destroy of the json array.
If the output is from an map iterator, it is actually super complicated. The output belongs to the map entry and cannot be destroy independently. But when do you destroy the map entry?
- If you destroy the map entry immediately, you cannot access the output later. This is especially true if you use something like FirstOrDefault which then also disposes the iterator / enumerator before the output is actually used.

We could avoid the problem with the output by resolving all values directly, but this would hurt performance, especially for events.

So on idea would be to have scopes, which are>

Doc
Transaction
Event

Every object is a resource and has a reference to a scope. Once a scope is over it releases all children and makes them unusable. If you use any method a ObjectDisposedException is thrown.

LSViana commented 11 months ago

Hi, @SebastianStehle!

About this:

Some inputs and outputs allocate memory, which has to be freed again. Afaik this does not happen automatically. Therefore every input needs to be released again after the transaction has been completed.

Yes, that's true. But I'm working on multiple improvements to the memory management of the library to free resources via finalizers.

Right now, every exposed resource that can be disposed implements IDisposable, but they don't implement finalizers to allow the GC to trigger the dispose procedure.

So, this means that today if the client of the library doesn't dispose the Input cells, for example, they'll have a memory leak. This should be fixed after #34 is finished.

The output is even more complicated because you want to have the CLR value longer, but not the output itself.

It's not obvious, but the idea of the Output cells are to be read & disposed as quickly as possible so the client of the library can keep a reference of a fully managed reference/value.

This could probably be mentioned in the class documentation.

Sometimes there is no container. Lets say you get a output.Collection or output.Array where you have to loop over the outputs manually and dispose them.

I'm afraid that's not the case. The documentation of yffi mentions that, for example, an Output cell that holds an array can be fully disposed using youtput_destroy.

Also, every Output cell can be disposed (and will have a finalizer after #34), so they shouldn't leak memory regardless of whether they're container or actual value cells.

Maintain a list of all allocated resources within an transaction and then release them when the transaction is disposed.

Unless we really need to do that, I'd rather not take that approach. The idea of this library is to be a thin layer around Yrs, so I'm always trying to write the least amount of code and make sure it works as it should.

In this case, the memory management seems to work well by just calling the necessary *_destroy methods in the Yrs side.

Try to refactor the output so that it resolves the CRL value in the constructor and releases the actual output pointer immediately. I thin there is no need to keep that in memory and the use cases where you need an output without the underlaying value should be very rate.

While developing that class, I thought about it, but I didn't do it because:

There was no reference/tag (even though it was possible) to the type of content being held
Doing this automatically instead of on-demand when the client code reads the value creates a heavy, unnecessary overhead that can be problematic at scale

So, because of that, I suggest we keep giving more control to the client code and only trigger code to read values when they're requested (if that's possible, like in this case).

If you get an output from an object the output can actually be destroyed with youtput_destroy. So we could implement a dispose method and a finalizer to destroy the output when it is not needed anymore.

That's right. It should be done after #34.

If the output is part on json array via youtput_read_json_array it cannot be destroyed because it is destroyed as part of youtput_destroy of the json array.

That's already handled by giving a "flag" to every new Output cell to instruct it about whether it should dispose itself during the Dispose() method. This solves this problem since Output cells that are nested are instructed not to dispose themselves, so there's no duplicate dispose by accident (unless there's a programming error in the library).

If the output is from an map iterator, it is actually super complicated. The output belongs to the map entry and cannot be destroy independently. But when do you destroy the map entry?

Let's take the map iterator as an example, whenever the library requests the next entry, it should store the pointer and call ymap_entry_destroy on that entry later. The entry holds the Output cell and that operation should it too.

This is documented in the ymap_iter_next() method.

So on idea would be to have scopes, which are: (...)

Please, check if the mentions I added above help alleviate the problem you mentioned. From my point of view, there's no need for us to write extra code to handle memory management except for calling the Yrs *_destroy methods in the correct resources.

If I missed something here, please let me know and I can re-evaluate this comment.

SebastianStehle commented 11 months ago

I came to the same conclusion now with finalizers. In my own fork I am going for a simplified approach for now, that does not need dispose at all, because almost all memory can be released immediately.

If you want to keep the output you need a reference to a resource owner to keep the resource owner alive, otherwise something like that could fail:

var x = map.Iterate().Where(x => x ...).FirstOrDefault();

// GC.Collect();
x.DoSomething(); // Might crash because the map entry has been released already.

LSViana commented 10 months ago

Hi, @SebastianStehle!

Same as other issues in the repo, I believe this one can be closed now that we merged #55, right?

y-crdt / ydotnet

Should inputs and outputs be tied to the transaction? #45