realm / realm-dotnet

Realm is a mobile database: a replacement for SQLite & ORMs
https://realm.io
Apache License 2.0
1.25k stars 164 forks source link

[Bug]: NullReferenceException in a loop #2923

Closed Vandersteen closed 2 years ago

Vandersteen commented 2 years ago

What happened?

NullReferenceException in a loop

Repro steps

I have a hard time creating a 100% repro step, however this is the situation i'm having:

I have a 'highly' concurrent setup, where items are added / updated / removed by multiple threads.

In one of the threads I then loop over all items, using:

using (var realm = Realm.GetInstance())
{
          var models = realm.All<MyModel>();

          foreach (var model in models)
          {

                //model is sometimes NULL here
                // or model is 'no longer valid'

          }
}

How can I make this access safe ?

Version

10.9.0

What SDK flavour are you using?

Local Database only

What type of application is this?

Xamarin

Client OS and version

iOS 15.3

Code snippets

No response

Stacktrace of the exception/crash you're getting

No response

Relevant log output

No response

peppy commented 2 years ago

As long as each thread is retrieving its own instance via Realm.GetInstance(), the code you provided should already be thread-safe. The only way I could imagine this breaking would be if you are calling realm.Refresh() from within the loop, or have a SynchronizationContext installed which isn't behaving correctly. Apparently incorrect assumption.

papafe commented 2 years ago

Hi @Vandersteen,

The issue here is that you're modifying the realm from several threads, so the object can become invalid while you loop through the collection, as an object becomes invalid if it has been removed from the database.

What you could do in this case is dependent on your use case:

[NOTE] (This was added after the initial comment was posted, to clarify) Normally, when using realm on a background thread, the realm gets "pinned" to a certain version. This means that it will have access to all the objects that have been present when the realm was opened on that thread. If the realm gets refreshed, instead, then the realm will point to the most recent version, and as such some objects may become unavailable, because deleted on other threads for example. The realm gets refreshed when calling Refresh() (explicitly) or when performing a write (implicitly). Please note that all of this discourse is not valid on the main thread, as the realm gets refreshed automatically in the background in that case.

peppy commented 2 years ago

@papafe Is this documented somewhere? That goes against my expectations and likely means we have been using realm incorrectly this whole time. It also makes the whole "realm is thread safe" less meaningful (I was under the assumption that as long as an open instance was pointing to an old version it would still have access to items from that point in time, in the state they were in at that point in time..)

This basically means that any asynchronous access to realm needs to have special handling to ensure all object accesses are valid else it may throw, correct?

Vandersteen commented 2 years ago
  • If you do not care about objects that have been removed, then you can just ignore them

Should I check for null before any access of the models ? Or just at the start

  • If you need to do any action on the objects that have been removed in the meanwhile, you can freeze the collection with realm.All<MyModel>().Freeze(). This will keep the collection in the same state as when you froze it, so you should not see any invalid object. You need to be sure to not keep any reference to the frozen collection or any frozen object though, as this will increase the memory occupied by the realm.

I could give this a shot

For more context:

The loop that has issues with 'null' is the last step

        public void RequeueLostTasks()
        {
            _nsUrlSession.GetAllTasks(RequeueTasksNotInSession);
        }

        public void RequeueTasksNotInSession(NSUrlSessionTask[] tasks)
        {
            using (var realm = Realm.GetInstance())
            {
                var syncs = realm.All<SyncModel>();

                foreach (var sync in syncs)
                {
                     var activeTask = tasks.FirstOrDefault(x => x.TaskIdentifier == (nuint?)sync.taskIdentifier);  <--- HERE I GET NULLREF
                     ... 
nirinchev commented 2 years ago

Do you have a realm.Write somewhere in that method?

Vandersteen commented 2 years ago

I do, if no 'activeTask' has been found -> the items get's requeued (new NSUrlUploadTask) and then:

...
var uploadTask = BuildUploadTask(sync);
try
{
    realm.Write(() => { sync.TaskIdentifier = Convert.ToInt32(uploadTask.TaskIdentifier); });
    Console.WriteLine($"Resuming task with identifier {uploadTask.TaskIdentifier}");
    uploadTask.Resume();
}
catch (Exception e)
{
    Console.WriteLine(
        $"Unexpected error while updating taskIdentifier in realm for SyncId {sync.Id}, {e}");
    uploadTask.Cancel();
}
...
Vandersteen commented 2 years ago

Also a

realm.Write(() => realm.Remove(sync));
continue;

in some edgecases

papafe commented 2 years ago

@peppy You have been definitely correct the whole time, I wrote too hastily and didn't explain myself properly, I am sorry. If the background thread does not get refreshed this should not happen. I will add a note to my previous comment to clarify.

@Vandersteen What is happening here is that the realm gets refreshed implicitly when you do a write. This means that the realm will get updated to the latest available version when the write is started, and some objects could have been removed in the meanwhile. This causes you seeing invalid or null objects during the iteration. Depending on your use case, you should also be careful because the object in your write transaction could have been modified/deleted before you use it. I hope this will be of help in understanding what is happening here.

peppy commented 2 years ago

Phew, thanks for the confirmation. I wrote up a test in the mean time and can confirm the behaviour is as we expect, and also as you mention regarding the implicit refresh on write:

// internally calls Realm.GetInstance()
realm.Write(ctx => ctx.Add(new BeatmapInfo(CreateRuleset(), new BeatmapDifficulty(), new BeatmapMetadata())));

ManualResetEventSlim resetEventSlim = new ManualResetEventSlim();

var task = Task.Factory.StartNew(() =>
{
    // internally calls Realm.GetInstance()
    realm.Run(ctx =>
    {
        var beatmaps = ctx.All<BeatmapInfo>();

        Assert.That(beatmaps.Count(), Is.EqualTo(1));

        // wait for delete operation on main thread.
        resetEventSlim.Wait();

        // can still access all objects with no exception.
        Assert.That(beatmaps.Count(), Is.EqualTo(1));
        var beatmapInfo = beatmaps.First();
        Assert.That(beatmapInfo.Metadata, Is.Not.Null);

        // .. until a refresh
        ctx.Refresh(); // <- can be replaced with an empty write to produce the same effect ctx.Write(() => { });

        // .. after which the object goes away and will throw an invalid object exception
        Assert.That(beatmaps.Count(), Is.EqualTo(0));
        Assert.Throws<RealmInvalidObjectException>(() => Console.WriteLine(beatmapInfo.Metadata.ToString()));
    });
}, TaskCreationOptions.LongRunning | TaskCreationOptions.HideScheduler);

// internally calls Realm.GetInstance()
realm.Write(ctx =>
{
    ctx.RemoveAll<BeatmapInfo>();
    ctx.Refresh();
});

resetEventSlim.Set();

task.WaitSafely();
papafe commented 2 years ago

@Vandersteen Another thing to note is that the iteration is not stable through write transaction due to the implicit refresh that is happening. So, whenever you are iterating through a collection and you could modify the collections itself inside, you should use a write transaction around it. So instead of doing:

//Iteration stability not guaranteeed.
foreach (....)
{
    realm.Write(...); 
}

You should do:

realm.Write(() =>
{
    foreach(....)
}

If you do it in the first manner, then some objects could become invalid, or even the order could be modified, so you can end up processing the same object twice.

papafe commented 2 years ago

@peppy Thanks a lot for the test and the confirmation! Sorry again for startling you 😄

Vandersteen commented 2 years ago

@Vandersteen Another thing to note is that the iteration is not stable through write transaction due to the implicit refresh that is happening. So, whenever you are iterating through a collection and you could modify the collections itself inside, you should use a write transaction around it. So instead of doing:

//Iteration stability not guaranteeed.
foreach (....)
{
  realm.Write(...); 
}

You should do:

realm.Write(() =>
{
  foreach(....)
}

If you do it in the first manner, then some objects could become invalid, or even the order could be modified, so you can end up processing the same object twice.

Thank you for clarifying, is this written somewhere in the documentation ? Does the latter block other threads from accessing these objects ?

papafe commented 2 years ago

@Vandersteen I don't think it is written in the documentation, we should probably add a note. And no, it does not effect other threads.

nirinchev commented 2 years ago

To clarify on the second point - it will block other threads from starting a write transaction though they can still read the data.

Vandersteen commented 2 years ago

Ok, so other threads 'realm.Write()' will be blocking but the rest will work.

Thanks again for clarifying ! You can go ahead close this ticket