marijnz / NativeQuadtree

A Quadtree Native Collection for Unity DOTS
MIT License
358 stars 30 forks source link

Crashes using Allocator.Persistent or Allocator.TempJob #2

Closed AdamClements closed 4 years ago

AdamClements commented 4 years ago

Firstly, thanks so much for this library, it was exactly what I needed, no more, no less.

I've been having an issue that kills Unity altogether and I think some of the unsafe ops in NativeQuadTree are causing it. I haven't nailed down an exact repro but I've narrowed it down to something like the following and would appreciate your insight if you think you could help me narrow it down some more.

I'm using one system to build a quadtree and store it (with Allocator.Persistent) as a field on the system. Other systems can then read the quadtree and perform RangeQueries on it.

If I make the NativeQuadTree Allocator.Persistent and call ClearAndBulkInsert() every frame, I get random-ish unity crashes which suggest to me that there's some overflow error writing where it shouldn't and causing different crashes depending where it is in memory. If I Dispose and re-initialise every frame I have no issue, however then Persistent is non-optimal, so I changed to TempJob and Dispose()d and re-initialised every frame, and that then also crashes similarly. The only thing that seems to work reliably is Allocator.Persistent and Dispose(), new ... every frame.

First thought: Is RangeQuery threadsafe, if multiple threads are querying the same data (it's only ever written from one place), is that likely to cause issues?

Second thought: Is it an issue with different allocators? Allocators.Temp seems to have no issue, but TempJob and Persistent result in crashes

Third thought: Is ClearAnd BulkInsert(...) well tested for the case where there's existing data, or has most of your testing been done with Temp instances that are written once then discarded?

4th thought: Am I doing this all wrong and you shouldn't keep NativeContainers as fields on systems and pass them around, that's not what they're designed for, of course you'd get crashes...

marijnz commented 4 years ago

Hi, I'm happy to hear this is exactly what you were looking for! Also as a note, this project is not used in a production environment so it could very well be that you have issues. Thank you for reporting this here as I could foresee this might find its way to the game I'm professionally working on.

My first question: Does it also crash when you're not doing any queries?

1st: It should be threadsafe, so not sure if that's the issue. Safety checks should show you if it's not the case, but are you sure the writing into the tree is done once you start querying? 2nd: Hm.. the allocator type should really not matter. However since the memory will come from different places it could change the crashing behaviour because of pointer to the wrong address (but the bug is there in all cases). However I've not tested different allocaters than the ones in my test so I can't say this does or does not matter. 3rd: Good point, I've mostly used new memory every time. But since all data is cleared (I just had a quick double check) I'm not sure if this is the issue.

Do you have Burst safety checks on and Jobs LeakDetection on? I'd expect it to not always crash, but sometimes give some other error that might give a better hint where the issue is.

marijnz commented 4 years ago

I actually found an issue scanning over the code

https://github.com/marijnz/NativeQuadtree/blob/master/Assets/NativeQuadTreeRangeQuery.cs#L37

This should be , count + 4 * tree.maxLeafElements

If your results capacity is very low (which it should not be, rather allocate a bit more there I'd suggest) then this could give issues for sure!

The current behaviour results in writing into the wrong memory, that's definitely material for the random crashes you're seeing ;)

AdamClements commented 4 years ago

That could be it, I'm only expecting ~5 results in the common case so I only allocated 10, I'll update that line and see if I can repro it and answer your other questions if it's still happening

On Tue, 7 Apr 2020 at 13:11, Marijn Zwemmer notifications@github.com wrote:

I actually found an issue scanning over the code

https://github.com/marijnz/NativeQuadtree/blob/master/Assets/NativeQuadTreeRangeQuery.cs#L37

This should be , count + 4 * tree.maxLeafElements

If your results capacity is very low (which it should not be, rather allocate a bit more there I'd suggest) then this could give issues for sure!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/marijnz/NativeQuadtree/issues/2#issuecomment-610349957, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG7LY4OMJPVKABEKJWZEVTRLMKAFANCNFSM4MDAFTAQ .

AdamClements commented 4 years ago

That certainly changed things - it still crashed but I got some interesting graphical glitches in the player window before it died this time.

Give me ten mins or so, I'll try and extract a minimum repro into a new project and upload it to a repo so you can see for yourself.

To answer your previous questions:

AdamClements commented 4 years ago

I've got it happening even when I'm not querying it in my repro project - seems to be caused just by setting persistent and then doing repeated ClearAndBulkInserts.

I wonder... So I'm populating it by building a Temp NativeList, what happens to the NativeQuadTree if that original list gets disposed? Does it take pointers into that list so I need to manage the lifecycle and keep that alive alongside, or does it copy the values out as I assumed?

Edit: Scratch that theory, I tried keeping the list Persistent and alive alongside - no change.

AdamClements commented 4 years ago

A project that reproduces this, I'm using Unity 2019.3.8f1 https://github.com/AdamClements/QuadTreeAllocationCrashRepro

marijnz commented 4 years ago

Thanks for that, maybe tomorrow eve I have a look!

marijnz commented 4 years ago

I think I found the issue. The Clear I never tested and I expected the UnsafeList.Clear to do a MemClear (which it doesn't, which actually makes sense)

You could replace

lookup->Clear();
nodes->Clear();
elements->Clear();

with something like this:

UnsafeUtility.MemClear(lookup->Ptr, lookup->Capacity * UnsafeUtility.SizeOf<int>);
UnsafeUtility.MemClear(nodes->Ptr, nodes->Capacity * UnsafeUtility.SizeOf<QuadNode>);
UnsafeUtility.MemClear(elements->Ptr, elements->Capacity * UnsafeUtility.SizeOf<QuadElement<T>>);
AdamClements commented 4 years ago

Trying that - everything looks good so far! I'll keep testing the rest of today and see if I hit any other edge cases but it's definitely a huge improvement and worth committing that and the earlier sizing change. Thanks for this!

marijnz commented 4 years ago

Great, feel free to make a PR for the 2 changes, otherwise I will a commit :)

AdamClements commented 4 years ago

Okay, PR incoming :)