notgiven688 / jitterphysics2

Fast, simple, and dependency-free physics engine written in C# with a clear and user-friendly API.
MIT License
219 stars 18 forks source link

NuGet Package Not Working on Linux-x64 #174

Closed tin-can-tomatoes closed 3 days ago

tin-can-tomatoes commented 1 week ago

TLDR: Seems to be some kind of issue with the NuGet package, at least on Arch Linux x64. Building from repo works fine.

I wanted to add physics to a new 3d project I started and stumbled across this project and cloned the repository to look at the Demos. They seemed to work well, even on my dated laptop, so I decided to go ahead and add the Jitter2 NuGet package to my project. I copy-pasted some of the code from the Hello World example in the docs into my project and ran it, and not a single frame was rendered, program completely locked up. Pausing the debugger did not work.

After some investigation, the hangup seemed to be somewhere in world.Step(1f/100f,false);, but I figured it's probably something I did wrong, so I took a step back and followed the Hello World tutorial step-by-step. Same thing.

Checked my .NET SDK version and it seems the arch repo package is out of date, so I uninstalled it and installed the latest version of the binaries from Microsoft, and tried again. Still no luck.

Went back and tried the Demos from the cloned repo, and they still worked. Then I decided to just try copying the Jitter2 project into my solution and ditched the NuGet package in favor of a project reference, and lo and behold: everything just started working.

My system:

tomatoes@tomato-can[~]$ dotnet --version
8.0.401
tomatoes@tomato-can[~]$ uname -a
Linux tomato-can 6.10.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 04 Sep 2024 15:16:37 +0000 x86_64 GNU/Linux

I'm not too sure how to debug a NuGet package when the repo code works just fine, so apologies for not being able to offer up more detail, but I felt I ought to report it ASAP. I should have time to try replicating this issue on a few other machines come the weekend.

notgiven688 commented 1 week ago

Thank you for this qualified report. I'm also using Linux, but can not reproduce your issue.

After some investigation, the hangup seemed to be somewhere in world.Step(1f/100f,false);

We already noticed some strange hangups (https://github.com/notgiven688/jitterphysics2/issues/115) for which the root cause has not been found yet. I assume it has something to do with threading. However in your example (world.Step(dt, false)) you have multi-threading disabled.

I should have time to try replicating this issue on a few other machines come the weekend.

Looking forward to your analysis.

tin-can-tomatoes commented 1 week ago

Weekend started a bit early so I decided to do a bit more testing and a bit of poking around before trying it out elsewhere... I do think it is related to the issue you mentioned -

With the NuGet package: I tried each build configuration [of the test project]: hangs regardless if Debug or Release Debugger/No Debugger: worked perfectly with no debugger, hangs with debugger (continued using debugger after this) true / false in World.Step parameters: hangs either way

So it certainly seems like there's something fishy going on with the debugger.

I figured out how to configure VS Code to load symbols and sources for the package and was able to step through the call to world.Step(1f/100f, false); I deliberately left multi-threading off for debugging this one.

It seems on World.Step.cs:729 (compiler must have optimized this because debugger insists it is on L728), ThreadPool.Instance is accessed for the first time (with MT off) and therein calls the constructor, which then calls ChangeThreadCount.

Inside ChangeThreadCount is a for loop that starts new threads, and I determined that the program was consistently hanging for me at ThreadPool.cs:143 / threads[i].Start() when i held the value of 2. Strangely enough the program would also sometimes hang at other points stepping through parts of the ThreadPool class, but this was far rarer, whereas I could count on the hang occurring at L143 when i == 2.

Stepping in further, the closest I could get to a final location for the problem was in the dotnet code itself, inside Thread.CoreCLR.cs:90 at a call to StartInternal.

Here's what's going on at this point: Call stack and variables

Going any further than this resulted in the infinite hangup.

I also tried turning off my CPU's hyper-threading (from 8 logical processors to 4), and the program started working again, but in this case, i never got up to 2 inside the aforementioned loop.

Either way, it seems like the thread pool is spinning up threads regardless of whether they're desired/needed or not, so perhaps addressing that would reduce the inconvenience of the underlying bug.

But ultimately, your comments in the other thread that it could be a problem with the debugger itself seem to check out so far. Wish I knew more about how StartInternal worked across different platforms.

I'm still gonna try a couple other computers with different configurations of Linux to see if there's any other reasons why it seems so hard to reproduce. Still unable to repro this with the ProjectReferenced version of my test project still.

notgiven688 commented 1 week ago

Thank you very much @tin-can-tomatoes

Here is my minimal reproducer:

Console.WriteLine("Hello, World!");

SpinUpThreads();

Console.WriteLine("Bye, World!");

void SpinUpThreads()
{
    void ThreadProc()
    {
        while (true)
        {
            // Thread.Yield();
        }
    }

    var threads = new Thread[4];

    Console.WriteLine("Initializing.");

    var initWaitHandle = new AutoResetEvent(false);

    for (int i = 0; i < 4; i++)
    {
        threads[i] = new Thread(() =>
            {
                initWaitHandle.Set();
                ThreadProc();
            });

        threads[i].IsBackground = true;
        threads[i].Start();
        initWaitHandle.WaitOne();
    }
}

For me (under Linux) I get with the debugger: Hello, World! Initializing. {Hangs}

The code works if I comment in "Thread.Yield()". However this does only seem to "hide" the bug, as doing so in Jitter itself makes the engine hang at a later point.

tin-can-tomatoes commented 1 week ago

Hmm I played with your repro code and see the same thing with Thread.Yield.

Back in the Jitter2 repo I figured out that

#if DEBUG
    public const float ThreadsPerProcessor = 0.0f;
#else
    public const float ThreadsPerProcessor = 0.9f;
#endif

was why I wasn't having the issue when compiling the library myself, so I changed it to always be 0.9 and now I can reproduce the hangup in the Jitter2 sln itself.

I played around with adding Thread.Yield()s and Thread.Sleep(...)s into ThreadProc and it seemed to alleviate the problem for me - I didn't experience any other hangups at later points.

Here's what I ended up with:

    private void ThreadProc()
    {
        while (running)
        {
            if (taskQueue.TryDequeue(out ITask? result))
            {
                result.Perform();
                Interlocked.Decrement(ref tasksLeft);
            }
            else
            {
                Thread.Yield();
                mainResetEvent.Wait();
            }
        }
    }

Having Thread.Yield() follow the mainResetEvent.Wait() statement also seems to work. Having Thread.Yield outside of the if statement also works, but appears to be unnecessary when the if statement's condition is true.

I'm figuring the overall issue is the debugger struggling to get/do something with the blocked or infinite-looping thread, and having a Yield or Sleep in there allows the debugger a chance to do what it needs to. Perhaps this is the fix if we can make sure it's not hanging elsewhere like you experienced.

Or if that fix still proves lacking, perhaps a good workaround for the time being would be to allow the consumer of the library to set the initial thread count of the ThreadPool before the instance is created - either by changing ThreadsPerProcessor to a modifiable static property instead of a const, or by having a way to create the instance with an explicit number of threads.

notgiven688 commented 1 week ago

That took a while...

but https://github.com/dotnet/runtime/issues/33561 came to the rescue!

It turns out there's an actual bug in the .NET runtime. From what I can see, the .NET 9 preview includes the commit, which seems to fix our issues.

tin-can-tomatoes commented 1 week ago

That is good news! Looks like .NET 9 will land in mid-November, so I'm good with using workarounds until then.

notgiven688 commented 5 days ago

@tin-can-tomatoes: We should be safe with: https://github.com/notgiven688/jitterphysics2/commit/1c2aca9cfe91157b8ac97d2911ee3c01de189cd3 (?)

edit: A pre-release is available: "dotnet add package Jitter2 --version 2.4.4-alpha". Can you test it?

tin-can-tomatoes commented 5 days ago

I'm at work so I won't be able to test properly (with the nuget artifact) until tonight, but a quick clone of the jitterphysics2 repo seems to debug just fine on my work laptop (also linux x64). Reading over the changes, they look sound.


If you're working on putting out a new NuGet release, I think it may also be worth looking into changing World.Step to avoid calling ThreadPool.Instance altogether when multithreading is set to false.

My attempt to do this is the following changes:

Refactored DetectCollisions (World.Step.cs) to:


    private void DetectCollisions(bool multiThread)
    {
        const int taskThreshold = 1024;
        if(multiThread){
            int numTasks = DynamicTree.PotentialPairs.Slots.Length / taskThreshold + 1;
            numTasks = Math.Min(numTasks, ThreadPool.Instance.ThreadCount);

            if (numTasks > 1)
            {
                Parallel.ForBatch(0, DynamicTree.PotentialPairs.Slots.Length,
                    ThreadPool.Instance.ThreadCount, detectCollisions);
                return;
            }
        } 

        DetectCollisionsCallback(new Parallel.Batch(0, DynamicTree.PotentialPairs.Slots.Length));

    }

Added a multiThread parameter to CheckBagCount in DynamicTree.cs:

    private void CheckBagCount(bool multiThread)
    {
        int numThreads = multiThread? ThreadPool.Instance.ThreadCount : 1;
        if (lists.Length != numThreads)
        {
            lists = new SlimBag<IDynamicTreeProxy>[numThreads];
            for (int i = 0; i < numThreads; i++)
            {
                lists[i] = new SlimBag<IDynamicTreeProxy>();
            }
        }
    }

And lastly, changed the last part of World.Step() function to:

        // Signal the thread pool that threads can go into a wait state. If threadModel is set to
        // aggressive this will not happen. Also make sure that a switch from (aggressive, multiThreaded)
        // to (aggressive, sequential) triggers a signalReset here.
        if (ThreadModel == ThreadModelType.Regular && multiThread /* || !multiThread */)
        {
            ThreadPool.Instance.SignalReset();
        }

I'm pretty sure the change I made to DetectCollisions doesn't cause any side effects - but for the other two changes, I'm wary that they may have implications I haven't thought of yet.

tin-can-tomatoes commented 4 days ago

notgiven688 I tested the prerelease nuget package on the BoxDrop demo and my own project, and it works in the debugger without issue.

notgiven688 commented 3 days ago

@tin-can-tomatoes https://github.com/notgiven688/jitterphysics2/pull/182 This includes your suggestions to not initialize the ThreadPool ever if multithread is set to false.

notgiven688 commented 3 days ago

Fixed in Jitter 2.4.4.