louis-langholtz / PlayRho

An interactive physics engine & library.
zlib License
133 stars 25 forks source link

Boxes in Tumbler Testbed too squishy? #10

Closed louis-langholtz closed 7 years ago

louis-langholtz commented 7 years ago

From @louis-langholtz on January 7, 2017 22:42

Here's a screenshot showing the current state and issue:

tumblerboxestoosquishy

Copied from original issue: louis-langholtz/Box2D#3

louis-langholtz commented 7 years ago

This may just be the way things are tuned to work. Lowering the resolution rates appears to lessen the boxes squishiness. Gonna leave this issue open a while to keep an eye on it more.

NauticalMile64 commented 7 years ago

I think so. Here's a screen clip of box2d.

box2d_tumbler

What are the 'resolution rates' you're referring to?

NauticalMile64 commented 7 years ago

Actually, I think the problem is displayed more clearly by comparing the heavy on light demo in the test bed:

Box2D PlayRho
box2d_hol playrho_hol
louis-langholtz commented 7 years ago

@NauticalMile64 Thanks for looking at this also. I'm revisiting this now.

I had thought that the PlayRho Heavy on Light was simulating with different parameters and that Box2D would then similarly show the same behavior. It had in the past used a higher density which Erin and others had noted that with too high a difference in masses that the solver code had problems like we're seeing. But it turns out that the numerical values appear the same now. In the past I'd noticed that the order of creation appeared to be associated with this problem. If I change the ordering of the body creation now in PlayRho (but keep the fixture creation order the same), the problem goes away. I had thought that this ordering dependency was also a limitation of the solver and that Box2D also had this limitation. On testing this out now however, I don't see ordering of body creation changing Box2D's behavior. So this turns out to be a real bug it would seem.

As for the original Jan 7 squishiness, I believe this extra squishiness we see is indeed due to different parameters being used in particular for the linear slop sizes where Box2D is using 0.005 meters, while PlayRho is using 0.001 meters and back in Jan the engine was using an even smaller linear slop size. And these effect the resolution rates. The smaller the linear slop value the less overlap resolution will be done each world step. I will put together a settings pack that recreates as close as possible the Box2D values for parameters like this and make those the defaults for the PlayRho Testbed so that things basically can be closer to being compared apples-to-apples by default.

NauticalMile64 commented 7 years ago

If I change the ordering of the body creation now in PlayRho (but keep the fixture creation order the same), the problem goes away. I had thought that this ordering dependency was also a limitation of the solver and that Box2D also had this limitation. On testing this out now however, I don't see ordering of body creation changing Box2D's behavior. So this turns out to be a real bug it would seem.

Seems to me this is a high-priority issue. I'm stepping through the code now trying to see what's different between the two cases.

louis-langholtz commented 7 years ago

Just to be clear, I believe that the squishiness in the Tumbler demo is a different issue than what we're seeing in the Heavy on Light demo. What we're seeing in the Tumbler demo I believe is due to different settings between PlayRho and Box2D. What we're seeing in the Heavy on Light demo I believe is due to two other things: an ordering bug that I introduced, and how the solvers used in both Box2D and in PlayRho have difficulty dealing with stacked objects with the upper object having more mass than the lower. We're in part just seeing the difficulty dealing with more massive object on top of lighter object problem more in the PlayRho sim at the moment.

Anyways, I'm working on this all now and hope to find out more soon.

louis-langholtz commented 7 years ago

@NauticalMile64 The "resolution rates" I referred to are what appears in the more recent PlayRho Testbed as the Reg Resol % Rate and TOI Resol % Rate controls. These are the per-step configurable equivalents of Erin's b2_baumgarte and b2_toiBaugarte preprocessor defined compile-time values.

The resolution rates aren't at fault in this issue however as I initially wondered. At least I can reproduce the Tumbler squishiness problem with the exact same resolution rates as used by Erin's code so I take it that the problem must be something else.

louis-langholtz commented 7 years ago

Oh man...

I think I'm doing magic tricks on my self. My one hand is doing one thing while I'm looking at the other and poof! Things magically change. I'm having a Twilight moment right now in this code. LOLs!

Now I can't reproduce the Heavy on Light problem. Hadn't done anything that I thought would fix it though. The problem was there at commit 24a911d510d4b7688ce3bce69f17a4ca17d98913 (in my local repo with not everything pushed yet), and then some time after, went away. Bisecting this anomaly I see that my commit e5856a5e68c239012a48e4a27cbcf7f00e0f579c causes the Heavy on Light issue to disappear. Specifically changing the default linear slop from Meter / Real(1000) to Real(0.005f) * Meter appears to make this problem no longer show. Interesting.

NauticalMile64 commented 7 years ago

Ok cool. I got sidetracked yesterday and didn't end up getting very far with the debugging.

Specifically changing the default linear slop from Meter / Real(1000) to Real(0.005f) * Meter appears to make this problem no longer show. Interesting.

You might have already noticed this but I'll point it out for clarity's sake: given the same value for Meter the exact evaluation of those expressions is different:

Meter/Real(1000) = 0.0001*Meter
Real(0.005f) * Meter = 0.005*Meter

so this isn't just a floating point issue if that's what you're thinking; you're substantially changing the linear slop.

louis-langholtz commented 7 years ago

@NauticalMile64 Sorry. Not sure what you mean.

There are differences between changing from Meter / Real(1000) to Real(0.005f) * Meter, yes. Approximately speaking, Meter / Real(1000) is 0.001f (not 0.0001) and Real(0.005f) * Meter is 0.005f. So one's five times the other.

0.001f is the linear slop I've been using for a while in my fork of Box2D. 0.005f is what Erin's Box2D 2.3.2 code uses.

I mean to restore the default linear slop value DefaultLinearSlop to what Erin's code uses so as to by-default provide a more apples-to-apples experience in regards to the linear slop. My hope is that this will be less surprising then to new users who might be coming from Box2D.

Many Testbed behaviors are tied to things like what the compile time defaults are. At least in terms of how they run short of the user using the Testbed controls to change the exposed step-time configurable settings (from their defaults).

Some unit tests meanwhile are tied to the settings I'd been using so I'm having to modify those using the configurability in PlayRho to maintain the same behaviors that the unit tests were testing for. This actually served to find a bug I had in my code; a place in my code that I hadn't passed the world step configuration values through and the engine was instead only using the compile time default value.

louis-langholtz commented 7 years ago

Opened issue #32 for this. My intention though with issue #32 in regards to this issue, is more to eliminate configuration differences as the culprit.

louis-langholtz commented 7 years ago

As of commit 7346ed0512f6d4782376730748b41bd452bed729:

Unfortunately this commit doesn't get the squishiness in the Tumbler demo to appear to be quite as little as we see in the screen clip of Box2D. I'm continuing to work on this though.

louis-langholtz commented 7 years ago

For comparison's sake, here's a screenshot of the Tumbler Demo from commit e62b7c6b0e64409105dc1961a433dbfa0c38578c:

tumbler demo e62b7c6

It's less squishy looking than the one uploaded January 7, 2017 but it's more squishy looking than the one from Box2D that NauticalMile64 uploaded 15 days ago.

Note: some squishiness is natural for the iterative solver algorithm used and can be seen in a variety of physics engine because of this. The degree of squishiness is concerning though. Why is it greater than from Erin's Box2D physics library? What difference was introduced that increases the squishiness from Erin's code?

louis-langholtz commented 7 years ago

First introduction of squishiness seems to be between these commits:

Presumably due to changes to one or more of the following files which need to be checked:

Using the following command then to inspect the diffs:

git diff -w 27fdb58 59efc1f -- $filename
louis-langholtz commented 7 years ago

Seems when I manually roll back some of the changes to the contact solver for solving the position constraint (by changing code that's now in PlayRho/Dynamics/Contacts/ContactSolver.cpp), things get even squishier. Pooh. But it's a signal of sorts in my mind.

louis-langholtz commented 7 years ago

Another signal I believe... after updating the code within the current B2_DEBUG_SOLVER ifdef blocks, and recompiling to enable it, the k_errorTol assert fails in BlockSolveNormalCase1 with vcp1.velocityBias set to 0 and post_vn1 showing up as 0.00201809406. Seemingly they're ideally the same values but they're getting away from each other and it appears to happen when the Tumbler Testbed demo is about to start being extra squishy. Not sure if this is really a signal but I feel encouraged by this finding.

Meanwhile, Erin's Box2D code doesn't hit this assert. So I'd say that something is off with solving the velocity.

louis-langholtz commented 7 years ago

Narrowed down to the problem having been introduced before commit 9b6459301b0734ac326855b02f5da42d60c57c0d (May 5, 2016). The commit before this however has build issues. That seems to be a common problem back till the commit just after 27fdb58.

So with new inspection range, diff command is:

git diff -w 27fdb58 9b64593 -- $filename

Or to show just the list of the files that changed that could matter:

git diff --name-only 27fdb58 9b64593 | egrep -v 'Testbed|HelloWorld|Iterator|Chain|Circle|Joint|Rope|List|Edge|Draw|Time|xcode|Stack|Block|Call'

Which shows the following files that need to be checked:

louis-langholtz commented 7 years ago

Moving forward in commits from commit 27fdb58, I've manually checked the following commits.

Good:

Bad:

Finally! Problem introduced by commit ea5f3da590891af0bef789bdaa7a4737e051339a .

git diff -w a1cfb95 ea5f3da

File listing for this one commit:

$ git diff --name-only a1cfb95 ea5f3da | egrep -v 'Testbed|HelloWorld|Iterator|Chain|Circle|Joint|Rope|List|Edge|Draw|Time|xcode|Stack|Block|Call'
Box2D/Box2D/Collision/Shapes/b2PolygonShape.cpp
Box2D/Box2D/Collision/Shapes/b2PolygonShape.h
Box2D/Box2D/Collision/Shapes/b2Shape.h
Box2D/Box2D/Collision/b2BroadPhase.cpp
Box2D/Box2D/Collision/b2BroadPhase.h
Box2D/Box2D/Collision/b2CollidePolygon.cpp
Box2D/Box2D/Collision/b2Collision.cpp
Box2D/Box2D/Collision/b2Collision.h
Box2D/Box2D/Collision/b2Distance.cpp
Box2D/Box2D/Collision/b2Distance.h
Box2D/Box2D/Collision/b2DynamicTree.cpp
Box2D/Box2D/Collision/b2DynamicTree.h
Box2D/Box2D/Common/b2Math.h
Box2D/Box2D/Common/b2Settings.cpp
Box2D/Box2D/Common/b2Settings.h
Box2D/Box2D/Dynamics/Contacts/b2Contact.cpp
Box2D/Box2D/Dynamics/Contacts/b2Contact.h
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.cpp
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.h
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.cpp
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.h
Box2D/Box2D/Dynamics/b2Body.cpp
Box2D/Box2D/Dynamics/b2Body.h
Box2D/Box2D/Dynamics/b2ContactManager.cpp
Box2D/Box2D/Dynamics/b2ContactManager.h
Box2D/Box2D/Dynamics/b2Fixture.cpp
Box2D/Box2D/Dynamics/b2Fixture.h
Box2D/Box2D/Dynamics/b2Island.cpp
Box2D/Box2D/Dynamics/b2Island.h
Box2D/Box2D/Dynamics/b2World.cpp
Box2D/Box2D/Dynamics/b2World.h
louis-langholtz commented 7 years ago

Darn. Simply reverting the b2_linearSlop change of commit ea5f3da back to Erin's 0.005f value and reverting b2_baumgarte to Erin's 0.2f value made the Tumbler demo look just like Erin's again. So either somehow the current library code isn't defaulting to these values close enough or I'd introduced more than one set of changes that make the demo look more squishy than Erin's.

NauticalMile64 commented 7 years ago

I'll summarize the relevant changes I see for this commit.

b2Settings.h

Line 70:

-constexpr auto b2_linearSlop = float32{0.005f};
+constexpr auto b2_linearSlop = float32{0.00005f}; // float32{0.005f};

The new value is 100 times smaller. Without any other changes I would suspect this would improve the contact resolution at the expense of additional position iterations.

Line 79 (unchanged):

constexpr auto b2_polygonRadius = float32{2.0f * b2_linearSlop};

Just keeping in mind the b2_polygonRadius=0.0001; decreased by a factor of 100.

Line 98:

-constexpr auto b2_maxLinearCorrection = float32{0.2f};
+constexpr auto b2_maxLinearCorrection = b2_linearSlop * 40; // float32{0.2f};

Given this change, and the above change to b2_linear_slop, looks like b2_maxLinearCorrection=0.002; decreased by a factor of 100.

b2ContactSolver.cpp

Line 808:

 -return minSeparation >= -3.0f * b2_linearSlop;
 +return minSeparation >= (-3.0f * -b2_linearSlop);

You're flipping the sign on this calculation, I'm assuming you've now redefined 'separation' as positive and 'penetration' as negative. Mathematically, when you flip the signs in an inequality, you also have to reverse the greater than / less than symbol. Might you have forgotten to do that in this line of code? The equation returns the opposite value I'd expect it to. Could it be that you missed flipping the signs in another calculation? The absolute value of the RHS has also decreased by a factor of 100.

I can keep listing changes (and the eventual consequences), but I have to ask first: what is the reasoning for fiddling with the b2_linear_slop? Something to do with your rounded collision boxes perhaps?

louis-langholtz commented 7 years ago

@NauticalMile64 Thanks for looking at this and being a second pair of eyes!

Yes, the b2_linearSlop was 100 times smaller. When I got commit ea5f3da590891af0bef789bdaa7a4737e051339a built just to run the Tumbler Testbed demo and then reverted that value (and b2_baumgarte), things went back to normal. That's what I originally was expecting: that simply reverting defaults like these back to Erin's Box2D values would fix things. And it did in my testing of that commit. Unfortunately, the current code has the same values now as in Erin's Box2D code but the Tumbler demo -- while less squishy than it was in this commit -- is still squishier than in Erin's demo. So I figure I must have introduced other changes that also effected squishiness or I've somewhere not gotten the reverted value back in place.

Interesting catch you found with both signs being negative there! I hadn't noticed it when I was reviewing these changes a couple days ago. I see that I wound up fixing this problem the day after that commit however in commit a26205ee19daf06b9167a0d0862561572cf22a7a (Apr 6, 2016).

As for what the reasoning is for fiddling with values like b2_linearSlop, I love that question! There have been a few reasons why I've done so:

  1. To "fix" an out-of-scale sim: Originally I was playing with simulating polygons, that were around 1 cm in size, bouncing within an enclosed rectangular area that was around 6 cm by 8 cm. So I lowered the linear slop to better simulate things at that scale.
  2. To play with these values: Later I realized that understanding what the limits were and how going beyond them impacted simulations was very interesting to me. So much so that I removed all the hard-coded limits and for the most part, replaced them all with the per-step settings (see StepConf).
  3. To engineer the code to deal with limits in less arbitrary ways: Since making the hard coded limits run-time configurable, I've learned some things that have been really cool to me. Like the variable precision of floating point numbers and how the code could stop trying to bisect the times/distances in calculating the TOI once it got down to the ULP (and could no longer half the value because nextafter was past that half).

There may be more reasons but those are what come to mind at the moment.

As for continuing to review changes and analyzing their consequences, that may well be what it takes to resolve this. Know that I continue to work on this however and hope to be updating code soon towards solving this issue.

louis-langholtz commented 7 years ago

Bisecting again now with the modification of the linear slop and max linear correction default settings in hand...

git bisect start
git bisect bad
git bisect good ea5f3da

Then... 03e0b4cb (Oct 31, 2016) was good. 627197b3 (Nov 8, 2016) was good. a24e8d3f (Nov 12, 2016) was good. 35823571 (Nov 16, 2016) was good. 1c718171 () was good. 5bd79274 was good. 97c9f424 was good. Maybe 4b278af23edf454d005f0d6952fb0b4df05e43e2 first bad.

After restarting bisection again: e7b5d15d (Apr 27, 2017) bad. 1b0caec0 (Feb 26, 2017) bad. a8b1c025 (Jan 9 15:41:54 2017) bad. 744a6b74 bad. fdcf3077 bad (Jan 7 10:41:17 2017). 5ceab2a1 (Dec 6 2016) good. e2731daa (Dec 20 11:57:34 2016) good. 5e0e45ff good. c0f02b64 good. 89c728a7 good (Jan 7 10:35:23 2017). fdcf3077fbcf476db5d1947046cba3257362581d is the first bad commit

This first "bad" commit only changes code Box2D/Box2D/Collision/CollideShapes.cpp.

git diff -w 89c728a7 fdcf3077 -- Box2D/Box2D/Collision/CollideShapes.cpp

Cool. I know what the exact change is now in this file. Not sure yet what to do about it.

louis-langholtz commented 7 years ago

Better than ever!!!...

screen shot 2017-09-06 at 1 18 04 am

NauticalMile64 commented 7 years ago

Looks great!

I see there are a lot of changes that went into those commits. Would you mind summarizing your thought process and the key things you changed? It's really helpful for everyone to have this information when issues like this arise in the future.

louis-langholtz commented 7 years ago

Woot! Love the question @NauticalMile64 !

The solution boiled down to missing less opportunities to "warm start the solver" with last known manifold point impulse information. Critical to this were the changes to Contact.cpp. Secondarily were the changes to Collision.cpp. I'd added comments in the source code too about this to at least help remind me about what I did and why. These can be seen in the linked file changes.

All the other changes, were really just cosmetic or just dealing with code completeness for some corner cases (like getting the code to compile again with Real set to Fixed32) or slight optimizations.