Closed louis-langholtz closed 7 years ago
This may just be the way things are tuned to work. Lowering the resolution rates appears to lessen the boxes squishiness. Gonna leave this issue open a while to keep an eye on it more.
I think so. Here's a screen clip of box2d.
What are the 'resolution rates' you're referring to?
Actually, I think the problem is displayed more clearly by comparing the heavy on light
demo in the test bed:
Box2D | PlayRho |
---|---|
@NauticalMile64 Thanks for looking at this also. I'm revisiting this now.
I had thought that the PlayRho Heavy on Light was simulating with different parameters and that Box2D would then similarly show the same behavior. It had in the past used a higher density which Erin and others had noted that with too high a difference in masses that the solver code had problems like we're seeing. But it turns out that the numerical values appear the same now. In the past I'd noticed that the order of creation appeared to be associated with this problem. If I change the ordering of the body creation now in PlayRho (but keep the fixture creation order the same), the problem goes away. I had thought that this ordering dependency was also a limitation of the solver and that Box2D also had this limitation. On testing this out now however, I don't see ordering of body creation changing Box2D's behavior. So this turns out to be a real bug it would seem.
As for the original Jan 7 squishiness, I believe this extra squishiness we see is indeed due to different parameters being used in particular for the linear slop sizes where Box2D is using 0.005 meters, while PlayRho is using 0.001 meters and back in Jan the engine was using an even smaller linear slop size. And these effect the resolution rates. The smaller the linear slop value the less overlap resolution will be done each world step. I will put together a settings pack that recreates as close as possible the Box2D values for parameters like this and make those the defaults for the PlayRho Testbed so that things basically can be closer to being compared apples-to-apples by default.
If I change the ordering of the body creation now in PlayRho (but keep the fixture creation order the same), the problem goes away. I had thought that this ordering dependency was also a limitation of the solver and that Box2D also had this limitation. On testing this out now however, I don't see ordering of body creation changing Box2D's behavior. So this turns out to be a real bug it would seem.
Seems to me this is a high-priority issue. I'm stepping through the code now trying to see what's different between the two cases.
Just to be clear, I believe that the squishiness in the Tumbler
demo is a different issue than what we're seeing in the Heavy on Light
demo. What we're seeing in the Tumbler
demo I believe is due to different settings between PlayRho and Box2D. What we're seeing in the Heavy on Light
demo I believe is due to two other things: an ordering bug that I introduced, and how the solvers used in both Box2D and in PlayRho have difficulty dealing with stacked objects with the upper object having more mass than the lower. We're in part just seeing the difficulty dealing with more massive object on top of lighter object problem more in the PlayRho sim at the moment.
Anyways, I'm working on this all now and hope to find out more soon.
@NauticalMile64 The "resolution rates" I referred to are what appears in the more recent PlayRho Testbed as the Reg Resol % Rate
and TOI Resol % Rate
controls. These are the per-step configurable equivalents of Erin's b2_baumgarte
and b2_toiBaugarte
preprocessor defined compile-time values.
The resolution rates aren't at fault in this issue however as I initially wondered. At least I can reproduce the Tumbler squishiness problem with the exact same resolution rates as used by Erin's code so I take it that the problem must be something else.
Oh man...
I think I'm doing magic tricks on my self. My one hand is doing one thing while I'm looking at the other and poof! Things magically change. I'm having a Twilight moment right now in this code. LOLs!
Now I can't reproduce the Heavy on Light
problem. Hadn't done anything that I thought would fix it though. The problem was there at commit 24a911d510d4b7688ce3bce69f17a4ca17d98913 (in my local repo with not everything pushed yet), and then some time after, went away. Bisecting this anomaly I see that my commit e5856a5e68c239012a48e4a27cbcf7f00e0f579c causes the Heavy on Light
issue to disappear. Specifically changing the default linear slop from Meter / Real(1000)
to Real(0.005f) * Meter
appears to make this problem no longer show. Interesting.
Ok cool. I got sidetracked yesterday and didn't end up getting very far with the debugging.
Specifically changing the default linear slop from Meter / Real(1000) to Real(0.005f) * Meter appears to make this problem no longer show. Interesting.
You might have already noticed this but I'll point it out for clarity's sake: given the same value for Meter
the exact evaluation of those expressions is different:
Meter/Real(1000) = 0.0001*Meter
Real(0.005f) * Meter = 0.005*Meter
so this isn't just a floating point issue if that's what you're thinking; you're substantially changing the linear slop.
@NauticalMile64 Sorry. Not sure what you mean.
There are differences between changing from Meter / Real(1000)
to Real(0.005f) * Meter
, yes. Approximately speaking, Meter / Real(1000)
is 0.001f
(not 0.0001
) and Real(0.005f) * Meter
is 0.005f
. So one's five times the other.
0.001f
is the linear slop I've been using for a while in my fork of Box2D. 0.005f
is what Erin's Box2D 2.3.2 code uses.
I mean to restore the default linear slop value DefaultLinearSlop
to what Erin's code uses so as to by-default provide a more apples-to-apples experience in regards to the linear slop. My hope is that this will be less surprising then to new users who might be coming from Box2D.
Many Testbed behaviors are tied to things like what the compile time defaults are. At least in terms of how they run short of the user using the Testbed controls to change the exposed step-time configurable settings (from their defaults).
Some unit tests meanwhile are tied to the settings I'd been using so I'm having to modify those using the configurability in PlayRho to maintain the same behaviors that the unit tests were testing for. This actually served to find a bug I had in my code; a place in my code that I hadn't passed the world step configuration values through and the engine was instead only using the compile time default value.
Opened issue #32 for this. My intention though with issue #32 in regards to this issue, is more to eliminate configuration differences as the culprit.
As of commit 7346ed0512f6d4782376730748b41bd452bed729:
Heavy on Light
problem should no longer be seen in the Testbed as it seems to be related to the smaller linear slop default that had previously been used.Heavy on Light
problem in the unit test code as the World.HeavyOnLight
test showing that the problem doesn't occur with the new default linear slop value but still occurs with the previous linear slop and related settings. Note that this unit test is currently written to expect and accept these situations.Unfortunately this commit doesn't get the squishiness in the Tumbler
demo to appear to be quite as little as we see in the screen clip of Box2D. I'm continuing to work on this though.
For comparison's sake, here's a screenshot of the Tumbler Demo from commit e62b7c6b0e64409105dc1961a433dbfa0c38578c:
It's less squishy looking than the one uploaded January 7, 2017 but it's more squishy looking than the one from Box2D that NauticalMile64 uploaded 15 days ago.
Note: some squishiness is natural for the iterative solver algorithm used and can be seen in a variety of physics engine because of this. The degree of squishiness is concerning though. Why is it greater than from Erin's Box2D physics library? What difference was introduced that increases the squishiness from Erin's code?
First introduction of squishiness seems to be between these commits:
Presumably due to changes to one or more of the following files which need to be checked:
Box2D/Box2D/Collision/b2BroadPhase.cpp
Box2D/Box2D/Collision/b2BroadPhase.h
Box2D/Box2D/Collision/b2CollidePolygon.cpp
Box2D/Box2D/Collision/b2Collision.cpp
Box2D/Box2D/Collision/b2Collision.h
Box2D/Box2D/Collision/b2Distance.cpp
Box2D/Box2D/Collision/b2Distance.h
Box2D/Box2D/Collision/b2DynamicTree.cpp
Box2D/Box2D/Collision/b2DynamicTree.h
Box2D/Box2D/Collision/b2TimeOfImpact.cpp
Box2D/Box2D/Collision/b2TimeOfImpact.h
Box2D/Box2D/Common/b2Math.cpp
Box2D/Box2D/Common/b2Math.h
Box2D/Box2D/Common/b2Settings.cpp
Box2D/Box2D/Common/b2Settings.h
Box2D/Box2D/Dynamics/Contacts/b2Contact.cpp
Box2D/Box2D/Dynamics/Contacts/b2Contact.h
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.cpp
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.h
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.cpp
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.h
Box2D/Box2D/Dynamics/b2Body.cpp
Box2D/Box2D/Dynamics/b2Body.h
Box2D/Box2D/Dynamics/b2ContactManager.cpp
Box2D/Box2D/Dynamics/b2ContactManager.h
Box2D/Box2D/Dynamics/b2Fixture.cpp
Box2D/Box2D/Dynamics/b2Fixture.h
Box2D/Box2D/Dynamics/b2Island.cpp
Box2D/Box2D/Dynamics/b2Island.h
Box2D/Box2D/Dynamics/b2TimeStep.h
Box2D/Box2D/Dynamics/b2World.cpp
Box2D/Box2D/Dynamics/b2World.h
Box2D/Box2D/Dynamics/b2WorldCallbacks.h
Using the following command then to inspect the diffs:
git diff -w 27fdb58 59efc1f -- $filename
Seems when I manually roll back some of the changes to the contact solver for solving the position constraint (by changing code that's now in PlayRho/Dynamics/Contacts/ContactSolver.cpp
), things get even squishier. Pooh. But it's a signal of sorts in my mind.
Another signal I believe... after updating the code within the current B2_DEBUG_SOLVER
ifdef
blocks, and recompiling to enable it, the k_errorTol
assert fails in BlockSolveNormalCase1
with vcp1.velocityBias
set to 0 and post_vn1
showing up as 0.00201809406. Seemingly they're ideally the same values but they're getting away from each other and it appears to happen when the Tumbler Testbed demo is about to start being extra squishy. Not sure if this is really a signal but I feel encouraged by this finding.
Meanwhile, Erin's Box2D code doesn't hit this assert. So I'd say that something is off with solving the velocity.
Narrowed down to the problem having been introduced before commit 9b6459301b0734ac326855b02f5da42d60c57c0d (May 5, 2016). The commit before this however has build issues. That seems to be a common problem back till the commit just after 27fdb58.
So with new inspection range, diff command is:
git diff -w 27fdb58 9b64593 -- $filename
Or to show just the list of the files that changed that could matter:
git diff --name-only 27fdb58 9b64593 | egrep -v 'Testbed|HelloWorld|Iterator|Chain|Circle|Joint|Rope|List|Edge|Draw|Time|xcode|Stack|Block|Call'
Which shows the following files that need to be checked:
Box2D/Box2D/Collision/Shapes/b2PolygonShape.cpp
Box2D/Box2D/Collision/Shapes/b2PolygonShape.h
Box2D/Box2D/Collision/Shapes/b2Shape.h
Box2D/Box2D/Collision/b2BroadPhase.cpp
Box2D/Box2D/Collision/b2BroadPhase.h
Box2D/Box2D/Collision/b2CollidePolygon.cpp
Box2D/Box2D/Collision/b2Collision.cpp
Box2D/Box2D/Collision/b2Collision.h
Box2D/Box2D/Collision/b2Distance.cpp
Box2D/Box2D/Collision/b2Distance.h
Box2D/Box2D/Collision/b2DynamicTree.cpp
Box2D/Box2D/Collision/b2DynamicTree.h
Box2D/Box2D/Common/b2Math.cpp
Box2D/Box2D/Common/b2Math.h
Box2D/Box2D/Common/b2Settings.cpp
Box2D/Box2D/Common/b2Settings.h
Box2D/Box2D/Dynamics/Contacts/b2Contact.cpp
Box2D/Box2D/Dynamics/Contacts/b2Contact.h
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.cpp
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.h
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.cpp
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.h
Box2D/Box2D/Dynamics/b2Body.cpp
Box2D/Box2D/Dynamics/b2Body.h
Box2D/Box2D/Dynamics/b2ContactManager.cpp
Box2D/Box2D/Dynamics/b2ContactManager.h
Box2D/Box2D/Dynamics/b2Fixture.cpp
Box2D/Box2D/Dynamics/b2Fixture.h
Box2D/Box2D/Dynamics/b2Island.cpp
Box2D/Box2D/Dynamics/b2Island.h
Box2D/Box2D/Dynamics/b2World.cpp
Box2D/Box2D/Dynamics/b2World.h
Moving forward in commits from commit 27fdb58, I've manually checked the following commits.
Good:
Bad:
Finally! Problem introduced by commit ea5f3da590891af0bef789bdaa7a4737e051339a .
git diff -w a1cfb95 ea5f3da
File listing for this one commit:
$ git diff --name-only a1cfb95 ea5f3da | egrep -v 'Testbed|HelloWorld|Iterator|Chain|Circle|Joint|Rope|List|Edge|Draw|Time|xcode|Stack|Block|Call'
Box2D/Box2D/Collision/Shapes/b2PolygonShape.cpp
Box2D/Box2D/Collision/Shapes/b2PolygonShape.h
Box2D/Box2D/Collision/Shapes/b2Shape.h
Box2D/Box2D/Collision/b2BroadPhase.cpp
Box2D/Box2D/Collision/b2BroadPhase.h
Box2D/Box2D/Collision/b2CollidePolygon.cpp
Box2D/Box2D/Collision/b2Collision.cpp
Box2D/Box2D/Collision/b2Collision.h
Box2D/Box2D/Collision/b2Distance.cpp
Box2D/Box2D/Collision/b2Distance.h
Box2D/Box2D/Collision/b2DynamicTree.cpp
Box2D/Box2D/Collision/b2DynamicTree.h
Box2D/Box2D/Common/b2Math.h
Box2D/Box2D/Common/b2Settings.cpp
Box2D/Box2D/Common/b2Settings.h
Box2D/Box2D/Dynamics/Contacts/b2Contact.cpp
Box2D/Box2D/Dynamics/Contacts/b2Contact.h
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.cpp
Box2D/Box2D/Dynamics/Contacts/b2ContactSolver.h
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.cpp
Box2D/Box2D/Dynamics/Contacts/b2PolygonContact.h
Box2D/Box2D/Dynamics/b2Body.cpp
Box2D/Box2D/Dynamics/b2Body.h
Box2D/Box2D/Dynamics/b2ContactManager.cpp
Box2D/Box2D/Dynamics/b2ContactManager.h
Box2D/Box2D/Dynamics/b2Fixture.cpp
Box2D/Box2D/Dynamics/b2Fixture.h
Box2D/Box2D/Dynamics/b2Island.cpp
Box2D/Box2D/Dynamics/b2Island.h
Box2D/Box2D/Dynamics/b2World.cpp
Box2D/Box2D/Dynamics/b2World.h
Darn. Simply reverting the b2_linearSlop
change of commit ea5f3da back to Erin's 0.005f
value and reverting b2_baumgarte
to Erin's 0.2f
value made the Tumbler demo look just like Erin's again. So either somehow the current library code isn't defaulting to these values close enough or I'd introduced more than one set of changes that make the demo look more squishy than Erin's.
I'll summarize the relevant changes I see for this commit.
-constexpr auto b2_linearSlop = float32{0.005f};
+constexpr auto b2_linearSlop = float32{0.00005f}; // float32{0.005f};
The new value is 100 times smaller. Without any other changes I would suspect this would improve the contact resolution at the expense of additional position iterations.
Line 79 (unchanged):
constexpr auto b2_polygonRadius = float32{2.0f * b2_linearSlop};
Just keeping in mind the b2_polygonRadius=0.0001
; decreased by a factor of 100.
-constexpr auto b2_maxLinearCorrection = float32{0.2f};
+constexpr auto b2_maxLinearCorrection = b2_linearSlop * 40; // float32{0.2f};
Given this change, and the above change to b2_linear_slop
, looks like b2_maxLinearCorrection=0.002
; decreased by a factor of 100.
-return minSeparation >= -3.0f * b2_linearSlop;
+return minSeparation >= (-3.0f * -b2_linearSlop);
You're flipping the sign on this calculation, I'm assuming you've now redefined 'separation' as positive and 'penetration' as negative. Mathematically, when you flip the signs in an inequality, you also have to reverse the greater than / less than symbol. Might you have forgotten to do that in this line of code? The equation returns the opposite value I'd expect it to. Could it be that you missed flipping the signs in another calculation? The absolute value of the RHS has also decreased by a factor of 100.
I can keep listing changes (and the eventual consequences), but I have to ask first: what is the reasoning for fiddling with the b2_linear_slop
? Something to do with your rounded collision boxes perhaps?
@NauticalMile64 Thanks for looking at this and being a second pair of eyes!
Yes, the b2_linearSlop
was 100 times smaller. When I got commit ea5f3da590891af0bef789bdaa7a4737e051339a built just to run the Tumbler Testbed demo and then reverted that value (and b2_baumgarte
), things went back to normal. That's what I originally was expecting: that simply reverting defaults like these back to Erin's Box2D values would fix things. And it did in my testing of that commit. Unfortunately, the current code has the same values now as in Erin's Box2D code but the Tumbler demo -- while less squishy than it was in this commit -- is still squishier than in Erin's demo. So I figure I must have introduced other changes that also effected squishiness or I've somewhere not gotten the reverted value back in place.
Interesting catch you found with both signs being negative there! I hadn't noticed it when I was reviewing these changes a couple days ago. I see that I wound up fixing this problem the day after that commit however in commit a26205ee19daf06b9167a0d0862561572cf22a7a (Apr 6, 2016).
As for what the reasoning is for fiddling with values like b2_linearSlop
, I love that question! There have been a few reasons why I've done so:
StepConf
).nextafter
was past that half).There may be more reasons but those are what come to mind at the moment.
As for continuing to review changes and analyzing their consequences, that may well be what it takes to resolve this. Know that I continue to work on this however and hope to be updating code soon towards solving this issue.
Bisecting again now with the modification of the linear slop and max linear correction default settings in hand...
git bisect start
git bisect bad
git bisect good ea5f3da
Then... 03e0b4cb (Oct 31, 2016) was good. 627197b3 (Nov 8, 2016) was good. a24e8d3f (Nov 12, 2016) was good. 35823571 (Nov 16, 2016) was good. 1c718171 () was good. 5bd79274 was good. 97c9f424 was good. Maybe 4b278af23edf454d005f0d6952fb0b4df05e43e2 first bad.
After restarting bisection again: e7b5d15d (Apr 27, 2017) bad. 1b0caec0 (Feb 26, 2017) bad. a8b1c025 (Jan 9 15:41:54 2017) bad. 744a6b74 bad. fdcf3077 bad (Jan 7 10:41:17 2017). 5ceab2a1 (Dec 6 2016) good. e2731daa (Dec 20 11:57:34 2016) good. 5e0e45ff good. c0f02b64 good. 89c728a7 good (Jan 7 10:35:23 2017). fdcf3077fbcf476db5d1947046cba3257362581d is the first bad commit
This first "bad" commit only changes code Box2D/Box2D/Collision/CollideShapes.cpp
.
git diff -w 89c728a7 fdcf3077 -- Box2D/Box2D/Collision/CollideShapes.cpp
Cool. I know what the exact change is now in this file. Not sure yet what to do about it.
Better than ever!!!...
Looks great!
I see there are a lot of changes that went into those commits. Would you mind summarizing your thought process and the key things you changed? It's really helpful for everyone to have this information when issues like this arise in the future.
Woot! Love the question @NauticalMile64 !
The solution boiled down to missing less opportunities to "warm start the solver" with last known manifold point impulse information. Critical to this were the changes to Contact.cpp
. Secondarily were the changes to Collision.cpp
. I'd added comments in the source code too about this to at least help remind me about what I did and why. These can be seen in the linked file changes.
All the other changes, were really just cosmetic or just dealing with code completeness for some corner cases (like getting the code to compile again with Real
set to Fixed32
) or slight optimizations.
From @louis-langholtz on January 7, 2017 22:42
Here's a screenshot showing the current state and issue:
Copied from original issue: louis-langholtz/Box2D#3