jrouwe / JoltPhysics

A multi core friendly rigid body physics and collision detection library. Written in C++. Suitable for games and VR applications. Used by Horizon Forbidden West.
MIT License

Double-precision support? #94

Closed Zylann closed 1 year ago

Zylann commented 2 years ago

Does Jolt support double precision? I tried Godot Engine's physics and it does support it: I was able to simulate physics normally while 1000 km away from the origin. However, my use case involves creating a lot of mesh colliders at runtime and I'm concerned about several performance issues with Godot's physics engine, so I was considering giving Jolt a go at some point.

jrouwe commented 2 years ago

No, Jolt doesn't support double precision. At this moment you have to ensure that your simulation takes place within roughly 2 km of the origin (I haven't seen any issues at 4 km either, but it hasn't been tested with scenes larger than that).

In the future I'm planning to store positions at a higher precision so you can have larger worlds (everything else in the engine will remain float).

Ansraer commented 2 years ago

Been a while since I last worked on something large enough to run into precision problems, but shouldn't 32 bit floats be good enough for up to around 10 km? Or do you need more than mm precision?

Zylann commented 2 years ago

So here is more context:

My actual use case is simulating on a planet that can extend about 30 km or more from the origin. 10 km is actually quite high; I thought the "limit" of acceptable chaos was within 4 km, at least for complex simulations (with one unit of space being 1 m). While I don't plan for bodies to physically interact across such large distances, this would all be the same world, with static bodies possibly present all around, and bodies that can move across such distances and eventually meet. So when I say "double precision", I don't really care whether doubles are actually used, but rather about the ability to simulate at these scales (doubles are just very convenient for handling large coordinates).

The full scale of the world is actually way larger: it's a solar system with multiple planets. However, because of celestial motion it would be expensive and unstable to have all planets be dynamic bodies, so I thought I could separate space from planets so that planets can be static when you are close enough. I currently do this in Godot Engine at a much smaller scale via a "change of physics world". Space has little interaction (for now), so it has no chaos issues even though it involves large distances (the sun is the origin). Planets are smaller yet still fairly big, and most of the environment on them can be static because their "world" has an origin centered on them instead of the sun. The game is seamless though.

The space/planet separation was relatively easy to do because of the low interaction happening in space, but it's less obvious close to the ground, where plenty of bodies could be interacting. I'm also not sure yet how to handle interactions that could still occur at the frontier between space and planet (if two player ships meet there, for example). Early version of a small-scale prototype here

jrouwe commented 2 years ago

Accuracy of a float is about 0.5 mm at 8 km. If you want to collide two 'meter scale' objects at more than 8 km from the origin, subtracting the positions of the bodies is going to give an error of 0.5 mm, so the collision detection / response will be off by 0.5 mm and all the constraints will see this error of 0.5 mm too. This may still be acceptable, but it's not going to help the stability of the simulation. As said, I don't know exactly where the breaking point is. I think it will also depend on the number of constraints that you use and the complexity of the geometry.
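As a quick way to see where that 0.5 mm figure comes from, here's a small standalone C++ snippet (not Jolt code) that prints the spacing between adjacent float values at a few distances from the origin:

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>
#include <limits>

int main()
{
    // Spacing between adjacent 32-bit floats ("1 ULP") at various distances
    // from the origin, assuming 1 unit = 1 m.
    for (float distance : { 100.0f, 1000.0f, 8000.0f, 30000.0f })
    {
        float ulp = std::nextafter(distance, std::numeric_limits<float>::infinity()) - distance;
        std::printf("at %g m the smallest representable step is %g mm\n", distance, ulp * 1000.0f);
    }
    return 0;
}
```

This prints roughly 0.008 mm at 100 m, 0.06 mm at 1 km, 0.5 mm at 8 km and 2 mm at 30 km, which lines up with the numbers above.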

If you want to simulate a solar system, then you can get away with a much lower precision (it doesn't matter if the position of a planet is off by 1 mm), so if you have a physics system with objects that are hundreds of meters in size and that are many kilometers apart, then things may just work as you expect (as long as you don't try to walk on the surface). Beware though that there are quite a few 'epsilons' in the code, and e.g. the StaticCompoundShape stores data in half-floats, so the max value you can encode is 65 km (at a very low precision).
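For reference, a tiny standalone snippet (plain C++, not from Jolt) showing where that 65 km cap and its coarseness come from:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // IEEE 754 half-float: 10 mantissa bits, maximum unbiased exponent of 15.
    const int mantissa_bits = 10;
    const double max_half = (2.0 - std::pow(2.0, -mantissa_bits)) * std::pow(2.0, 15); // 65504
    const double step_near_max = std::pow(2.0, 15 - mantissa_bits);                    // 32
    std::printf("largest finite half-float value: %.0f (~65 km if 1 unit = 1 m)\n", max_half);
    std::printf("spacing between representable values near that max: %.0f m\n", step_near_max);
    return 0;
}
```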

Zylann commented 2 years ago

If you want to simulate a solar system, then you can get away with a much lower precision

I can't, because I can fly from space down to the planet's surface at human scale, and have the usual stuff like vehicles and buildings going on there.

DocAce commented 2 years ago

I recommend using a localized coordinate system. It's what we do in the space game I'm working on, which also has person-scale interactions (though no physics simulations requiring great accuracy, for us coordinates of up to 50km still give acceptable accuracy). There are a bunch of ways to achieve this, I'd be happy to talk shop about this problem :)
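To make that concrete, here is a rough, self-contained sketch of one way to do it (origin rebasing; purely illustrative, not code from either game): absolute positions are tracked in doubles at the game level, and the float-based physics world only ever sees coordinates relative to a local origin that gets moved when the player strays too far from it.

```cpp
#include <cmath>
#include <vector>

struct DVec3 { double x, y, z; };
struct FVec3 { float x, y, z; };

// Game-side state: absolute position in doubles, plus the float position that
// gets handed to the (float-based) physics engine.
struct Entity
{
    DVec3 world_pos;   // absolute, double precision
    FVec3 local_pos;   // relative to the current local origin, float precision
};

class LocalFrame
{
public:
    // Re-center the local origin on the player whenever the player has drifted
    // too far from it, and recompute every entity's float-precision position.
    void MaybeRebase(const DVec3 &player_pos, std::vector<Entity> &entities)
    {
        double dx = player_pos.x - origin.x;
        double dy = player_pos.y - origin.y;
        double dz = player_pos.z - origin.z;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) < rebase_distance)
            return;

        origin = player_pos;
        for (Entity &e : entities)
            e.local_pos = { float(e.world_pos.x - origin.x),
                            float(e.world_pos.y - origin.y),
                            float(e.world_pos.z - origin.z) };
    }

private:
    DVec3 origin { 0.0, 0.0, 0.0 };
    double rebase_distance = 2000.0; // stay well inside the float-accurate range
};
```

The loop over all entities is of course the expensive part, which is exactly the "big freeze" concern raised in the reply below.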

Zylann commented 2 years ago

So you recommend origin shifting? I thought it would be possible to avoid going down that road... because right now doing that in Godot Engine almost always means a big freeze (tons of objects to go through, especially in a game with building mechanics), plus having to account for it in all systems (not even thinking about multiplayer)... notably engine systems that I'm not really willing to fork, because that stuff seems like a can of worms to support from scratch. Also, considering Godot is only starting to support doubles, it seemed easier ^^" why all this love for floats xD And how would localized coordinate systems work when there are bodies simultaneously simulating on the same planet? It can't shift the whole world (unless maybe you assume each client has only one island and physics are client-authoritative and not done on a server).

jrouwe commented 2 years ago

You could have multiple PhysicsSystems: 1 per planet and 1 for outer space. The planet PhysicsSystem would be fixed to a planet and all objects on that planet would be simulated by it. When you get close to a planet, you move the ship from the 'outer space' PhysicsSystem into the planet's PhysicsSystem (at that point you'd need to recalculate the position of the ship relative to the system you're going to). You'd only need to simulate PhysicsSystems for planets that are close to the player.
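A rough sketch of what moving a ship between two systems could look like using Jolt's BodyInterface (layer setup, allocators and job systems omitted; the planet_origin parameter and the MoveShipToPlanet helper are illustrative, not part of Jolt):

```cpp
#include <Jolt/Jolt.h>
#include <Jolt/Physics/PhysicsSystem.h>
#include <Jolt/Physics/Body/BodyCreationSettings.h>

// Move a dynamic body from the 'outer space' system into a planet's system.
// The planet system's origin sits at planet_origin in space coordinates, so
// the position is re-expressed relative to that origin on the way over.
JPH::BodyID MoveShipToPlanet(JPH::PhysicsSystem &space_system,
                             JPH::PhysicsSystem &planet_system,
                             JPH::BodyID ship_id,
                             JPH::Vec3 planet_origin,
                             JPH::ObjectLayer moving_layer)
{
    JPH::BodyInterface &space_bi = space_system.GetBodyInterface();
    JPH::BodyInterface &planet_bi = planet_system.GetBodyInterface();

    // Capture the ship's state in space coordinates.
    JPH::RefConst<JPH::Shape> shape = space_bi.GetShape(ship_id);
    JPH::Vec3 position = space_bi.GetPosition(ship_id) - planet_origin; // now relative to the planet
    JPH::Quat rotation = space_bi.GetRotation(ship_id);
    JPH::Vec3 linear_velocity = space_bi.GetLinearVelocity(ship_id);
    JPH::Vec3 angular_velocity = space_bi.GetAngularVelocity(ship_id);

    // Remove it from the space system...
    space_bi.RemoveBody(ship_id);
    space_bi.DestroyBody(ship_id);

    // ...and recreate it in the planet system.
    JPH::BodyCreationSettings settings(shape.GetPtr(), position, rotation,
                                       JPH::EMotionType::Dynamic, moving_layer);
    JPH::BodyID new_id = planet_bi.CreateAndAddBody(settings, JPH::EActivation::Activate);
    planet_bi.SetLinearAndAngularVelocity(new_id, linear_velocity, angular_velocity);
    return new_id;
}
```

The shape is ref-counted, so it survives the DestroyBody call and can be reused for the new body; the velocities carry over directly because the planet system is fixed to the planet.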

B.t.w. the love for floats is mainly because it uses less memory and SIMD operations are a lot faster on floats than on doubles (and the instructions are available on older hardware). I will get around to supporting doubles someday, but it's quite a big effort.

Zylann commented 2 years ago

You could have multiple PhysicsSystems: 1 per planet and 1 for outer space. The planet PhysicsSystem would be fixed to a planet and all objects on that planet would be simulated by it. When you get close to a planet, you move the ship from the 'outer space' PhysicsSystem into the planet's PhysicsSystem (at that point you'd need to recalculate the position of the ship relative to the system you're going to). You'd only need to simulate PhysicsSystems for planets that are close to the player.

This is currently what I do at a small scale, although in the prototype I don't actually use multiple worlds (aka PhysicsSystem) yet; instead I translate everything in the same world while in space (cheap because there are few elements in that situation, but it will likely not remain this way in the future). I initially wanted to do that on a larger scale; the problem is I fear 30-60 km or more might introduce more chaos than is acceptable for this game while on a planet.

Yeah, unfortunately I suspected floats are more performance-friendly for those reasons. GodotPhysics can currently use doubles, which works fine, but maybe I'm not yet seeing the performance that could be gained by using floats instead ^^ (and performance is why I'm looking at alternatives in the first place)

jclc commented 2 years ago

Would this require any significant effort aside from adding a typedef for the used float type and replacing the currently used float type across the files?

jrouwe commented 2 years ago

Adding a typedef would be one way of implementing double support, but I don't think that it is the most efficient way. Basically it would double the size of many of the (carefully tuned) data structures, and it would come at a performance cost as well. Also, it's not necessary to store everything in doubles. What you basically want is to store the positions of bodies, the starting points of raycasts, etc. in doubles and nothing else. The trick is to drop down to float as soon as possible, so e.g. when colliding two bodies you subtract their positions and then drop down to float to do the remainder of the collision detection in floats.
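As a minimal, self-contained illustration of that trick (plain C++, not Jolt code): positions live in doubles, but as soon as you take the difference between two nearby positions the result is small, so it converts to float without losing anything meaningful, and all the remaining collision math can stay in float.

```cpp
#include <cstdio>

struct DVec3 { double x, y, z; };
struct FVec3 { float x, y, z; };

// Relative offset between two bodies: the subtraction happens in double, so
// the result is small and converts to float with negligible error, even when
// both bodies are tens of kilometers from the origin.
FVec3 RelativeOffset(const DVec3 &from, const DVec3 &to)
{
    return { float(to.x - from.x),
             float(to.y - from.y),
             float(to.z - from.z) };
}

int main()
{
    DVec3 body_a { 30000.0, 1.0, 30000.25 };   // ~30 km from the origin
    DVec3 body_b { 30000.5, 1.0, 30000.75 };
    FVec3 delta = RelativeOffset(body_a, body_b);
    // delta is exactly (0.5, 0, 0.5); everything downstream can run in float.
    std::printf("%g %g %g\n", delta.x, delta.y, delta.z);
    return 0;
}
```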

TheophilusE commented 2 years ago

Hey @jrouwe I wanted to ask if this is something that you plan on working on anytime in the near future? Thanks.

jrouwe commented 2 years ago

I am, but to do it right is a fair amount of work so I don't know when I'll have the time. What do you plan to use it for? (it only makes sense if you have extremely large worlds, the simulation of small worlds is not going to become more physically accurate with doubles, there are too many approximations in a physics engine for that)

TheophilusE commented 2 years ago

What I was hoping to use this for was large-scale simulations where things like positions, raycasts, etc. need to be represented either by doubles or by floating-point hacks.

alienself commented 1 year ago

+1 for this feature

velifaro commented 1 year ago

I did an experiment: I changed everything to double. I also had to remove the bit magic in some places. It's great that you are using tests and static checks; it helped me a lot. As expected, I got a significant performance hit. I would also like to have a mode where the positions of objects and query requests would be in doubles but the main calculations would be done in a local coordinate system in floats. Maybe it is possible to use several overlapping scenes with local coordinates, but I don't understand how to manage objects which have contacts between scenes.

jrouwe commented 1 year ago

Interesting! Can you post the before and after numbers for the PerformanceTest (ragdoll version)? And are you willing to share your code changes (and are you ok with me borrowing from it?)?

W.r.t. overlapping scenes: this is indeed quite hard. It really depends on your scene in this case. Perhaps it's possible to treat a dynamic object of one system as a kinematic object in the other system. Perhaps you can have a bit of overlap between worlds (in terms of static objects) and temporarily move a dynamic object from one system to the other when you detect interaction. Or maybe there's a way to split up the world in such a way that there are no overlapping dynamic objects in the first place.
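A rough sketch of the first option, mirroring a dynamic body from system A as a kinematic proxy in system B (setup omitted; the system_b_origin offset and the MirrorAsKinematic helper are illustrative, not part of Jolt):

```cpp
#include <Jolt/Jolt.h>
#include <Jolt/Physics/PhysicsSystem.h>

// Each frame, copy the pose of a dynamic body simulated in system A onto a
// kinematic proxy body that lives in system B, so that objects in B can
// collide with it without B simulating it. system_b_origin is the position
// of B's origin expressed in A's coordinates.
void MirrorAsKinematic(JPH::PhysicsSystem &system_a, JPH::BodyID dynamic_id,
                       JPH::PhysicsSystem &system_b, JPH::BodyID kinematic_proxy_id,
                       JPH::Vec3 system_b_origin, float delta_time)
{
    JPH::BodyInterface &bi_a = system_a.GetBodyInterface();
    JPH::BodyInterface &bi_b = system_b.GetBodyInterface();

    // Read the pose from the system that owns the dynamic body...
    JPH::Vec3 position = bi_a.GetPosition(dynamic_id) - system_b_origin;
    JPH::Quat rotation = bi_a.GetRotation(dynamic_id);

    // ...and drive the kinematic proxy towards it over the next step so that
    // the proxy picks up a velocity and pushes things in system B around.
    bi_b.MoveKinematic(kinematic_proxy_id, position, rotation, delta_time);
}
```

The hard part this doesn't solve is the reverse coupling: impulses that objects in B apply to the proxy never reach the real dynamic body in A.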

jankrassnigg commented 1 year ago

I was wondering, how did you deal with large worlds in Horizon? Did you teleport everything at regular intervals?

jrouwe commented 1 year ago

I was wondering, how did you deal with large worlds in Horizon? Did you teleport everything at regular intervals?

The world is not big enough to cause any issues.

velifaro commented 1 year ago

Interesting! Can you post the before and after numbers for the PerformanceTest (ragdoll version)? And are you willing to share your code changes (and are you ok with me borrowing from it?)?

I made a fork with double precision. The code is dirty right now and I suppressed a lot of asserts due to precision problems. My results of PerformanceTest for doubles and for the original code:


Double precision:

Running scene: Ragdoll
Motion Quality, Thread Count, Steps / Second, Hash
Discrete, 4, 66.1794, 0x49b1c31d176ef0b2
LinearCast, 4, 59.9925, 0x9f7f5d5bec6a6ee1

Original code:

Running scene: Ragdoll
Motion Quality, Thread Count, Steps / Second, Hash
Discrete, 4, 124.375, 0x64f6913a8a94f143
LinearCast, 4, 116.889, 0x26b68d7f20974bc6


It doesn't look so awful. But when I did a raycast test on my own scene it showed a degradation of around 3-10x, depending on the ray direction.

TheophilusE commented 1 year ago

Not everything needs to be doubles, only that which affects the world position.

jrouwe commented 1 year ago

I made a fork with double precision. The code is dirty right now and I suppressed a lot of asserts due to precision problems. My results of PerformanceTest for doubles and for the original code:

Ok, that's indeed a way of doing it (rename float -> double and disable the intrinsics) and it ends up with interesting class names like HalfDouble :). Anyway, I'll start working on this and will selectively move things to double. Hopefully that will end up not costing 50% of the performance.

jrouwe commented 1 year ago

Hello,

I'm making good progress. Take a look at https://github.com/jrouwe/JoltPhysics/tree/feature/double_precision.

The simulation is mostly done and the main things that are left are the ray casts and cast shape queries. There are 77 TODO_DP tags left that I need to sort out, and the interface needs to change slightly here and there. All demos are working and there's a new 'Big World' test that shows the difference between single and double precision in terms of stability.

Some initial performance results for the ragdoll performance test:

[image: ragdoll PerformanceTest results]

'Original code' is the code that is currently on 'master', 'Single precision' is the new code compiled without JPH_DOUBLE_PRECISION and 'Double precision' is the new code compiled with JPH_DOUBLE_PRECISION. It looks like the performance loss from switching to doubles is very small, and for a reason unknown to me it even seems faster to use doubles at high thread counts. In general the difference is a couple of percent, so it could be measurement error.

I managed to keep almost all code running in floats and the basic idea behind the collision queries is that for each of them you specify a 'base offset'. All returned collisions are relative to this base offset which allows you to keep them close to the origin and in the range where floats are accurate. This means you have to pick the base offset smartly but usually it's quite obvious, e.g. for a CollideShape you can either pick the position of the query body or the target body. CharacterVirtual does everything relative to its own position. This way of doing things should actually also increase accuracy for a simulation that's not using doubles.
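To make the base offset idea concrete, here's a small standalone sketch (plain C++, not the actual Jolt query API; names are illustrative): the caller picks a base offset near the query, hits come back as float offsets relative to it, and adding the base offset back recovers a double-precision world position.

```cpp
#include <cstdio>

struct DVec3 { double x, y, z; };
struct FVec3 { float x, y, z; };

// What a query could return: a hit position expressed as a float offset from
// the caller-provided base offset, so it stays accurate far from the origin.
struct HitResult
{
    FVec3 position_relative_to_base;
};

// Recover the absolute (double precision) world position of a hit.
DVec3 HitWorldPosition(const DVec3 &base_offset, const HitResult &hit)
{
    return { base_offset.x + hit.position_relative_to_base.x,
             base_offset.y + hit.position_relative_to_base.y,
             base_offset.z + hit.position_relative_to_base.z };
}

int main()
{
    // Query body sitting ~100 km from the origin; use its position as the base offset.
    DVec3 base_offset { 100000.0, 0.0, 100000.0 };
    HitResult hit { { 0.375f, 0.5f, -0.125f } };      // hit found well under a meter away from it
    DVec3 world = HitWorldPosition(base_offset, hit);
    std::printf("hit at %.3f %.3f %.3f\n", world.x, world.y, world.z);
    return 0;
}
```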

TheophilusE commented 1 year ago

Perfect! Will run tests on my machine. Is your test framework available in the repository?

jrouwe commented 1 year ago

Yes, it has been updated too

jrouwe commented 1 year ago

I have just merged #344 into master which implements double precision mode! As it's a rather large change (299 files changed), I hope I didn't break anything. Let me know what you think.

mihe commented 1 year ago

I was curious to see what the final performance impact ended up being, and it seems to line up with the 10-20% mentioned in Docs/Architecture.md.

These are the averaged results from 3 runs that I got from PerformanceTest on my Windows 11 machine (AMD 3700X, 32GB RAM, USE_AVX2) with the Distribution configuration.

⚠️ EDIT: Note that they're normalized per-chart, not globally, in case that wasn't clear.

[chart: PerformanceTest_Discrete]

[chart: PerformanceTest_LinearCast]

Interestingly you can (on Windows) pretty much cancel out any performance loss from double-precision, at least in this particular benchmark, by simply switching compilers. I'd be curious to know what the reason for the big discrepancy between compilers is.

MSVC seems to take a particularly big hit from double-precision during the LinearCast runs.

jrouwe commented 1 year ago

Interesting! I think I have some Superluminal profiling ahead of me.

jrouwe commented 1 year ago

So I did a little bit of profiling. I compiled with Clang and MSVC in doubles using CROSS_PLATFORM_DETERMINISTIC and verified that the hashes both versions produce are identical (otherwise it's quite possible that in one simulation objects go to sleep sooner, skewing the whole thing).

I looked at two hotspots, SwingTwistConstraint::SolveVelocityConstraint and ContactConstraintManager::GetContactsFromCache, and saw that Clang does more aggressive inlining than MSVC. MSVC ignores certain functions that I marked as 'inline'. It's possible that there's a compiler setting to tweak this; I have to investigate. In any case, I force-inlined the functions MSVC was missing out on, and that made it a bit faster, but not by much.

Going through the assembly generated by both compilers is quite tedious as it is very math heavy and both compilers properly inlined all the math. The code looks quite similar between them but Clang manages to generate less code than MSVC (280 instructions vs 366 instructions). I can't quite see why MSVC needed all these extra instructions but it looks like Clang managed to eliminate a couple of branches in the inner loop (probably because the same condition is used in multiple pieces of inlined code). Needs some further investigation...

jrouwe commented 1 year ago

In case you want to take a look yourself, here are the two functions.

clang.txt msvc.txt

jrouwe commented 1 year ago

Small update: I fixed a bug that caused performance penalty for the double precision mode in f7d19a91dae3f363ade33225b03476f940c59e33. I've measured the performance difference between single and double precision after this fix to be 5-10% (instead of 10-20%).

mihe commented 1 year ago

Here are some new graphs, using f7d19a91dae3f363ade33225b03476f940c59e33. Same setup as last time, except I also used CROSS_PLATFORM_DETERMINISTIC this time around, so comparing against the previous graphs will be a bit skewed.

[chart: PerformanceTest_Discrete]

[chart: PerformanceTest_LinearCast]

They do seem to confirm the lesser 5-10% performance hit, but it still seems like the choice of compiler holds more weight than the choice of floating-point precision, at least for this particular benchmark.

Also interesting to see double-precision being faster in the LinearCast runs on both compilers now. I guess that could also be a consequence of using CROSS_PLATFORM_DETERMINISTIC.