stephengold / Minie

Integrate Bullet Physics and V-HACD into jMonkeyEngine projects (code has New BSD license)
https://stephengold.github.io/Minie/minie/overview.html
Other
123 stars 22 forks source link

BoundingValueHierarchy serialization data not portable #42

Open stephengold opened 6 months ago

stephengold commented 6 months ago

This issue dates back to Minie v0.7.2, when a workaround was implemented that discards serialized BVH data if the writer and reader were on different platforms.

The root cause is in Bullet's btQuantizedBvh.h, where the size of class btQuantizedBvh varies between platforms. On Linux platforms (GCC compiler, both on x86_64 and arm64), the following assertions pass:

#ifdef BT_USE_DOUBLE_PRECISION
        btAssert(sizeof(btQuantizedBvh) == 296);
#else
        btAssert(sizeof(btQuantizedBvh) == 248);
#endif

On macOS (LLVM compiler) and Windows (Visual compiler) the same assertions often fail. Inferred from bvh.serialize() return values:

Perhaps some padding is needed for data portability.

stephengold commented 6 months ago

To dump the layout of class btQuantizedBvh, I embedded a small test app into the Libbulletjme JVM library.

Single-precision natives

For linux64/debug/sp:

btQuantizedBvh size is 248 bytes:
 m_bvhAabbMin at +8 bytes
 m_bvhAabbMax at +24 bytes
 m_bvhQuantization at +40 bytes
 m_bulletVersion at +56 bytes
 m_curNodeIndex at +60 bytes
 m_useQuantization at +64 bytes
 m_leafNodes at +72 bytes
 m_contiguousNodes at +104 bytes
 m_quantizedLeafNodes at +136 bytes
 m_quantizedContiguousNodes at +168 bytes
 m_traversalMode at +200 bytes
 m_SubtreeHeaders at +208 bytes
 m_subtreeHeaderCount at +240 bytes

For macOSX64/debug/sp:

btQuantizedBvh size is 256 bytes:
 m_bvhAabbMin at +16 bytes
 m_bvhAabbMax at +32 bytes
 m_bvhQuantization at +48 bytes
 m_bulletVersion at +64 bytes
 m_curNodeIndex at +68 bytes
 m_useQuantization at +72 bytes
 m_leafNodes at +80 bytes
 m_contiguousNodes at +112 bytes
 m_quantizedLeafNodes at +144 bytes
 m_quantizedContiguousNodes at +176 bytes
 m_traversalMode at +208 bytes
 m_SubtreeHeaders at +216 bytes
 m_subtreeHeaderCount at +248 bytes

For macOSX_ARM64/debug/sp:

btQuantizedBvh size is 256 bytes:
 m_bvhAabbMin at +16 bytes
 m_bvhAabbMax at +32 bytes
 m_bvhQuantization at +48 bytes
 m_bulletVersion at +64 bytes
 m_curNodeIndex at +68 bytes
 m_useQuantization at +72 bytes
 m_leafNodes at +80 bytes
 m_contiguousNodes at +112 bytes
 m_quantizedLeafNodes at +144 bytes
 m_quantizedContiguousNodes at +176 bytes
 m_traversalMode at +208 bytes
 m_SubtreeHeaders at +216 bytes
 m_subtreeHeaderCount at +248 bytes

For windows64\debug\sp:

btQuantizedBvh size is 256 bytes:
 m_bvhAabbMin at +16 bytes
 m_bvhAabbMax at +32 bytes
 m_bvhQuantization at +48 bytes
 m_bulletVersion at +64 bytes
 m_curNodeIndex at +68 bytes
 m_useQuantization at +72 bytes
 m_leafNodes at +80 bytes
 m_contiguousNodes at +112 bytes
 m_quantizedLeafNodes at +144 bytes
 m_quantizedContiguousNodes at +176 bytes
 m_traversalMode at +208 bytes
 m_SubtreeHeaders at +216 bytes
 m_subtreeHeaderCount at +248 bytes

Double-precision natives

For linux64/debug/dp:

btQuantizedBvh size is 296 bytes:
 m_bvhAabbMin at +8 bytes
 m_bvhAabbMax at +40 bytes
 m_bvhQuantization at +72 bytes
 m_bulletVersion at +104 bytes
 m_curNodeIndex at +108 bytes
 m_useQuantization at +112 bytes
 m_leafNodes at +120 bytes
 m_contiguousNodes at +152 bytes
 m_quantizedLeafNodes at +184 bytes
 m_quantizedContiguousNodes at +216 bytes
 m_traversalMode at +248 bytes
 m_SubtreeHeaders at +256 bytes
 m_subtreeHeaderCount at +288 bytes

For macOSX64/debug/dp:

btQuantizedBvh size is 296 bytes:
 m_bvhAabbMin at +8 bytes
 m_bvhAabbMax at +40 bytes
 m_bvhQuantization at +72 bytes
 m_bulletVersion at +104 bytes
 m_curNodeIndex at +108 bytes
 m_useQuantization at +112 bytes
 m_leafNodes at +120 bytes
 m_contiguousNodes at +152 bytes
 m_quantizedLeafNodes at +184 bytes
 m_quantizedContiguousNodes at +216 bytes
 m_traversalMode at +248 bytes
 m_SubtreeHeaders at +256 bytes
 m_subtreeHeaderCount at +288 bytes

For macOSX_ARM64/debug/dp:

btQuantizedBvh size is 296 bytes:
 m_bvhAabbMin at +8 bytes
 m_bvhAabbMax at +40 bytes
 m_bvhQuantization at +72 bytes
 m_bulletVersion at +104 bytes
 m_curNodeIndex at +108 bytes
 m_useQuantization at +112 bytes
 m_leafNodes at +120 bytes
 m_contiguousNodes at +152 bytes
 m_quantizedLeafNodes at +184 bytes
 m_quantizedContiguousNodes at +216 bytes
 m_traversalMode at +248 bytes
 m_SubtreeHeaders at +256 bytes
 m_subtreeHeaderCount at +288 bytes

For windows64\debug\dp:

btQuantizedBvh size is 304 bytes:
 m_bvhAabbMin at +16 bytes
 m_bvhAabbMax at +48 bytes
 m_bvhQuantization at +80 bytes
 m_bulletVersion at +112 bytes
 m_curNodeIndex at +116 bytes
 m_useQuantization at +120 bytes
 m_leafNodes at +128 bytes
 m_contiguousNodes at +160 bytes
 m_quantizedLeafNodes at +192 bytes
 m_quantizedContiguousNodes at +224 bytes
 m_traversalMode at +256 bytes
 m_SubtreeHeaders at +264 bytes
 m_subtreeHeaderCount at +296 bytes

The key difference: on Linux the first field starts at +8, while on Windows it starts at +16.

macOS appears inconsistent:

I'd love to know why.

Since the correct amount of padding seems to be "it's complicated", I should also test Linux-on-Arm, both 32-bit and 64-bit. Unfortunately, I don't have access to 32-bit platforms (other than Linux-on-Arm) or to Android.

stephengold commented 6 months ago

I hacked together a Clang build, and it matched Gcc perfectly:

For linux64/debug/sp:

btQuantizedBvh size is 248 bytes:
 m_bvhAabbMin at +8 bytes
 m_bvhAabbMax at +24 bytes
 m_bvhQuantization at +40 bytes
 m_bulletVersion at +56 bytes
 m_curNodeIndex at +60 bytes
 m_useQuantization at +64 bytes
 m_leafNodes at +72 bytes
 m_contiguousNodes at +104 bytes
 m_quantizedLeafNodes at +136 bytes
 m_quantizedContiguousNodes at +168 bytes
 m_traversalMode at +200 bytes
 m_SubtreeHeaders at +208 bytes
 m_subtreeHeaderCount at +240 bytes

For linux64/debug/dp:

Debug_Dp_Libbulletjme version 21.0.0 initializing

btQuantizedBvh size is 296 bytes:
 m_bvhAabbMin at +8 bytes
 m_bvhAabbMax at +40 bytes
 m_bvhQuantization at +72 bytes
 m_bulletVersion at +104 bytes
 m_curNodeIndex at +108 bytes
 m_useQuantization at +112 bytes
 m_leafNodes at +120 bytes
 m_contiguousNodes at +152 bytes
 m_quantizedLeafNodes at +184 bytes
 m_quantizedContiguousNodes at +216 bytes
 m_traversalMode at +248 bytes
 m_SubtreeHeaders at +256 bytes
 m_subtreeHeaderCount at +288 bytes
stephengold commented 6 months ago

I built Libbulletjme on a Raspbian with Gcc-8 (hash 0fb79b5):

For linux_ARM32/debug/sp:

btQuantizedBvh size is 180 bytes:
 m_bvhAabbMin at +12 bytes
 m_bvhAabbMax at +28 bytes
 m_bvhQuantization at +44 bytes
 m_bulletVersion at +60 bytes
 m_curNodeIndex at +64 bytes
 m_useQuantization at +68 bytes
 m_leafNodes at +72 bytes
 m_contiguousNodes at +92 bytes
 m_quantizedLeafNodes at +112 bytes
 m_quantizedContiguousNodes at +132 bytes
 m_traversalMode at +152 bytes
 m_SubtreeHeaders at +156 bytes
 m_subtreeHeaderCount at +176 bytes

For linux_ARM32/debug/dp:

btQuantizedBvh size is 232 bytes:
 m_bvhAabbMin at +16 bytes
 m_bvhAabbMax at +48 bytes
 m_bvhQuantization at +80 bytes
 m_bulletVersion at +112 bytes
 m_curNodeIndex at +116 bytes
 m_useQuantization at +120 bytes
 m_leafNodes at +124 bytes
 m_contiguousNodes at +144 bytes
 m_quantizedLeafNodes at +164 bytes
 m_quantizedContiguousNodes at +184 bytes
 m_traversalMode at +204 bytes
 m_SubtreeHeaders at +208 bytes
 m_subtreeHeaderCount at +228 bytes
stephengold commented 6 months ago

The Linux_ARM32 dump exposes yet another hurdle to portability: the layout of fields in btQuantizedBvh differs between 64-bit platforms (where pointers are 8 bytes) and 32-bit platforms (where pointers are 4 bytes). I believe additional padding of btQuantizedBvh on 32-bit platforms could address these layout differences.

If the goal is robust portability, however, Libbulletjme's total reliance on btOptimizedBvh::deSerializeInPlace() must end. While Bullet does a good job of ensuring byte-order portability (and that's the least of my worries), inserting padding on a per-platform basis is not a great solution for different pointer sizes and/or first-field offsets. For one thing, padding will break backward compatibility with old models (such as those saved using Minie v8.0.0). For another, deSerializeInPlace() isn't portable between single-precision natives and double-precision natives. And finally, I can't easily determine the correct amount of padding for 7 of the 13 platforms Minie claims to support.

Conclusion: Minie needs access to Bullet's "new" serialization system, the one that uses btSerializer, leaving deSerializeInPlace() as a (fragile and risky) fallback for loading old models.

stephengold commented 6 months ago

I've been struggling to grok the examples in Bullet's "Extras/Serialize" folder.

My first serialization of the test model using btSerializer resulted in 16,204 bytes of data, a huge increase over the 2,480 bytes generated using in-place serialization. 13,784 of those bytes are (Blender-related?) "DNA" that Minie doesn't need. Apparently Bullet doesn't provide for serialization without adding DNA to the buffer, so I'm going to comment out the writeDNA() call in btSerializer::finishSerialization().

stephengold commented 6 months ago

Bullet's DNA is essential to porting Bullet-serialized data structures between platforms. It encodes the field offsets of 80+ classes and structs, including 6 defined in "btQuantizedBvh.h".

I begin to question the benefits of serializing and deserializing the BVH of a MeshCollisionShape. I don't recall ever measuring the performance benefit of this feature. Perhaps it's only beneficial for certain meshes (I'm imagining very large ones), in which case serializing BVH should be optional in Minie (as it is in Bullet).

If it's beneficial, then the current approach (re-generating BVH during load if the platforms don't match) seems optimal for applications that run on one platform only.

For apps that benefit from BVH serialization and run on multiple platforms, I see several approaches:

  1. Serialize Bullet DNA for every BVH. This would insure against future changes to the native structs, though further changes seem unlikely at this point. The cost would be about 14KB added to each serialized MeshCollisionShape.
  2. Construct a smaller DNA (sufficient for de-serializing BVH only) and serialize that instead.
  3. Write new BVH serialization code in C++ that is both portable and efficient.
  4. Add accessors and constructors to BoundingValueHierarchy so it can be serialized and deserialized using JME's system, the way other Minie classes are. I worry this approach might prove inefficient. I recently added 12 accessors for debugging and testing purposes. I estimate 12 more accessors and one more constructor would be needed for this approach.
stephengold commented 6 months ago

I made BVH serialization optional at 720c1ec9726cbdb42ede6bd507f58ea429c5c65f. BVH serialization remains platform-specific for now, but I'll leave this issue open.