Loading overworld is SLOW

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Since in story mode the player will visit the overworld often, it needs to load much faster.

A quick profiling session showed that ~66% of the loading time is spent in TriangleMesh::createCollisionShape, more specifically in new btBvhTriangleMeshShape, and even more specifically in btQuantizedBvh::buildTree. (The rest of the loading time goes to loading the textures and B3D files.)

The btQuantizedBvh::unQuantize function stands out as taking over 14% of the loading time just on its own.

A few ideas to make it faster:

1) Don't recalculate this every time the overworld is shown, since we will visit it often; instead cache the result and reuse it on the next visit. This requires special-casing the overworld but should be quite doable. The initial load time would still be slow though.
2) Pre-calculate the data and serialize it to a file, if bullet supports that (I have seen some serialization support in bullet, but I'm not sure about btBvhTriangleMeshShape).
3) Create a simplified mesh for the physics only, so that there is less data to place in the tree. PAINFUL for the artist.
4) Ask the bullet people if there is something else that can be done ^^
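Idea 1 could look roughly like the sketch below: build the collision data on the first visit and hand back the same object on later visits. `CollisionData` and `CollisionCache` are hypothetical names made up for illustration; they are not STK or bullet classes, and the real data would be the bullet collision shape.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the expensive-to-build physics data.
struct CollisionData
{
    int triangle_count;
};

// Hypothetical in-memory cache keyed by track name.
class CollisionCache
{
public:
    using Builder = std::function<std::shared_ptr<CollisionData>()>;

    // Returns the cached data for `track`, calling `build` only on a miss.
    std::shared_ptr<CollisionData> get(const std::string& track,
                                       const Builder& build)
    {
        auto it = m_entries.find(track);
        if (it != m_entries.end()) return it->second;   // cache hit

        std::shared_ptr<CollisionData> data = build();  // slow path, once
        assert(data != nullptr);
        m_entries.emplace(track, data);
        return data;
    }

private:
    std::unordered_map<std::string, std::shared_ptr<CollisionData>> m_entries;
};
```

This only helps from the second visit onwards and only within one run of the game, which is why the first load stays slow.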

Migrated-From: https://sourceforge.net/apps/trac/supertuxkart/ticket/546

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Looking at the bullet docs, we should be able to serialize both the btTriangleMesh and the btOptimizedBvh. That would let us create a btBvhTriangleMeshShape with parameter buildBvh=false and then call setOptimizedBvh, which hopefully would be faster than building the BVH on the fly.

Adding Joerg to CC. Joerg, do you think this is feasible without too much trouble?

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

OK, I think the profiling is skewed: it seems the profiler does not count time spent while the process is sleeping, waiting for the hard disk.

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

OK, I did another profiling run based on elapsed time rather than CPU use. More accurate numbers, I hope. Bullet still takes a lot of time in there.

'''Summary'''

    new btBvhTriangleMeshShape : 5.9 s
    pushTempMaterial           : 1.3 s
    load main mesh             : 1.1 s
    load other meshes          : 0.6 s
    convertTrackToBullet       : 0.25 s

    Total                      : 9.6 s

'''Detailed'''

World::init { // took 9.599854 s
    loadTrackModel() { // took 9.511963 s
        read XML nodes { // took 5.326172 s
            pushTempMaterial { // took 1.311035 s
            }
            create XML tree { // took 0.024902 s
            }
            create quad graph { // took 0.041016 s
            }
            setDefault start positions { // took 0.000000 s
            }
            get sun & fog info { // took 0.000000 s
            }
            load main track { // took 3.862061 s
                getMesh { // took 1.095947 s
                }
                create batching mesh { // took 0.013916 s
                }
                addOctTree { // took 0.077148 s
                }
                grab all textures { // took 0.000977 s
                }
                handle animated textures { // took 0.000977 s
                }
                minMax3D { // took 0.010010 s
                }
                getPhysics()->init { // took 0.002930 s
                }
                load nodes { // took 0.655029 s
                }
                loadMainTrack.convertTrackToBullet { // took 0.140137 s
                }
                createPhysicalBody && createCollisionShape { // took 1.854980 s
                    btBvhTriangleMeshShape { // took 1.854980 s
                    }
                }
            }
            new CheckManager { // took 0.000000 s
            }
            create sky { // took 0.045166 s
            }
        }
        finish LOD {
        } // took 0.000000 s
        init track objects {
        } // took 0.000977 s
        adjustForFog {
        } // took 0.002930 s
        sky/sun/light {
        } // took 0.000000 s
        createPhyshicsModel.convertTrackToBullet {
        } // took 0.111084 s
        btBvhTriangleMeshShape {
        } // took 4.066895 s
        checkline requirements {
        } // took 0.000000 s
    }
}
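Elapsed-time figures like the ones in the tree above could be collected with a small scoped timer. This is a minimal sketch, not the instrumentation actually used for this ticket; `ScopedTimer` is a hypothetical helper. It relies on `std::chrono::steady_clock`, which measures elapsed time and so also counts time the process spends blocked on the disk.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper: measures the elapsed time of the enclosing scope
// and prints it in a "label // took N s" style when the scope ends.
class ScopedTimer
{
public:
    explicit ScopedTimer(std::string label, double* out_seconds = nullptr)
        : m_label(std::move(label)),
          m_out(out_seconds),
          m_start(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - m_start;
        if (m_out != nullptr) *m_out = elapsed.count();  // optional readback
        std::printf("%s // took %f s\n", m_label.c_str(), elapsed.count());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string m_label;
    double* m_out;
    std::chrono::steady_clock::time_point m_start;
};
```

Declaring one such timer at the top of each block of interest gives nested timings, since inner timers are destroyed before outer ones.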

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Ok, initial implementation committed in r10583. For the overworld, this reduced the 'btBvhTriangleMeshShape' time for the main mesh from 4.067 s to 0.15 s, which seems a great success so far :) We still need to figure out a way to generate the BVH file that is not too painful, though, and we might want to do the same for all track nodes, not just the main mesh.

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Ok, I had many people on IRC run tests. For some people it loads fine; for others bullet refuses to load the file. I am 99% sure it is because bullet just uses 'int' all over the place, which is 32 bits on some systems and 64 bits on others, which would of course cause issues when reading a file written with a different size of int :'(

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Redskull reported that the error message is:

[btQuantizedBvh::deSerializeInPlace] ERROR: expected 1635783352 bytes, got 3804880!

I am confused, I don't think 32 bits vs 64 bits can account for that big a difference...

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Ok, I got redskull to print some variables. The result is:

    sizeof(btQuantizedBvh) = 248
    sizeof(btQuantizedBvhNode) = 16
    getAlignmentSerializationPadding() = 0
    m_subtreeHeaderCount = 51118222
    sizeof(btBvhSubtreeInfo) = 32
    m_curNodeIndex = 0

So quite obviously m_subtreeHeaderCount is wrong.
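The printed values reproduce the byte count in the error message if the expected size is taken to be header size + padding + subtree count × subtree size + node count × node size. That formula is an assumption inferred from the numbers in this ticket, not copied from bullet's source, and `expectedBytes` is a hypothetical helper:

```cpp
#include <cstdint>

// Assumed size formula, inferred from the values printed in this ticket:
// header + padding + subtree headers + nodes.
std::int64_t expectedBytes(std::int64_t header_size,
                           std::int64_t padding,
                           std::int64_t subtree_count,
                           std::int64_t subtree_size,
                           std::int64_t node_count,
                           std::int64_t node_size)
{
    return header_size + padding
         + subtree_count * subtree_size
         + node_count * node_size;
}

// With redskull's values:
//   expectedBytes(248, 0, 51118222, 32, 0, 16)
//   = 248 + 51118222 * 32 = 1635783352
// which matches "expected 1635783352 bytes" in the error message.
```

So the bogus m_subtreeHeaderCount alone is enough to account for the huge expected size.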

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Ok, the problem is that the sizes of things vary from computer to computer, and so m_subtreeHeaderCount is read incorrectly :(

= ME =
sizeof(btQuantizedBvh) = 192
sizeof(btQuantizedBvhNode) = 16
getAlignmentSerializationPadding() = 0
m_subtreeHeaderCount = 0xa47
sizeof(btBvhSubtreeInfo) = 32
m_curNodeIndex = 232531
m_bulletVersion = 0x117
sizeof(NodeArray) = 20

= KroArtem =
sizeof(btQuantizedBvh) = 248
sizeof(btQuantizedBvhNode) = 16
getAlignmentSerializationPadding() = 0
m_subtreeHeaderCount = 0x30c008e
sizeof(btBvhSubtreeInfo) = 32
m_curNodeIndex = 0
m_bulletVersion = 0x429a1fc2
sizeof(NodeArray) = 32
[btQuantizedBvh::deSerializeInPlace] ERROR: expected 1635783352 bytes, got 3804880!

stupid bullet, their serialization code is totally non-portable
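For contrast, a layout-independent format writes each field explicitly, with a fixed width and byte order, instead of depending on how a given compiler lays out the struct. The sketch below is illustrative only and is not bullet's API; `BvhHeader`, `serializeHeader` and `deserializeHeader` are hypothetical, and the three fields are simply the ones printed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value as four bytes, least significant byte first.
void writeU32(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
}

// Read back a 32-bit little-endian value starting at `offset`.
std::uint32_t readU32(const std::vector<std::uint8_t>& in, std::size_t offset)
{
    assert(offset + 4 <= in.size());
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
    return value;
}

// Hypothetical header holding a few of the fields printed in this ticket.
struct BvhHeader
{
    std::uint32_t bullet_version;
    std::uint32_t subtree_header_count;
    std::uint32_t node_count;
};

// Always produces 12 bytes, whatever sizeof(BvhHeader) is on this machine.
std::vector<std::uint8_t> serializeHeader(const BvhHeader& h)
{
    std::vector<std::uint8_t> out;
    writeU32(out, h.bullet_version);
    writeU32(out, h.subtree_header_count);
    writeU32(out, h.node_count);
    return out;
}

BvhHeader deserializeHeader(const std::vector<std::uint8_t>& in)
{
    return BvhHeader{readU32(in, 0), readU32(in, 4), readU32(in, 8)};
}
```

A file written this way reads back the same on any machine, at the cost of copying field by field rather than loading the data in place.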

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

And redskull:

    Computer 1 :

    sizeof(btQuantizedBvh) = 172
    sizeof(btQuantizedBvhNode) = 16
    getAlignmentSerializationPadding() = 0
    m_subtreeHeaderCount = 0x0
    sizeof(btBvhSubtreeInfo) = 32
    m_curNodeIndex = 1117396930
    m_bulletVersion = 0x44438e47
    sizeof(NodeArray) = 20
    [btQuantizedBvh::deSerializeInPlace] ERROR: expected -1501040340 bytes, got 3804880!
    [TriangleMesh] WARNING, failed to load serialized BHV

    Computer 2 :

    sizeof(btQuantizedBvh) = 248
    sizeof(btQuantizedBvhNode) = 16
    getAlignmentSerializationPadding() = 0
    m_subtreeHeaderCount = 0x30c008e
    sizeof(btBvhSubtreeInfo) = 32
    m_curNodeIndex = 0
    m_bulletVersion = 0x429a1fc2
    sizeof(NodeArray) = 32
    [btQuantizedBvh::deSerializeInPlace] ERROR: expected 1635783352 bytes, got 3804880!

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

Alright, bullet's serialization is non-portable almost beyond recovery. Maybe we should instead look into generating and caching the file on each person's computer. Questions remain:

1) Do we store the file in user preferences or in game data? The latter would require admin rights.
2) If a track is updated, how do we detect that the BVH needs to be regenerated?
3) Do we generate only as tracks are loaded, or do we add a step on first launch to calculate them all? The latter might give a better impression to the player, since STK will be faster once the initial loading phase is complete, but it's harder to code. The former is easier to code, but the user might think that STK is slow because each time they load a new track it goes slowly.
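One possible answer to question 2, sketched under assumptions: derive the cache file name from a hash of the track's mesh bytes plus a tag for the local struct layout, so that an updated track or a differently built binary simply misses the cache and regenerates. `bvhCacheName` and its naming scheme are hypothetical. The hash is plain 64-bit FNV-1a, chosen only because it is short; any stable content hash would do.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash over a byte string.
std::uint64_t fnv1a64(const std::string& bytes)
{
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : bytes)
    {
        hash ^= c;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Hypothetical naming scheme: <track>-<content hash>-<layout tag>.bvh
// `layout_size` could be e.g. the local sizeof of the serialized struct,
// so files written with a different layout are never picked up.
std::string bvhCacheName(const std::string& track_id,
                         const std::string& mesh_bytes,
                         std::size_t layout_size)
{
    return track_id + "-" + std::to_string(fnv1a64(mesh_bytes))
         + "-" + std::to_string(layout_size) + ".bvh";
}
```

Stale files would pile up under this scheme unless old names for the same track are deleted when a new one is written.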

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

A few improvements were committed in SVN, like disabling quantization and fixing the BVH being calculated twice.

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

It's now cached, so much less trouble.

One possible improvement would be to cache the bullet BVH.

supertuxkart-sourceforge-migration commented 10 years ago

Author: hikerstk

It's fast enough for 0.8, postponing till 0.8.1 (and closing duplicate ticket #735).

supertuxkart-sourceforge-migration commented 10 years ago

Author: auria

It was fast enough for 0.8, so we can safely re-postpone to 0.8.2.
