
Griefly: Yet Another Space Station Remake
https://grief.ly
MIT License

Memory optimization of cast table #428

Open kremius opened 7 years ago

kremius commented 7 years ago

The cast table is a bit array of size N^2, where N is the number of game classes. It works fine now, but it is easy to see that as the number of classes grows the cast table will take too much space (roughly 10 megabytes for 10k classes). It may be reasonable to turn the cast table into a table of lists: each entry would hold the full list of base classes for some game class (basically, a transformation from an adjacency matrix to adjacency lists). A class hierarchy is usually not deep, so this would save a lot of memory at a reasonable performance cost.
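For illustration, a rough sketch of the two layouts, assuming classes are identified by dense integer indices (the names here are made up, not the actual cast table API):

#include <cstddef>
#include <vector>

// Current layout: one bit per (derived, base) pair, N * N bits in total.
struct CastTableBits
{
    std::size_t class_count = 0;
    std::vector<bool> bits;  // bits[derived * class_count + base]

    bool IsBase(std::size_t base, std::size_t derived) const
    {
        return bits[derived * class_count + base];
    }
};

// Proposed layout: every class keeps only the list of its base classes.
struct CastTableLists
{
    std::vector<std::vector<std::size_t>> bases;  // bases[derived] = all ancestors

    bool IsBase(std::size_t base, std::size_t derived) const
    {
        for (std::size_t candidate : bases[derived])
        {
            if (candidate == base)
            {
                return true;
            }
        }
        return false;
    }
};

The bit table needs N^2 / 8 bytes regardless of the hierarchy's shape, while the list variant stores roughly one id per ancestor, so its memory grows with the hierarchy depth rather than with N^2.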

Another way is to exploit the fact that the branching factor of a class hierarchy should not be very high, so all inheritance information for a game class can be encoded in a relatively small string.

"aabc" // successor
"aab" // base
"ab"
"cf"

For nodes with a high branching factor a special symbol would be needed - 'f' or something. It means that the next symbol still belongs to the previous level of the hierarchy.

"aa"
"aafb"
"aac"
"aafbc"

This is very memory-friendly, and a fast IsPrefix function can be implemented. Two uint64 variables should cover 95% of use cases (depth 16 with 8-bit symbols, or depth 32 with 4-bit symbols).

#include <cstdint>

// Assumed layout: 8-bit symbols packed from the least significant byte (root
// first), unused high bytes zero, base non-zero. base is an ancestor of type
// iff base's symbol sequence is a prefix of type's symbol sequence.
bool IsBase(uint64_t base, uint64_t type)
{
    // Leading zeros of base (zero_bits), rounded down to a whole 8-bit symbol.
    int first_zero_bits = __builtin_clzll(base) & ~0x7;
    // The same shift drops every symbol of type beyond the length of base.
    base <<= first_zero_bits;
    type <<= first_zero_bits;
    return base == type;
}

This gets a bit harder when a type spans multiple uint64 values, but it should still be extremely fast. However, tests are needed.
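A rough sketch of how the two-word case could look, reusing the single-word check above; the TypeCode name and the word split (first eight symbols in words[0], the rest in words[1]) are assumptions made here for illustration:

#include <cstdint>

bool IsBase(uint64_t base, uint64_t type);  // single-word check from above

// Hypothetical two-word type code: words[0] holds the first eight 8-bit
// symbols, words[1] the next eight, packed the same way as above.
struct TypeCode
{
    uint64_t words[2];
};

bool IsBase(const TypeCode& base, const TypeCode& type)
{
    if (base.words[1] == 0)
    {
        // The whole base path fits into the first word.
        return IsBase(base.words[0], type.words[0]);
    }
    // Base spills into the second word: the first words must match exactly,
    // and the prefix check moves to the second word.
    return base.words[0] == type.words[0]
        && IsBase(base.words[1], type.words[1]);
}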

But right now there are only about 100 classes, and even with 1000 classes the cast table would occupy only about 100 KB, so this issue is not a priority at all.

kremius commented 6 years ago

It is also possible to keep 32 bytes (int64 type[4]) per object to avoid calling a virtual function. The beginning of the object has to be loaded into the cache anyway (because of the vptr), and a cache line is usually 64 bytes, so the type code would always be loaded -> this should make checking whether an object is of the requested type extremely fast. However, that kind of speed is not needed at all, at least right now.
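For illustration, a minimal sketch of such an object layout; MapObject and IsBase4 (a hypothetical four-word variant of the checks above) are made-up names, not the actual API:

#include <cstdint>

// Hypothetical four-word variant of the prefix checks sketched above.
bool IsBase4(const uint64_t base[4], const uint64_t type[4]);

class MapObject  // made-up stand-in for the game object base class
{
public:
    virtual ~MapObject() = default;

    // No virtual call: the 32-byte type code sits right after the vptr and is
    // usually in the same cache line that the vptr access already loads.
    bool IsType(const uint64_t requested_base[4]) const
    {
        return IsBase4(requested_base, type_);
    }

private:
    uint64_t type_[4];  // the "int64 type[4]" from the comment: 32 bytes
};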

It is also possible not to use a predefined number of bits per level, but to always set a marker bit at the end of a type ([type][1][0...0]). That avoids wasting bits per level when the branching factor is low, without any drawbacks when the branching factor is high. It may even be possible to always fit into a single 64-bit number.

kremius commented 6 years ago
#include <cstdint>

// Assumed layout: path bits packed from the least significant bit, a single
// 1 marker bit just above them, high bits zero (base needs at least one path bit).
bool IsBase(uint64_t base, uint64_t type)
{
    // Shift out base's leading zeros plus its marker bit; the same shift
    // drops every path bit of type beyond the length of base.
    int base_type_length = __builtin_clzll(base) + 1;
    base <<= base_type_length;
    type <<= base_type_length;
    return base == type;
}
kremius commented 6 years ago

http://doc.qt.io/qt-5/hierarchy.html

But some other big hierarchies are needed for testing. Maybe there is a way to parse the source of any SS13 build and extract its class graph?