Open nfrechette opened 2 years ago
It may be necessary to support multiple layers and not just two. Some tracks are special and cannot be interpolated, which means boundary key frames must be retained. This might force more memory to be retained than is otherwise necessary. Finer-grained control should be possible and left for the host runtime to determine, if possible and efficient to support.
For every joint, we can calculate the total geodesic distance of things it can move (sum of all children distances + virtual vertices). We can then sort joints by this value to figure out which joints are the most important. We could also offer a bias value for animators to control (e.g. for joints used by gameplay, IK weights, etc.).
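The ranking described above could be sketched roughly as follows. This is only an illustration: `JointInfo`, its fields, and `rank_joints_by_importance` are hypothetical names, and the exact way distances are accumulated (bone length plus virtual vertex distance, folded from children into parents, then scaled by a bias) is an assumption about what "total distance" would mean.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical joint description, not part of any existing API.
struct JointInfo
{
    int32_t parent_index;           // -1 for the root; parents are assumed to precede children
    float bone_length;              // distance from this joint to its parent
    float virtual_vertex_distance;  // distance of the virtual vertex attached to this joint
    float bias;                     // animator-controlled multiplier, 1.0f is neutral
};

// Returns joint indices sorted from most to least important.
inline std::vector<size_t> rank_joints_by_importance(const std::vector<JointInfo>& joints)
{
    // Each joint starts with the distance of its own virtual vertex.
    std::vector<float> totals(joints.size());
    for (size_t i = 0; i < joints.size(); ++i)
        totals[i] = joints[i].virtual_vertex_distance;

    // Walking backwards visits every child before its parent, so each parent
    // accumulates the full total of everything below it.
    for (size_t i = joints.size(); i-- > 0;)
    {
        const int32_t parent = joints[i].parent_index;
        if (parent >= 0)
        {
            assert(static_cast<size_t>(parent) < i);
            totals[static_cast<size_t>(parent)] += totals[i] + joints[i].bone_length;
        }
    }

    // Apply the animator bias last so it scales the final importance value.
    for (size_t i = 0; i < joints.size(); ++i)
        totals[i] *= joints[i].bias;

    std::vector<size_t> order(joints.size());
    std::iota(order.begin(), order.end(), size_t(0));
    std::stable_sort(order.begin(), order.end(),
        [&totals](size_t lhs, size_t rhs) { return totals[lhs] > totals[rhs]; });
    return order;
}
```

With a neutral bias, a leaf joint like an eyebrow ends up near the bottom of the list; raising its bias lets an animator promote it without changing the skeleton.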
The database streaming feature treats every joint the same, which isn't ideal. In practice, some joints are more important than others (e.g. spine vs. eyebrow). It should be possible to take cosmetic joints and move them into a separate clip in order to stream it in/out on demand, as well as hook it up to the existing streaming database feature. Here is how it would work at a high level.
To do this, we will split a compressed clip into a separate layer by leveraging the bind pose stripping feature. The base clip will contain all core tracks (e.g. root, spine, arms, legs, anything important) that must always be loaded. Cosmetic joints will have their tracks effectively all set to the bind pose to ensure they are stripped as default values. This will remove them from the base clip. The cosmetic clip will do the opposite: all important tracks in the base clip will be set to the bind pose in order to strip them from this clip.
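As a rough sketch of the split, assuming a simplified in-memory model: `TrackLayer`, `SketchTrack`, `SketchClip`, and `split_clip` are all hypothetical names, and a plain `is_default` flag stands in for bind pose stripping. The real compressed format is not modeled here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical layer tag supplied by the host in the track description.
enum class TrackLayer { Base, Cosmetic };

// Simplified stand-in for a compressed track.
struct SketchTrack
{
    TrackLayer layer;
    bool is_default;             // true when the track is stripped to the bind pose
    std::vector<float> samples;  // placeholder for the track's sample data
};

using SketchClip = std::vector<SketchTrack>;

// Splits one clip into a base clip and a cosmetic clip with mutually
// exclusive data. Both outputs keep the same track count and ordering.
inline std::pair<SketchClip, SketchClip> split_clip(const SketchClip& source)
{
    SketchClip base = source;
    SketchClip cosmetic = source;
    for (size_t i = 0; i < source.size(); ++i)
    {
        // A cosmetic track is stripped from the base clip and vice versa.
        SketchClip& stripped = (source[i].layer == TrackLayer::Cosmetic) ? base : cosmetic;
        stripped[i].is_default = true;
        stripped[i].samples.clear();
    }
    return std::make_pair(std::move(base), std::move(cosmetic));
}
```

Keeping the same track indices in both clips means no remapping table is needed at decompression time, at the cost of a little per-track metadata in each clip.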
This will result in two clips with mutually exclusive data to reconstruct the original raw data. To do so, first the output pose buffer must be pre-filled with the bind pose since we use bind pose stripping. Next, the base clip is decompressed to populate those tracks, making sure to skip default sub-tracks. Last but not least, the cosmetic clip is decompressed to populate the remaining tracks, once again skipping default sub-tracks to avoid stomping the base/bind data.
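The three reconstruction steps above could look something like this. It is a minimal sketch under stated assumptions: one float per track stands in for a full transform, and `LayerTrack`, `LayerClip`, `decompress_layer`, and `reconstruct_pose` are hypothetical names rather than an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for one decompressed track value in a layer.
struct LayerTrack
{
    bool is_default;  // true when the track was stripped to the bind pose
    float value;      // sampled value, ignored when is_default is true
};

using LayerClip = std::vector<LayerTrack>;

// Writes only non-default tracks so earlier data is never stomped.
inline void decompress_layer(const LayerClip& clip, std::vector<float>& out_pose)
{
    assert(clip.size() == out_pose.size());
    for (size_t i = 0; i < clip.size(); ++i)
    {
        if (!clip[i].is_default)
            out_pose[i] = clip[i].value;
    }
}

// The cosmetic clip is optional: pass nullptr when it is not streamed in.
inline std::vector<float> reconstruct_pose(const std::vector<float>& bind_pose,
                                           const LayerClip& base,
                                           const LayerClip* cosmetic)
{
    std::vector<float> pose = bind_pose;    // 1. pre-fill with the bind pose
    decompress_layer(base, pose);           // 2. base tracks
    if (cosmetic != nullptr)
        decompress_layer(*cosmetic, pose);  // 3. cosmetic tracks, if available
    return pose;
}
```

When the cosmetic clip is missing, cosmetic tracks simply keep their bind pose values from the pre-fill step.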
In practice, we don't need to compress twice. After the normal compression is done and final quantization has been performed, we can split the clip into its two parts. This will be cheap to perform. Both clips can be written as output. Note that this has a small memory overhead when both clips are fully loaded into memory: a second clip means a second clip header and 2 extra bits per sub-track for their type. This is an acceptable tradeoff if some data is meant to be stripped or streamed through the database.
Once both clips have been compressed (base + cosmetic), the cosmetic clip can be entirely moved into the database and streamed on demand since it is entirely optional. Only the base clip is absolutely required for proper playback. This means that while base clips have 2 streamable tiers (out of 3), cosmetic clips have all 3 tiers streamable.
Since both clips are independent, they can each have their own database settings. This means that cosmetic joints can have their data stripped more aggressively without impacting the overall visual fidelity as much.
In order to reconstruct our pose, we will thus need to decompress 2 clips, with the cosmetic clip being entirely optional. This means the decompression cost will be higher but not twice as much. In practice, the first clip will decompress fewer tracks compared to what it would have done otherwise since it will skip the cosmetic joints. Similarly, the second clip will only decompress the missing cosmetic joints. However, prefetching won't perform as well since more non-contiguous memory will be touched. This cost should be entirely reasonable.
Reconstructing the compressed clip before it was split into two should be possible but if we do so, it means that we cannot stream the data out afterwards and it would mean re-allocating the clip memory. A lot of gymnastics.
To determine whether a track is cosmetic or not, no effort will be made to do this automatically. We will rely on the game engine integration to tell us that information by exposing an enum (base, cosmetic) in the track description during compression. Down the road, a third option (auto) could be introduced that will make this determination automatic.
There is one caveat to this implementation. While this works great and elegantly when we need to decompress the whole pose or individual joints in the base clip, decompressing individual joints from the cosmetic clip is more tricky. We need to be able to tell if a joint is cosmetic or not during decompression. If the joint is cosmetic but the cosmetic clip hasn't been streamed in, it means we cannot return anything but the bind pose (if it is provided to us during decompression). We will have to allow the game engine to tell us this information during decompression with something like `get_cosmetic_level(uint track_index)`. This will be further complicated if we wish to introduce a procedural algorithm for figuring out which joints are cosmetic.
Down the road, if sub-sampling is implemented, it also means that each clip could have a different sample rate: base @ 30 FPS and cosmetic @ 10 FPS. This would be fully configurable.
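To make the single-joint caveat concrete, here is one possible shape for the lookup. Only the name `get_cosmetic_level` comes from the text above; its return type, the `CosmeticLevel` enum, `SingleTrackContext`, and `sample_track` are assumptions made for illustration, with one float per track standing in for a transform.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical return type for the host-provided callback.
enum class CosmeticLevel : uint8_t { Base, Cosmetic };

// Simplified decompression state for sampling a single track.
struct SingleTrackContext
{
    std::vector<float> base_values;      // values decoded from the base clip
    std::vector<float> cosmetic_values;  // values decoded from the cosmetic clip
    bool cosmetic_streamed_in;           // false while the cosmetic clip is not resident
    std::function<CosmeticLevel(uint32_t)> get_cosmetic_level;  // supplied by the game engine
};

// Samples one track, falling back to the caller's bind pose value when the
// track is cosmetic and its clip has not been streamed in.
inline float sample_track(const SingleTrackContext& ctx, uint32_t track_index, float bind_value)
{
    if (ctx.get_cosmetic_level(track_index) == CosmeticLevel::Base)
    {
        assert(track_index < ctx.base_values.size());
        return ctx.base_values[track_index];
    }

    if (ctx.cosmetic_streamed_in)
    {
        assert(track_index < ctx.cosmetic_values.size());
        return ctx.cosmetic_values[track_index];
    }

    return bind_value;  // nothing better is available for this joint
}
```

Routing the decision through a callback keeps the clip format unaware of how the host classifies joints, which would also leave room for a procedural classification later.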
In theory, we could support any number of layers here, not just two (base + cosmetic). However, in practice we will most likely rarely need more than two. If the layers end up too small, the clip header overhead will start to be significant and managing the complexity to stream all of this in/out will become a burden.