use std map instead of rapidjson document to save memory

kanbouchou commented 2 months ago

Summary

In our application, we store many IkFailureInfo, and rapidjson::Document _rCustomData takes too much memory in the order of tens of GB. rapidjson::Document has large memory overhead.

Alternatives

Each IkFailureInfo has rapidjson::Document (current way). Easy to code, flexible but takes too much memory.
Each IkFailureInfo has rapidjson::Value, and have a single rapidjson::Document for a vector of them. As flexible as rapidjson::Document, and almost as easy to code. Uses less memory, about one eighth of current approach.
Replace rapidjson::Document with map<string, vector<dReal>>. Easy to code, memory usage is very small. Drawback is decreased flexibility compared to json representation and inability to pre-allocate memory. This is the proposed approach.
- To mitigate the memory preallocation issue, we could make the map value as a pair of boolean representing validity and vector holding real data, and in IkFailureInfo::Reset, instead of map::clear(), invalidate each entry by setting the validity flag to keep memory allocated. If key keeps changing, map size can grow over time.
- Replace rapidjson::Document with vector<pair<string, vector<dReal>>>. Memory usage is small and preallocation is possible. A bit harder to use, less flexible compared to json representation.

Experiment

Measure memory usage to store key-value pair (key is "parameterkey" and value is empty vector).

	Memory usage per key-value pair (Kbytes)	Speed per key-value pair insertion (nsec)
rapidjson::Document	4.2	1104
rapidjson::Value	0.52	130
std::map<string, vector>	0.15	45
std::vector<pair<string, vector>	0.09	35

key_value_space_usage_test.zip

key_value_space_usage_speed_test.zip

rdiankov commented 2 months ago

Can you test the speed here? My one issue with std::map is that every element is its own allocation on the heap...

To tell you the truth, I'd rather this be a std::vector, that we carefully control how it is allocated/deallocated. And instead of strings for the names, we can use enums.

kanbouchou commented 2 months ago

Can you test the speed here? My one issue with std::map is that every element is its own allocation on the heap...

I've updated the description, please check. rapidjson::Document is more than a 1us, std::map is 45ns, std::vector is 35ns.

To tell you the truth, I'd rather this be a std::vector, that we carefully control how it is allocated/deallocated. And instead of strings for the names, we can use enums.

Yes, we can do it with vector of enum and actual data pair (vector<pair<int, vector<dReal>>). However, we cannot reserve the memory of inner vector easily. Also, when inserting an element, we need to search for existing element with same enum value.

Another option is std::map whose value is a pair of boolean and actual data. In IkFailureInfo::Reset, we skip calling std::map::clear but set boolean value to false to invalidate the data structure to avoid memory deallocation. Concern with this approach is map size can theoretically increase over time without bound, but if keys are a small finite set, it's ok in practice.

What do you think?

rdiankov commented 2 months ago

To tell you the truth, I think having a custom data struct for key/value store is not so great. It creates a lot of ambiguity and is not very memory efficient to represent everything with std::vector<dReal>. How about we completely get rid of the custom data struct and instead have different members for the different fields? On Mujin side, we can derive from IkFailureInfo and have our own memory-efficient structures

kanbouchou commented 2 months ago

To tell you the truth, I think having a custom data struct for key/value store is not so great. It creates a lot of ambiguity and is not very memory efficient to represent everything with std::vector<dReal>. How about we completely get rid of the custom data struct and instead have different members for the different fields? On Mujin side, we can derive from IkFailureInfo and have our own memory-efficient structures

Yes, we could do it that way. So we would have a derived class of IkFailureInfo which can do more efficient internal representation of what Mujin needs. In that case,

I would add following api's for accessing custom values to IkFailureInfo.

virtual void IkFailureInfo::SetCustomRealValue(const char* key, dReal value) = 0;
virtual void IkFailureInfo::SetCustomRealValues(const char* key, const vector<dReal>& value) = 0;
virtual dReal IkFailureInfo::GetCustomRealValue(const char* key) = 0;
virtual const vector<dReal>& IkFailureInfo::GetCustomRealValues(const char* key) = 0;

I would simply remove rapidjson::Document _rCustomData member from IkFailureInfo and let each user handle it in their specific ways if needed.

Does it sound good?

rdiankov commented 2 months ago

If openrave doesn't use custom data anymore, and we have derived classes that will have their own data storage and functions, then what is the value of having SetCustomRealValue/GetCustomRealValue in the base class?

kanbouchou commented 2 months ago

If openrave doesn't use custom data anymore, and we have derived classes that will have their own data storage and functions, then what is the value of having SetCustomRealValue/GetCustomRealValue in the base class?

IkFailureAccumulatorBase::GetNextAvailableIkFailureInfo() returns a reference to IkFailureInfo, and to access custom members of our derived class, we either need to downcast to DerivedIkFailureInfo and access its members directly, or access them via virtual functions. In terms of speed, static_cast is sligherly faster than virtual accessor functions (dynamic_cast is much slower). Do you want to go with downcasting by static_cast rather than having accessor apis?

0 base:total=23091.8 usec, average=2.30918 nsec(10000000 samples)
00
1 virtual:total=28271.2 usec, average=2.82712 nsec(10000000 samples)
BB
2 static_cast:total=23334.7 usec, average=2.33347 nsec(10000000 samples)
BB
3 dynamic_cast:total=67286.9 usec, average=6.72869 nsec(10000000 samples)
BB

cast_virtual_speed_test.zip

rdiankov commented 2 months ago

Since we control "IkFailureAccumulatorBasic" in dataaccumulator.h, static casting should be safe.

kanbouchou commented 2 months ago

Since we control "IkFailureAccumulatorBasic" in dataaccumulator.h, static casting should be safe.

Yes, in that context it's fine to use static cast.

But I'm afraid things may not work when we use ikfast solver in openrave who doesn't know the MujinIkFailureInfo so it would require it's own derived class, say OpenRAVEIkFailureInfo to store custom values. When we serialize IkFailureInfo instance to binary format, the serializer needs to know which of the derived classes the type of the instance is. Moreover, when we deserialize the binary data, we don't know how to parse the data without knowing the derived class type.

I'd like to propose to add map<string, pair<bool, vector<dReal>>> _mapCustomData; in base IkFailureInfo class again. Then, IkFailureInfo::Reset would just set the boolean of the value pair to false without calling map::clear. Then, we should not need to have any derived classes and thus the data I/O can remain clean. Memory reallocation problem you were afraid of is avoided by skipping map::clear, and I think slight computational time increase of map key search when adding a new custom key-value pair should be small because of log(N) complexity and expectedly small number (much less than 100) of keys. If you like, key type of _mapCustomData can be integer (enum value) as well.

rdiankov commented 2 months ago

We need to figure out how to make openrave as fast as possible. This code is on a critical path for a lot of things; and the usage of maps is actually one thing I been wanting to get rid of for quite sometime. Being general is always at odds with being optimal. From the beginning, openrave has tried to be general to a point where a lot of things became too slow... I think the question we should be asking ourselves is: what is the design to make this most optimal? Let's not worry too much about generality 😉

kanbouchou commented 2 months ago

We need to figure out how to make openrave as fast as possible. This code is on a critical path for a lot of things

Yes, reducing overhead is very important especially for this code.

and the usage of maps is actually one thing I been wanting to get rid of for quite sometime. Being general is always at odds with being optimal. From the beginning, openrave has tried to be general to a point where a lot of things became too slow... I think the question we should be asking ourselves is: what is the design to make this most optimal? Let's not worry too much about generality 😉

I agree that we have to avoid slowness because we aim for too much generality. However, I think we still need a support for openrave iksolver, many projects still rely on it and I think we should continue to support it. If you don't like map, what about following options?

virtual accessors (https://github.com/rdiankov/openrave/pull/1379#issuecomment-2088257517), overhead is like 0.5ns.
vector<pair<int, vector<dReal>>> to represent custom values in base class. Each element is key-value pair where key is represented by an integer. To invalidate, we never clear this outer vector, and set the integer -1 or 0 to preserve memory.

rdiankov commented 2 months ago

Not sure what you mean by virtual accessors, can you give an example?

And using vector<pair<int, vector<dReal>>> without clearing the vectors sounds like a good option that will allow us to transition without changing too much code.

Thanks

kanbouchou commented 2 months ago

Not sure what you mean by virtual accessors, can you give an example?

I meant to define a set of api's in IkFailureInfo and leave underlying data structure and implementation to the derived class.

virtual void IkFailureInfo::SetCustomRealValue(const char* key, dReal value) = 0;
virtual void IkFailureInfo::SetCustomRealValues(const char* key, const vector<dReal>& value) = 0;
virtual dReal IkFailureInfo::GetCustomRealValue(const char* key) = 0;
virtual const vector<dReal>& IkFailureInfo::GetCustomRealValues(const char* key) = 0;

And using vector<pair<int, vector<dReal>>> without clearing the vectors sounds like a good option that will allow us to transition without changing too much code.

I'm also fine with this approach as well. Each element can be a struct instead of pair to be clearer.

///\brief holds custom data key-value pair. To be used to avoid reallocating memory.
template<typename DataType> struct KeyValueData
{
  void Reset() { customDataIndex = 0; }
  DataType data; ///<   if customFieldIndex is 0, invalid
  uint64_t customDataIndex; ///< represents semantic meaning of data. If 0, data is invalid.
};

What about going forward with adding vector<KeyValueData<vector<dReal>>> _vCustomData in IkFailureInfo?

rdiankov commented 2 months ago

I wouldn't call it DataType since it makes me think of int, float, etc. Perhaps DataNameId would be better. Are all our current custom data fields Reals? Do we use integers, and more specifically, do we store large amount of data in this custom field? (I hope not).

kanbouchou commented 2 months ago

Do we use integers, and more specifically, do we store large amount of data in this custom field? (I hope not).

We have less than 10 custom data fields currently. So far, we are not using integer. Currently used types are vector of float as well as float scalar.

cielavenir commented 2 months ago

solutionIndices is float - actually since I entered Mujin I wonder why.

kanbouchou commented 2 months ago

solutionIndices is float - actually since I entered Mujin I wonder why.

@cielavenir solutionIndices is part of IkReturn's custom map, so different from the custom data in IkFailureInfo

rdiankov commented 2 months ago

Hmm... might as well get rid of IkReturn's map while we are at it, but can do that at a later time. Sure, for now, we can go forward with vector<KeyValueData<vector<dReal>>>

rdiankov commented 2 months ago

woot~

rdiankov / openrave