Memory usage in the state is not cost-efficient. Currently, we store Row in the state, which is an alias of Vec<Datum>.
Vec<Datum> is wasteful:
A Datum costs 32 bytes, but an INT only needs 4 bytes.
We can use a bitmap to store the null flags, so that one byte can represent 8 fields. What's more, non-nullable fields don't need any space in the bitmap.
For var-length types or nested types, there will be many allocations, which can be reduced to only one.
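To make the overhead concrete, here is a rough sketch that compares the per-row footprint of a `Vec<Datum>`-style layout with a packed layout (null bitmap plus 4-byte integers). The `Scalar` enum and the two helper functions are hypothetical stand-ins, not the project's real definitions, so the exact sizes printed will differ from the 32 bytes quoted above.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the real scalar type; the actual enum has more variants.
#[allow(dead_code)]
enum Scalar {
    Int32(i32),
    Int64(i64),
    Float64(f64),
    Utf8(String),
}

// Assumption: a Datum is a nullable scalar.
type Datum = Option<Scalar>;

/// Inline bytes used by `num_fields` datums (ignores the Vec header and any heap data).
fn datum_row_bytes(num_fields: usize) -> usize {
    num_fields * size_of::<Datum>()
}

/// Bytes used by a packed row of nullable INT fields: one null bit per field plus 4 bytes each.
fn packed_int_row_bytes(num_fields: usize) -> usize {
    (num_fields + 7) / 8 + num_fields * size_of::<i32>()
}

fn main() {
    println!("size_of::<Datum>() = {} bytes", size_of::<Datum>());
    println!("8 INT fields as datums: {} bytes", datum_row_bytes(8));
    println!("8 INT fields packed:    {} bytes", packed_int_row_bytes(8));
}
```

Every datum is as large as the largest variant it could hold, which is why a row made only of INT fields pays several times what it needs.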
In fact, the rows in one state always have the same format, so we can significantly reduce the memory cost by introducing a schemaless memory format.
There are some requirements for the format:
It doesn't need to be decoded before use.
A field can be referenced without copying its data.
It is friendly to schema changes (we can reserve a simple version header and leave the problem for later).
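As a rough illustration of these requirements, the sketch below packs a row into a single buffer: one version byte, a null bitmap, fixed-width slots, then variable-length data. All names here (`ColType`, `Value`, `Schema`, `encode`, `get`) are hypothetical and the layout is only an assumption for discussion, not a proposal for the final encoding. For simplicity every column gets a null bit, although non-nullable columns could skip it.

```rust
use std::convert::TryInto;

/// Hypothetical column types; not the project's real type system.
#[derive(Clone, Copy)]
enum ColType { Int32, Int64, Utf8 }

/// A field view; `Utf8` borrows directly from the row buffer.
#[derive(Debug, PartialEq)]
enum Value<'a> { Int32(i32), Int64(i64), Utf8(&'a str) }

const VERSION: u8 = 1; // reserved header byte for future format changes

/// The schema lives outside the rows, so rows carry no type information.
struct Schema(Vec<ColType>);

impl Schema {
    // Utf8 slots hold a (u32 offset, u32 length) pair pointing into the row tail.
    fn width(t: ColType) -> usize {
        match t { ColType::Int32 => 4, ColType::Int64 | ColType::Utf8 => 8 }
    }
    fn bitmap_len(&self) -> usize { (self.0.len() + 7) / 8 }
    /// Byte offset of the fixed-width slot of column `idx`.
    fn slot(&self, idx: usize) -> usize {
        1 + self.bitmap_len() + self.0[..idx].iter().map(|t| Self::width(*t)).sum::<usize>()
    }

    /// Encodes a row into one contiguous allocation.
    fn encode(&self, values: &[Option<Value>]) -> Vec<u8> {
        assert_eq!(values.len(), self.0.len());
        let mut buf = vec![0u8; self.slot(self.0.len())];
        buf[0] = VERSION;
        for (i, v) in values.iter().enumerate() {
            let at = self.slot(i);
            match (self.0[i], v) {
                (_, None) => buf[1 + i / 8] |= 1u8 << (i % 8), // set the null bit
                (ColType::Int32, Some(Value::Int32(x))) => {
                    buf[at..at + 4].copy_from_slice(&x.to_le_bytes())
                }
                (ColType::Int64, Some(Value::Int64(x))) => {
                    buf[at..at + 8].copy_from_slice(&x.to_le_bytes())
                }
                (ColType::Utf8, Some(Value::Utf8(s))) => {
                    let start = buf.len() as u32;
                    buf.extend_from_slice(s.as_bytes());
                    buf[at..at + 4].copy_from_slice(&start.to_le_bytes());
                    buf[at + 4..at + 8].copy_from_slice(&(s.len() as u32).to_le_bytes());
                }
                _ => panic!("value does not match column type"),
            }
        }
        buf
    }

    /// Reads one field in place; strings are returned as borrowed slices.
    fn get<'a>(&self, row: &'a [u8], idx: usize) -> Option<Value<'a>> {
        if (row[1 + idx / 8] & (1u8 << (idx % 8))) != 0 {
            return None;
        }
        let at = self.slot(idx);
        let u32_at = |p: usize| u32::from_le_bytes(row[p..p + 4].try_into().unwrap()) as usize;
        Some(match self.0[idx] {
            ColType::Int32 => Value::Int32(i32::from_le_bytes(row[at..at + 4].try_into().unwrap())),
            ColType::Int64 => Value::Int64(i64::from_le_bytes(row[at..at + 8].try_into().unwrap())),
            ColType::Utf8 => {
                let (start, len) = (u32_at(at), u32_at(at + 4));
                Value::Utf8(std::str::from_utf8(&row[start..start + len]).unwrap())
            }
        })
    }
}

fn main() {
    let schema = Schema(vec![ColType::Int32, ColType::Utf8, ColType::Int64]);
    let row = schema.encode(&[Some(Value::Int32(7)), Some(Value::Utf8("abc")), None]);
    println!("{} bytes: {:?}", row.len(), schema.get(&row, 1));
}
```

Reading a field needs only the shared schema and a slice of the row, so nothing is decoded up front and a string field is returned as a borrow into the row buffer.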
I guess the value after the refactor will be large enough that indexmap may be helpful, but that needs a benchmark.
We can refer to FlatBuffers or similar formats used in other databases, but that needs investigation.
The value's encoding may be the same as #396, not sure.