sopel39 opened 2 years ago
cc @lukasz-stec
Hi, @sopel39.
I have a few questions about this issue. Please bear with me if they are naive.
I am interested in this issue, but I want to understand the exact context first, so I am asking.
(DictionaryBlockEncoding) In ORC and Parquet, the spec defines the elements of ids as unsigned integers. Would changing them to short or byte cause a problem?
This problem is unrelated to either ORC or Parquet.
(DictionaryBlockEncoding) Even if ids are changed to short or byte, wouldn't deserialization get slower, because each short/byte element of the slice would have to be padded back to a 4-byte int during deserialization?
It's more about reducing the size of the payload. Less payload means less processing along the way, so it's a win even if CPU usage stays the same.
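On the padding question: narrow ids do not need padding inserted into the slice. A decoder can widen each element while copying it into an int[] of ids, in one sequential pass. The Java sketch below is hypothetical, not the actual DictionaryBlockEncoding code; the class name NarrowIdDecoding, the unsigned reading of ids and the little-endian layout are all assumptions. Whether the widening loop costs less than the bytes it saves would still need to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not real encoding code): decodes dictionary ids that were
// written with a narrower element type back into an int[] of ids.
// Assumes ids are stored unsigned and in little-endian byte order.
final class NarrowIdDecoding {
    private NarrowIdDecoding() {}

    // Each byte is widened to an int as it is copied;
    // the encoded bytes are never padded in place.
    static int[] decodeByteIds(byte[] encoded) {
        int[] ids = new int[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            ids[i] = Byte.toUnsignedInt(encoded[i]); // range 0..255
        }
        return ids;
    }

    // Same idea for 2-byte ids.
    static int[] decodeShortIds(byte[] encoded) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded).order(ByteOrder.LITTLE_ENDIAN);
        int[] ids = new int[encoded.length / Short.BYTES];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = Short.toUnsignedInt(buffer.getShort()); // range 0..65535
        }
        return ids;
    }

    public static void main(String[] args) {
        int positions = 10_000;
        // Payload size of the ids alone for each element width.
        System.out.println("int ids:   " + positions * Integer.BYTES + " bytes");
        System.out.println("short ids: " + positions * Short.BYTES + " bytes");
        System.out.println("byte ids:  " + positions * Byte.BYTES + " bytes");
    }
}
```

Running main prints the size of the ids alone for 10,000 positions: 40000 bytes as int, 20000 as short and 10000 as byte.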
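DictionaryBlockEncoding: ids are written as integers. We can use short or byte instead if the dictionary has few enough positions for every id to fit in that type.
VariableWidthBlockEncoding: offsets are written as integers. We can use short instead if rawSlice is short enough for every offset to fit in a short.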
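The rule boils down to picking the narrowest element type that still holds the largest value. For dictionary ids that value is the dictionary size minus one; for variable-width offsets, assuming they start at 0, the last offset equals the length of rawSlice. The Java sketch below assumes the narrow types are read as unsigned and uses little-endian output; NarrowWidth and its methods are hypothetical names, not part of the existing encodings.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of element-width selection for ids and offsets.
// Assumes narrow values are interpreted as unsigned.
final class NarrowWidth {
    private NarrowWidth() {}

    // Smallest element size in bytes that can hold every value in [0, maxValue].
    static int widthFor(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        if (maxValue <= 0xFF) {
            return Byte.BYTES;
        }
        if (maxValue <= 0xFFFF) {
            return Short.BYTES;
        }
        return Integer.BYTES;
    }

    // Dictionary ids index into the dictionary, so the largest id is size - 1.
    static int idWidth(int dictionarySize) {
        return widthFor(Math.max(0, dictionarySize - 1));
    }

    // Offsets run from 0 up to the slice length, so the largest offset is that length.
    static int offsetWidth(int rawSliceLength) {
        return widthFor(rawSliceLength);
    }

    // Writes each value using the given width (1, 2 or 4 bytes), little-endian.
    static byte[] encode(int[] values, int width) {
        ByteBuffer out = ByteBuffer.allocate(values.length * width).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            if (width == Byte.BYTES) {
                out.put((byte) v);
            } else if (width == Short.BYTES) {
                out.putShort((short) v);
            } else {
                out.putInt(v);
            }
        }
        return out.array();
    }
}
```

A reader would also need to know which width was chosen, so a real encoding would have to record it (for example in a small header); that header is another assumption of this sketch.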