Closed sadikovi closed 6 years ago
@sunchao I might have made a mistake by having u64
instead of usize
. This PR updates it. Let me know if you think we should keep u64
. Thanks!
@sadikovi Yes I also thought about removing some of those unnecessary casts in the past. However, for this case, note that usize
is "how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes." So, I'm not sure if it is OK to use 32 bit for representing the encoded size on a 32-bit platform.
Thanks! I was thinking that it is highly unlikely that we will be writing more than u32 number of bytes for encoders, and it is rare to have file larger than 1-2 GB.
I am going to test row group writer and see how it all works together.
@sunchao I still think that we should update to usize
. Most of the code that I removed was a simple conversion from usize
to u64
, which I don't think is necessary - some of changes were removing casts to u64
.
Yes, you are right. We would not be able to keep more than usize
of data for encoders.
Merged. Thanks @sadikovi .
This is the follow-up of the size estimation feature in encoders.
It was an overkill to have
u64
return types in methods. I found that it is better to keep the types asusize
instead ofu64
, this minimises conversions in column writer and page writer.