sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Update return types in encodings.rs #126

Closed sadikovi closed 6 years ago

sadikovi commented 6 years ago

This is the follow-up of the size estimation feature in encoders.

It was an overkill to have u64 return types in methods. I found that it is better to keep the types as usize instead of u64, this minimises conversions in column writer and page writer.

sadikovi commented 6 years ago

@sunchao I might have made a mistake by having u64 instead of usize. This PR updates it. Let me know if you think we should keep u64. Thanks!

coveralls commented 6 years ago

Pull Request Test Coverage Report for Build 534


Totals Coverage Status
Change from base Build 533: 0.0%
Covered Lines: 10474
Relevant Lines: 10997

💛 - Coveralls
sunchao commented 6 years ago

@sadikovi Yes I also thought about removing some of those unnecessary casts in the past. However, for this case, note that usize is "how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes." So, I'm not sure if it is OK to use 32 bit for representing the encoded size on a 32-bit platform.

sadikovi commented 6 years ago

Thanks! I was thinking that it is highly unlikely that we will be writing more than u32 number of bytes for encoders, and it is rare to have file larger than 1-2 GB.

sadikovi commented 6 years ago

I am going to test row group writer and see how it all works together.

sadikovi commented 6 years ago

@sunchao I still think that we should update to usize. Most of the code that I removed was a simple conversion from usize to u64, which I don't think is necessary - some of changes were removing casts to u64.

sadikovi commented 6 years ago

Yes, you are right. We would not be able to keep more than usize of data for encoders.

sunchao commented 6 years ago

Merged. Thanks @sadikovi .