sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Discussion around Box<?> and dynamic dispatching #139

Closed sadikovi closed 6 years ago

sadikovi commented 6 years ago

I would like to start discussion around having Box<?> in the trait. Several PRs have been opened to address potential performance regressions because of that. There are couple of other problems with Box<?>, one of them is inference of 'static lifetime by default, unless we specify lifetime explicitly.

My proposal is slowly migrating to usage of enums and generics instead, the reason being that all of our implementations are part of the library.

It is worth investigating if the following approach will work:

sadikovi commented 6 years ago

@sunchao I know it might be a bit early to jump onto this work, but I would like to come up with some sort of consensus of what we are going to do moving forward. Would like to know your opinion, thanks!

sunchao commented 6 years ago

Thanks for bring up this proposal @sadikovi . I'd suggest we first find out whether this is really causing perf issue from benchmarks. If so, I'd say yes, we should remove the dynamic dispatching cost (actually most of the costs come from inlining I think).

I can create a micro-benchmark to test the value reader performance. It is good to add anyway.

sadikovi commented 6 years ago

Thanks. We should definitely profile it. However, I am convinced that removing Box could improve the usability and fix some other issues around lifetimes. Let me know what you think.

sunchao commented 6 years ago

Do you mean Box using 'static lifetime in default? To my impression this hasn't caused much of an issue and the default 'static lifetime should just work mostly. Can you give some example where it makes it hard to use?

IMO enum also has its disadvantages too. For instance, you'll need to do a pattern matching on it before you can call the specific method.

sadikovi commented 6 years ago

I just had some issues when writing tests for column writer, but I managed to work around those issues, at least for now. I agree, enums can be quite annoying to work with. I remember you mentioning the problem about DataType having 'static, I thought we could solve those issues by using enums in encoders and decoders.

I am happy to close this issue, since we need to add write support first. After that I could prototype something for encoders/decoders/compression and see if it makes life easier.

sunchao commented 6 years ago

The DataType one is largely solved by moving the 'static annotation to the definition site. Yes, I'd agree that we finish the write impl first. Meanwhile I'll do some benchmark to see whether the change really helps from the performance perspective.

sadikovi commented 6 years ago

I am closing this issue, since we figured it may not be worth the change.