rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust

Apache License 2.0

2.23k stars, 206 forks

Sort by #72

Open return02 opened 4 years ago

return02 commented 4 years ago

Implement the sort-by transform using a very simple range_partitioner.

The algorithm is almost the same as Apache Spark's: partition all the data into ordered partitions, then sort each partition separately.
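As a rough illustration of the idea (not the code in this PR), the single-machine sketch below routes each element to a partition by comparing its key against sorted bounds, sorts every partition on its own, and concatenates the results. The function name `sort_by_range_partition` and its parameters are hypothetical, and the linear scan over the bounds is deliberately naive.

```rust
/// Illustrative only: sort `data` by `key` via range partitioning.
/// `range_bounds` must be sorted ascending and produces
/// `range_bounds.len() + 1` partitions.
fn sort_by_range_partition<T, K, F>(data: Vec<T>, range_bounds: &[K], key: F) -> Vec<T>
where
    K: Ord,
    F: Fn(&T) -> K,
{
    // One bucket per key range: (.., b0], (b0, b1], ..., (b_last, ..).
    let mut partitions: Vec<Vec<T>> = (0..=range_bounds.len()).map(|_| Vec::new()).collect();

    // Step 1: route every element to the partition whose range holds its key.
    for item in data {
        let k = key(&item);
        let idx = range_bounds
            .iter()
            .position(|b| k <= *b)
            .unwrap_or(range_bounds.len());
        partitions[idx].push(item);
    }

    // Step 2: sort each partition independently
    // (on a cluster this step would run in parallel).
    for p in partitions.iter_mut() {
        p.sort_by_key(|item| key(item));
    }

    // Step 3: partitions are already ordered relative to each other,
    // so concatenating them yields a globally sorted result.
    partitions.into_iter().flatten().collect()
}
```

Because every key in partition i is less than or equal to every key in partition i + 1, no merge step is needed after the per-partition sorts.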

There is still some work to be done:

[ ] Find a better algorithm for building range_bounds.
[ ] Implement descending order.
[ ] Use binary search in the get_partitions() method.
[x] Perhaps F: SerFunc(&Self::Item) -> K + Clone is better than F: SerFunc(Self::Item) -> K.
[ ] Test corner cases.
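For the binary-search and descending items, a minimal sketch could look like the following. The struct, its fields, and the method names are hypothetical stand-ins rather than the types in this PR; the lookup relies on the standard library's `slice::partition_point`, which assumes the bounds are sorted ascending.

```rust
/// Hypothetical stand-in for a range partitioner; names are illustrative.
struct RangePartitioner<K> {
    /// Upper bounds of all partitions except the last, sorted ascending.
    range_bounds: Vec<K>,
    /// When false, partition indices are mirrored so larger keys come first.
    ascending: bool,
}

impl<K: Ord> RangePartitioner<K> {
    fn num_partitions(&self) -> usize {
        self.range_bounds.len() + 1
    }

    /// Binary search for the partition index of `key` in O(log n).
    fn get_partition(&self, key: &K) -> usize {
        // Count of bounds strictly below `key`, i.e. the index of the
        // first bound that is >= `key` (or len() if there is none).
        let idx = self.range_bounds.partition_point(|b| b < key);
        if self.ascending {
            idx
        } else {
            // Mirror the index so the largest keys land in partition 0.
            self.range_bounds.len() - idx
        }
    }
}
```

With bounds `[10, 20]`, keys up to 10 go to partition 0, keys in (10, 20] to partition 1, and larger keys to partition 2; in descending mode the indices are reversed, and each partition would then also need to be sorted in reverse order.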

I have rebased my branch onto master, and I still don't know what changed in Cargo.lock (because it's in .gitignore).

iduartgomez commented 4 years ago

It will compile with whatever the latest versions are based on the Cargo.toml manifest, which is fine for a library. As a tip, in case you didn't know: it isn't necessary to close the PR if you don't want to. Next time you can do the following:

git checkout <working branch>
git rebase native_spark/master  # assumes this repo was added as a remote named native_spark
git push origin <working branch> -f

Do you want this work to be reviewed and merged now, or will you be doing the other work items before the merge?

return02 commented 4 years ago

I wasn't familiar with git push -f, so I had to close the old PR. Next time I'll do what you suggested.

Please do not merge this PR until the other work items are finished.

Actually, I don't have a good idea for an algorithm to build range_bounds. I'd like to discuss this question, which is why I created this PR even though it's not complete.
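As one possible starting point for that discussion, the sketch below derives the bounds from a sample of keys by sorting the sample and taking evenly spaced quantiles. This is an assumption about what a "better" approach might look like, not what the PR currently does; the function name `build_range_bounds` is hypothetical, and how the sample is collected from each partition is left out.

```rust
/// Illustrative only: choose up to `num_partitions - 1` ascending bounds
/// from a sample of keys by taking evenly spaced quantiles.
fn build_range_bounds<K: Ord + Clone>(mut sample: Vec<K>, num_partitions: usize) -> Vec<K> {
    // A single partition (or no data) needs no bounds at all.
    if num_partitions <= 1 || sample.is_empty() {
        return Vec::new();
    }
    sample.sort();

    let mut bounds: Vec<K> = Vec::with_capacity(num_partitions - 1);
    for i in 1..num_partitions {
        // Position of the last sampled key that belongs to partition i - 1.
        let idx = (i * sample.len() / num_partitions).saturating_sub(1);
        let candidate = &sample[idx];
        // Skip repeated values so the bounds stay strictly increasing;
        // heavily duplicated keys therefore yield fewer partitions.
        if bounds.last() != Some(candidate) {
            bounds.push(candidate.clone());
        }
    }
    bounds
}
```

The quality of the bounds depends entirely on how representative the sample is, so skewed data would call for a sampling step that weights each input partition by its size.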

iduartgomez commented 4 years ago

Sorry, I haven't been able to look into this much as I was fixing other stuff; I will hopefully check it out tomorrow.

But as I understand it, it works right now, but you are not happy with how efficient it is, correct?