rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust
Apache License 2.0
2.23k stars 206 forks

Sort by #72

Open return02 opened 4 years ago

return02 commented 4 years ago

Implement the sort-by transformation using a very simple range_partitioner.

This algorithm is almost the same as Apache Spark's: partition all the data into ordered partitions, then sort each partition separately.
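To make the idea concrete, here is a minimal single-machine sketch of range partitioning followed by a per-partition sort. It is not the code in this PR: the names `partition_for` and `range_partition_sort` are hypothetical, and the bounds are assumed to be given already and sorted in ascending order.

```rust
/// Hypothetical helper: index of the partition a key belongs to.
/// `bounds` must be sorted ascending. Partition i holds keys that are
/// <= bounds[i] and > bounds[i - 1]; the last partition holds the rest.
fn partition_for<K: Ord>(key: &K, bounds: &[K]) -> usize {
    // Number of bounds strictly smaller than the key.
    bounds.partition_point(|b| b < key)
}

/// Hypothetical helper: split `data` into `bounds.len() + 1` ordered
/// partitions, then sort each partition on its own.
fn range_partition_sort<T, K, F>(data: Vec<T>, bounds: &[K], key_fn: F) -> Vec<Vec<T>>
where
    K: Ord,
    F: Fn(&T) -> K,
{
    let mut partitions: Vec<Vec<T>> = (0..=bounds.len()).map(|_| Vec::new()).collect();
    for item in data {
        let idx = partition_for(&key_fn(&item), bounds);
        partitions[idx].push(item);
    }
    // Every key in partition i is <= every key in partition i + 1,
    // so sorting the partitions independently gives a total order.
    for p in partitions.iter_mut() {
        p.sort_by_key(|item| key_fn(item));
    }
    partitions
}

fn main() {
    let data = vec![42, 7, 19, 3, 88, 55, 21];
    let bounds = [10, 50]; // assumed bounds, chosen by hand for this example
    let parts = range_partition_sort(data, &bounds, |x| *x);
    // Concatenating the partitions in order yields the sorted data.
    let sorted: Vec<i32> = parts.into_iter().flatten().collect();
    println!("{:?}", sorted); // prints [3, 7, 19, 21, 42, 55, 88]
}
```

In a distributed setting each partition would be sorted by a different task. The sketch only shows why the result is globally ordered once the bounds are fixed.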

There is still some work to be done:


I have rebased my branch onto master, but I still don't know what changed in Cargo.lock (because it's in .gitignore).

iduartgomez commented 4 years ago

It will compile with whatever the latest versions are according to the Cargo.toml manifest, which is fine for a library. As a tip, in case you didn't know: it's not necessary to close the PR. Next time you can do the following:

git checkout <working branch>
git rebase native_spark/master # add this repo as upstream  
git push origin <working branch> -f

Do you want this work to be reviewed and merged now, or will you be doing the other work items before merging?

return02 commented 4 years ago

I wasn't familiar with git push -f, so I had to close the old PR. Next time I'll do what you mentioned.

Please do not merge this PR until the other work items are finished.

Actually, I don't have a good algorithm for building range_bounds. I'd like to discuss this question, which is why I created this PR even though it's not complete.
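As a possible starting point for that discussion, and not what this PR currently does: one common approach is to take a sample of the keys, sort it, and pick evenly spaced elements as bounds. Apache Spark's RangePartitioner also derives its bounds from a sample of keys, though with more care than this. The sketch below assumes the sample has already been collected. The function name `range_bounds_from_sample` and its parameters are hypothetical.

```rust
/// Hypothetical helper: derive at most `num_partitions - 1` upper bounds
/// from a sample of keys. How the sample is collected is out of scope here.
fn range_bounds_from_sample<K: Ord + Clone>(mut sample: Vec<K>, num_partitions: usize) -> Vec<K> {
    if num_partitions <= 1 || sample.is_empty() {
        return Vec::new();
    }
    sample.sort();
    let mut bounds: Vec<K> = Vec::with_capacity(num_partitions - 1);
    for i in 1..num_partitions {
        // Position of the i-th evenly spaced cut in the sorted sample.
        let idx = (i * sample.len() / num_partitions).min(sample.len() - 1);
        let candidate = &sample[idx];
        // Skip repeated values so the bounds stay strictly increasing.
        if bounds.last() != Some(candidate) {
            bounds.push(candidate.clone());
        }
    }
    bounds
}

fn main() {
    let sample = vec![9, 1, 5, 3, 7, 2, 8, 4, 6];
    let bounds = range_bounds_from_sample(sample, 3);
    println!("{:?}", bounds); // prints [4, 7]
}
```

The quality of the bounds depends entirely on how representative the sample is. A skewed sample, or many duplicate keys, gives unbalanced partitions.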

iduartgomez commented 4 years ago

Sorry, I haven't been able to look into this much as I was fixing other stuff. I'll check it out tomorrow, hopefully.

But as I understand it, it works right now but you are not happy with how efficient it is, correct?