rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust
Apache License 2.0
2.23k stars 206 forks source link

Refactor local and distributed scheduler #43

Closed iduartgomez closed 4 years ago

iduartgomez commented 4 years ago

Refactored local and distributed scheduler to extract most common functionality and made it everything more maintanable; removed duplicity and code surface.

With this change it should be easier to progress and complete work on the DAGScheduler and Scheduler traits. I didn't change any of those as I am not sure what do you want to expose in the public API yet, but in any case should be fairly easy to change things around.

The refactor was made using a new non-leaking trait (NativeScheduler) which could be temporal until everything is moved to any of the other two traits.

Tests pass both in local and distributed mode.

rajasekarv commented 4 years ago

This looks pretty good. Instead of macro rules, we can probably include getter functions in the scheduler trait and let each scheduler implement these functions. It makes requirements clear if some external scheduler backends are to be implemented later on.

iduartgomez commented 4 years ago

This is already the case. The functions inside the macro are included in the NativeScheduler trait, the macro is just internal to library to avoid duplicating code btween the different implementing types.

Any functions which are implemented differently for the library schedulers in the future can be easily moved out of the macro too (like submit task is now). We can eventually remove the macro if everything is implemented differently inside the library in the future.

If it's ok then i will merge the PR as it is.

rajasekarv commented 4 years ago

I understand that the end result is the same. But having these requirements in traits make them clear.
But it's not a big deal since extending schedulers is not that trivial and not going to happen a lot like RDDs. So this is also perfectly fine.