rajasekarv / vega

A new, arguably faster implementation of Apache Spark, written from scratch in Rust
Apache License 2.0

Tracking issue: async integration #71

Open · iduartgomez opened this issue 4 years ago

iduartgomez commented 4 years ago

After some discussion we have decided to take a careful, gradual approach to integrating async into the library.

Adding asynchronous computation is a large departure from the reference Spark implementation, and it may change how we do certain things, or what is possible (e.g. certain optimizations of ours that rely on stack allocation), in ways that are not yet clear.

We therefore prefer a gradual approach as we explore the design space and evolve the library. The original work can be seen in #67; some of the work done in that preliminary PR will be ported to the main branch, and further steps will be taken to make testing and comparing both versions easy while we experiment.

Meanwhile, an async branch will be maintained and kept in sync with the master branch.

Preliminary work

Future work

iduartgomez commented 4 years ago

New async branch pushed to the repository from my fork.

iduartgomez commented 4 years ago

Changes done so far in:

Must improve the shuffle_rdd / co_grouped_rdd shuffle fetch calls to avoid the performance hit of using a concurrent hashmap.
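For illustration, a rough sketch of one way to avoid the shared concurrent map (the names here, like `fetch_block` and `fetch_all`, are hypothetical, not vega's actual API): each fetch task aggregates into its own plain `HashMap`, and the partial maps are merged once at the end, keeping locking off the hot path.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real shuffle fetch call.
async fn fetch_block(server: String, shuffle_id: usize) -> Vec<(String, Vec<u8>)> {
    // ... network fetch elided ...
    let _ = (server, shuffle_id);
    Vec::new()
}

async fn fetch_all(servers: Vec<String>, shuffle_id: usize) -> HashMap<String, Vec<Vec<u8>>> {
    // One task per server; each task aggregates into its own local map,
    // so no synchronization is needed while fetching.
    let handles: Vec<_> = servers
        .into_iter()
        .map(|server| {
            tokio::spawn(async move {
                let mut local: HashMap<String, Vec<Vec<u8>>> = HashMap::new();
                for (k, v) in fetch_block(server, shuffle_id).await {
                    local.entry(k).or_default().push(v);
                }
                local
            })
        })
        .collect();

    // Merge the per-task maps once at the end; no concurrent hashmap needed.
    let mut merged: HashMap<String, Vec<Vec<u8>>> = HashMap::new();
    for handle in handles {
        for (k, mut values) in handle.await.expect("fetch task panicked") {
            merged.entry(k).or_default().append(&mut values);
        }
    }
    merged
}
```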

iduartgomez commented 4 years ago

With the latest pushes (997bf62), both the executor and the schedulers (in both modes) are now async! Some fine tuning of where tasks are spawned will probably be appropriate; right now each stream in the executor is moved (spawned) into its own future and executed on the Tokio non-blocking thread pool.

Ideally we will probably want to send some of the inner work to the blocking thread pool (spawn_blocking), leaving the main pool free to keep receiving/sending data while the main task runs on the blocking pool. Roughly, right now this is equivalent to the previous implementation, where a thread pool was used and everything was blocking.
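As an illustration of this strategy, here is a minimal sketch with hypothetical names (`handle_stream`, `run_task`, `serve`; this is not the actual executor code): each accepted stream gets its own future for I/O on the non-blocking pool, while `spawn_blocking` ships the CPU-heavy deserialization and task execution to Tokio's blocking pool.

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};

async fn handle_stream(mut stream: TcpStream) -> std::io::Result<()> {
    // Non-blocking I/O stays on the main pool (a real protocol would
    // frame messages instead of reading to EOF; this is just a sketch).
    let mut buf = Vec::new();
    stream.read_to_end(&mut buf).await?;

    // Deserialization + task execution is CPU-bound, so run it on the
    // blocking thread pool instead of stalling the async executor.
    let result = tokio::task::spawn_blocking(move || run_task(&buf))
        .await
        .expect("blocking task panicked");

    // Reply on the non-blocking pool again.
    stream.write_all(&result).await
}

fn run_task(bytes: &[u8]) -> Vec<u8> {
    // ... deserialize and execute the task; elided ...
    bytes.to_vec()
}

async fn serve(listener: TcpListener) -> std::io::Result<()> {
    loop {
        let (stream, _) = listener.accept().await?;
        tokio::spawn(handle_stream(stream)); // one future per stream
    }
}
```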

There is a problem in the task run test itself, where deserialization is not being done properly, so it fails. However, the example jobs run fine in distributed mode (the problem is the test, not the executor itself), so the test is marked as ignored for now.

I have looked a bit into the problems that keep us from using the async bufreader capnp method (this is the last commit on the async branch right now). For some reason it fails to read whatever data is being sent from the 'other side' (whether from the tests/examples, i.e. the scheduler, or from the unit tests I created), so I haven't switched to that version yet. It would be nice if we could use it, but it may be a problem with the library itself: the connection is actually opened and the stream received, but then no data can be fetched from it at the executor.
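A minimal loopback test along these lines could help isolate whether the wire data or the async reader is at fault. This is only a sketch: it assumes tokio, tokio-util (with the `compat` feature), capnp, and capnp-futures as dependencies, and the exact `capnp_futures::serialize::read_message` signature varies between versions.

```rust
use capnp::message::{Builder, ReaderOptions};
use tokio::io::AsyncWriteExt;
use tokio_util::compat::TokioAsyncReadCompatExt;

#[tokio::test]
async fn async_reader_roundtrip() -> Result<(), Box<dyn std::error::Error>> {
    // In-memory stream pair standing in for the scheduler/executor socket.
    let (mut tx, rx) = tokio::io::duplex(64 * 1024);

    // Writer side: serialize synchronously into a buffer, then send it,
    // mirroring what the 'other side' does over the wire.
    let writer = tokio::spawn(async move {
        let message = Builder::new_default();
        let mut buf = Vec::new();
        capnp::serialize::write_message(&mut buf, &message).unwrap();
        tx.write_all(&buf).await.unwrap();
        // Dropping `tx` closes the stream so the reader sees EOF afterwards.
    });

    // Reader side: the async path that currently fails in the executor.
    let _msg =
        capnp_futures::serialize::read_message(rx.compat(), ReaderOptions::new()).await?;

    writer.await?;
    Ok(())
}
```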

iduartgomez commented 4 years ago

Next steps are:

iduartgomez commented 4 years ago

Pretty much everything that can be async right now is, except the compute parts of the Rdds! All changes are in master. The whole network stack is asynchronous and well optimized for spawning/parallelization (although profiling should be done in the future to see whether there is a more optimal strategy for spawning tasks, or for avoiding spawning altogether in certain parts of the program).

There is a caveat that forces us to block on certain async calls, due to capnp builder types not being Send-friendly (which makes them unusable across await points when running on the Tokio thread pool via spawn); see capnproto/capnproto-rust#130. For now the solution is to either block on the executing thread or fall back to std::net::TcpStream for communication. Unfortunately the latter negates many of the benefits of going async in the first place, so the preferred strategy is the former (sketched below); at least that way the situation can be salvaged, and awaiting at other places in the call stack is still possible.

This is temporary until a better solution can be found (ideally most capnp_futures types would implement Send).
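For illustration, a minimal sketch of the first strategy, assuming tokio and capnp as dependencies (the names are illustrative, not vega's actual code): the non-Send `Builder` is confined to a synchronous scope, and only the resulting Send-friendly byte buffer crosses the await point, which keeps the future spawnable on the Tokio pool.

```rust
use capnp::message::Builder;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

async fn send_message(stream: &mut TcpStream) -> capnp::Result<()> {
    // Synchronous scope: build + serialize, then drop the non-Send Builder
    // before any await point is reached.
    let buf = {
        let message = Builder::new_default();
        // ... fill in the message's root struct here ...
        let mut buf = Vec::new();
        capnp::serialize::write_message(&mut buf, &message)?;
        buf
    };

    // Only `buf` (which is Send) is held across the await, so this future
    // can run on the Tokio thread pool via spawn.
    stream.write_all(&buf).await?;
    Ok(())
}
```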

Some of the problems with shuffling tasks persist (others were fixed); I am going to focus on fixing whatever is still broken there for now.

EDIT: it actually looks like everything is working just fine now in distributed mode with the latest commits, so no issues!