Tracking issue: async integration

iduartgomez commented 4 years ago

After some talk we have decided to take a careful gradual approach to integrate async into the library.

Adding asynchronous computation is a large departure from the reference Spark implementation, and may change how we do certain things or what is possible (like certain optimizations that rely on stack allocation in our case) in ways that are not yet clear.

Therefore, is preferred to take a gradual approach as we explore the design space and evolve the library. The original work can be seen at #67, some work done in that preliminary PR will be ported to the main branch and more steeps will be taken to make testing and comparing both versions easily while we experiment.

Meanwhile an async branch will be maintained and kept in sync with the master branch.

Preliminary work

[x] Finalize shuffle fetcher asynchronous version (maintain compatibility with the sync caller for now; joining the handle output).
[x] Port the work to handle async runtime instantiation/handling to integrate with third parties.
[x] Port the work so tasks at the scheduler are ran asynchronously.
[x] Port the work so the worker executor is asynchrnous.
[x] Use capnp async readers (fix remaining issues within the worker executor).
[ ] Automate testing and profiling for distributed mode so we can check for regressions/gains.
[ ] Add a number of tests to cover for different kinds of workloads so both versions can be compared.

Future work

[ ] Explore if it's possible to chain iterators on the stack to take advantage of compiler optimizations for narrow dependency tasks. (Check compiler assembly output to check the compiler optimizations.)
[ ] Add micro-benchmarks to check for performance characteristics of the previous tasks.
[ ] Architecturize the execution of this kind of work in a way that both the sync and async versions can take advantage of possible optimizations.

iduartgomez commented 4 years ago

New async branch pushed to the repository from my fork.

iduartgomez commented 4 years ago

Changes done so far in:

dbd87cb
1b61aeb

Must improve shuffle_rdd / co_grouped_rdd shuffle fetch calls to avoid using a concurrent hashmap for the performance hit.

iduartgomez commented 4 years ago

With the last pushes (997bf62) now both the executor and the schedulers (in both modes) are async! Probably a bit of fine tuning about where tasks are spawned will be appropiate, right now each stream in the executor is moved (spawned) to(in) its own future and executed on the Tokio non-blocking TP;.

Ideally we probably will want to send some of the work inside to the blocking TP (spawn_blocking) leaving the main TP free to keep receiving/sending data while running the main task in the blocking TP. Roughly, right now is the equivalent of the previous implementation where a threadpool was used (and everything was blocking).

There is a problem in the task run test itself, where deserialization is not being done properly so it fails, however for the example jobs is running fine in distributed mode (the problem is the test not the executor itself) so it's marked as ignored for now.

I have looked a bit into the problems where we can't use the async bufreader capnp method (this is the last commit on async branch right now), it fails to read whatever data is being sent for some reason from the 'other side' (may it be the tests/examples, so from the scheduler; or the unit tests I created) so I haven't switched to that version yet; would be nice if w ecan use it but it may be a problem with the library itself (the connection is actually openned and the stream received, but then it fails to fetch any data from it at the executor).

iduartgomez commented 4 years ago

Next steeps are

[x] Fix some of the grievances with configuration (#54): this is important for fast iteration and testing of distributed mode (with the docker setup), right now it gets in the way when running test, for example.
[x] Focus on #26 and related issues, this is important too as the program is not shutting down properly on finalization on distribution mode (so will have to add Drop to the executor, and do all the graceful shutdown thingy).
[x] Try to fix whatever additional tests I have in place for the executor too and leave them on for CI and PR.
[x] Maybe (hopefully?) find what is wrong with the async read_message capnp thing. Meanwhile is not a big deal as we can keep using the current setup which ain't bad either.
[x] After all that is done (sans 4 maybe) then I can focus on the last two points of preliminary work so we can compare the async branch with the other, etc.

iduartgomez commented 4 years ago

Very much all that can be async right now is, except the compute parts of the Rdd's! All changes are in master. All the network stack is asynchonous and well optimized for spawning/parallelization (although profiling should be done in the future to see if there is a more optimal strategy to spawn tasks or avoid spawning altogether in certain parts of the program).

There is a caveat which makes us have to block on certain async calls due to problems with capnp builder types not being Send friendly (which makes them not usable across await points when running on the Tokio thread pool via spawn), capnproto/capnproto-rust#130 For now the solution is to either block on the executing thread or fall back to the std::net::TcpStream to communicate (unfortunately that negated many of the benefits of going async in the first place, so the preferred strategy is the first, at least that way the situation can be salvaged and awaiting around other places of the call stack still is possible).

This is temporary until a better solution can be found in the future (ideally most capnp_futures would impl Send).

Some of the problems with shuffling tasks persist (others where fixed), gonna focus again trying to fix whatever still is broken there for now.

EDIT: actually looks like everything is working just fine now in distributed mode with the latests commits so no issues!

rajasekarv / vega

Tracking issue: async integration #71

Preliminary work

Future work