rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust
Apache License 2.0

Improve error handling and traceability on executor and user code crashes #53

Open iduartgomez opened 4 years ago

iduartgomez commented 4 years ago

This issue can be mentored for anyone who may want to help.

While this has improved, we still have a whole lot of unwrapping around. Since we are still in a very early phase, it is not necessary to go crazy on this, because many things will (probably) change several times. That said, while in some places panics and aborts definitely should happen if anything goes wrong, better error handling is good to have for the sake of traceability, even if the whole application ends up crashing.

In particular, we should inventory exactly where the boundaries lie between code run by the executors and code run by the driver. Crashes in user code and in executors should be caught and reported to the driver, which should then handle them gracefully and follow a clear plan of action depending on the error (e.g. if the executor detects a problem in one of its threads while running code, the action should be one thing; if it dies for some other reason, another; etc.).
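To make the executor/driver boundary concrete, here is a minimal sketch using only the standard library. All names (`ExecutorReport`, `DriverAction`, `run_user_code`, `run_on_executor`, `decide`) are hypothetical and not part of vega's API, and a thread plus channel stands in for a real executor process and its network link. It assumes panics unwind (the default) rather than abort.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

/// Hypothetical message an executor sends back to the driver.
#[derive(Debug, PartialEq)]
pub enum ExecutorReport {
    Finished { task_id: usize, value: i64 },
    UserCodePanicked { task_id: usize, message: String },
}

/// Hypothetical plan of action chosen by the driver.
#[derive(Debug, PartialEq)]
pub enum DriverAction {
    Continue,
    RetryTask(usize),
    ReplaceExecutor,
}

/// Executor side: run user code and turn a panic into a report
/// instead of letting it take down the executor thread.
pub fn run_user_code<F>(task_id: usize, f: F) -> ExecutorReport
where
    F: FnOnce() -> i64,
{
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => ExecutorReport::Finished { task_id, value },
        Err(payload) => {
            // Panic payloads are usually `&str` or `String`.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic payload".to_string());
            ExecutorReport::UserCodePanicked { task_id, message }
        }
    }
}

/// Driver side: pick an action. `None` means the executor went away
/// without reporting anything (it died for some other reason).
pub fn decide(report: Option<&ExecutorReport>) -> DriverAction {
    match report {
        Some(ExecutorReport::Finished { .. }) => DriverAction::Continue,
        Some(ExecutorReport::UserCodePanicked { task_id, .. }) => {
            DriverAction::RetryTask(*task_id)
        }
        None => DriverAction::ReplaceExecutor,
    }
}

/// Stand-in for dispatching a task to an executor and waiting for its report.
pub fn run_on_executor<F>(task_id: usize, f: F) -> Option<ExecutorReport>
where
    F: FnOnce() -> i64 + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // Ignore a send failure: it only means the driver stopped listening.
        let _ = tx.send(run_user_code(task_id, f));
    });
    let report = rx.recv().ok();
    let _ = handle.join();
    report
}
```

For example, `decide(run_on_executor(2, || panic!("boom")).as_ref())` would yield `DriverAction::RetryTask(2)` in this sketch, while a closed channel with no report maps to `DriverAction::ReplaceExecutor`. Which action fits which failure is a policy decision for the project; the mapping here is only illustrative.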

The first task should be to inventory all the call sites where it is necessary to take action (really only a fraction of all the unwraps) and then extend/modify methods to return proper return types (e.g. `Result`), which can then be used to shut down, signal drivers, clean up, etc.
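As an illustration of that kind of change, the sketch below turns a chain of `unwrap()` calls into a function returning `Result` with a dedicated error type. `TaskError` and `parse_port` are hypothetical names invented for this example; they do not come from the vega codebase.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for illustration; not vega's actual error enum.
#[derive(Debug)]
pub enum TaskError {
    /// The address had no `host:port` separator.
    MissingPort(String),
    /// The port part was not a valid `u16`.
    InvalidPort { input: String, source: ParseIntError },
}

impl fmt::Display for TaskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TaskError::MissingPort(addr) => write!(f, "no port in address `{}`", addr),
            TaskError::InvalidPort { input, .. } => write!(f, "invalid port `{}`", input),
        }
    }
}

impl std::error::Error for TaskError {
    // Keep the underlying cause so the failure stays traceable.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            TaskError::InvalidPort { source, .. } => Some(source),
            TaskError::MissingPort(_) => None,
        }
    }
}

/// Before: `addr.rsplit_once(':').unwrap().1.parse().unwrap()` panics on bad input.
/// After: the caller gets a typed error and decides whether to shut down,
/// signal the driver, or clean up.
pub fn parse_port(addr: &str) -> Result<u16, TaskError> {
    let (_, port) = addr
        .rsplit_once(':')
        .ok_or_else(|| TaskError::MissingPort(addr.to_string()))?;
    port.parse::<u16>().map_err(|source| TaskError::InvalidPort {
        input: port.to_string(),
        source,
    })
}
```

A caller can then `match` on the variant, or propagate it with `?`, instead of crashing at the call site. Places where a panic really is the right response can keep an explicit `expect("...")` with a message that explains why.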

Related to #26 and #25 (this should end up fixing that issue).