iduartgomez closed this 4 years ago
@rajasekarv this is ready to go. I tested all changes in distributed mode as well, since I had to do some refactoring around the scheduler. I'll leave it open instead of merging in case you want to check it.
There is one thing that is not yet done well: the implementation of BoundDouble. It requires the inverse CDF of the Poisson distribution to find the confidence range in one of the counter's cases, and there is no pure Rust statistics library that implements it (writing the numerical algorithm here would be beyond the scope of this issue/PR). I decided to merge it anyway because I was accumulating far too many changes; the count is still computed and returned, so this is not a breaking change. Later I may open a PR on the library I pulled in (which looks like a good fit and a candidate to use) to implement it.
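For context, the missing piece can be sketched in pure Rust. This is a minimal illustration, not the PR's code: it assumes (as Spark's approximate count does for large counts) that after counting `sum` elements in a fraction `p` of the data, the uncounted remainder is Poisson-distributed with mean `sum * (1 - p) / p`, and it computes the inverse CDF by summing the probability mass function around the mode rather than with a dedicated numerical algorithm. `Bounds`, `poisson_inverse_cdf` and `count_bound` are hypothetical names.

```rust
/// Hypothetical result type (name and fields are assumptions; the PR's own
/// BoundDouble type may look different).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub mean: f64,
    pub confidence: f64,
    pub low: f64,
    pub high: f64,
}

/// Smallest k with P(X <= k) >= prob for X ~ Poisson(lambda), 0 < prob < 1.
///
/// Weights are built relative to the mode (w(mode) = 1), so nothing underflows
/// even for very large lambda; tails whose weight falls below CUTOFF are dropped
/// and the remaining weights are normalized. Cost is roughly O(sqrt(lambda)).
pub fn poisson_inverse_cdf(lambda: f64, prob: f64) -> u64 {
    assert!(lambda >= 0.0 && lambda.is_finite(), "lambda must be finite and >= 0");
    assert!(prob > 0.0 && prob < 1.0, "prob must be in (0, 1)");
    if lambda == 0.0 {
        return 0; // all probability mass sits at 0
    }
    const CUTOFF: f64 = 1e-20;
    let mode = lambda.floor() as u64;

    // Walk down from the mode: w(k - 1) = w(k) * k / lambda.
    let mut below = Vec::new();
    let (mut k, mut w) = (mode, 1.0_f64);
    while k > 0 {
        w *= k as f64 / lambda;
        if w < CUTOFF {
            break;
        }
        k -= 1;
        below.push(w);
    }
    let lo = k; // smallest k that was kept

    // Ascending weights: lower tail, the mode, then walk up: w(k + 1) = w(k) * lambda / (k + 1).
    let mut weights: Vec<f64> = below.into_iter().rev().collect();
    weights.push(1.0);
    let (mut k, mut w) = (mode, 1.0_f64);
    loop {
        k += 1;
        w *= lambda / k as f64;
        if w < CUTOFF {
            break;
        }
        weights.push(w);
    }

    // Scan the normalized CDF from the lower end.
    let total: f64 = weights.iter().sum();
    let target = prob * total;
    let mut cum = 0.0;
    for (i, w) in weights.iter().enumerate() {
        cum += *w;
        if cum >= target {
            return lo + i as u64;
        }
    }
    lo + weights.len() as u64 - 1 // only reachable through rounding
}

/// Hypothetical helper: `sum` elements were counted after scanning a fraction `p`
/// of the data. Assumption: the remainder is Poisson with mean sum * (1 - p) / p.
pub fn count_bound(confidence: f64, sum: u64, p: f64) -> Bounds {
    assert!(p > 0.0 && p <= 1.0, "p must be in (0, 1]");
    assert!(confidence > 0.0 && confidence < 1.0, "confidence must be in (0, 1)");
    let lambda = sum as f64 * (1.0 - p) / p;
    let low = poisson_inverse_cdf(lambda, (1.0 - confidence) / 2.0);
    let high = poisson_inverse_cdf(lambda, (1.0 + confidence) / 2.0);
    // The distribution only describes the uncounted remainder, so shift by `sum`.
    Bounds {
        mean: sum as f64 + lambda,
        confidence,
        low: (sum + low) as f64,
        high: (sum + high) as f64,
    }
}
```

Summing around the mode sidesteps the underflow of `exp(-lambda)` for large means, at the cost of a number of steps proportional to `sqrt(lambda)` per call; a statistics library would normally use a faster method.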
This PR adds some extras, such as the beginnings of a job listener (which will be useful later for adding metrics, etc.). It wasn't strictly necessary, but I got a bit carried away while implementing things by following the Spark codebase, haha.
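As an illustration of what a job listener can do for partial jobs, here is a minimal sketch. It assumes a trait shaped like Spark's `JobListener` (one callback per successful task, one on job failure); `JobListener<T>`, `PartialCountListener` and their methods are hypothetical names, not the PR's API. The listener merges per-partition counts as tasks finish and derives the naive estimate `sum / p`, where `p` is the fraction of tasks completed so far.

```rust
/// Hypothetical listener trait, modeled loosely on Spark's JobListener.
pub trait JobListener<T> {
    /// Called once per finished task with its partition index and result.
    fn task_succeeded(&mut self, index: usize, result: T);
    /// Called if the job as a whole fails.
    fn job_failed(&mut self, error: String);
}

/// Hypothetical listener that accumulates partial counts for an approximate count.
pub struct PartialCountListener {
    finished: Vec<bool>,
    sum: u64,
    error: Option<String>,
}

impl PartialCountListener {
    pub fn new(total_tasks: usize) -> Self {
        assert!(total_tasks > 0, "a job needs at least one task");
        Self { finished: vec![false; total_tasks], sum: 0, error: None }
    }

    /// Fraction of tasks finished so far (the `p` of an approximate count).
    pub fn fraction_done(&self) -> f64 {
        let done = self.finished.iter().filter(|&&f| f).count();
        done as f64 / self.finished.len() as f64
    }

    pub fn partial_sum(&self) -> u64 {
        self.sum
    }

    /// Naive point estimate of the full count, or None before any task finishes.
    pub fn estimate(&self) -> Option<f64> {
        let p = self.fraction_done();
        if p == 0.0 { None } else { Some(self.sum as f64 / p) }
    }

    pub fn failed(&self) -> Option<&str> {
        self.error.as_deref()
    }
}

impl JobListener<u64> for PartialCountListener {
    fn task_succeeded(&mut self, index: usize, result: u64) {
        // Ignore duplicate reports for a partition (e.g. a retried task);
        // panics if `index` is out of range.
        if !self.finished[index] {
            self.finished[index] = true;
            self.sum += result;
        }
    }

    fn job_failed(&mut self, error: String) {
        self.error = Some(error);
    }
}
```

Keeping the merge logic in the listener means the scheduler only has to forward task results, and the caller can read an estimate at any point before the job completes.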
@iduartgomez Awesome. As there is a substantial number of additions and changes, let me take a look at it.
WIP for partial jobs (and implementation of approximate count)
close #93