substantic / rain

Framework for large distributed pipelines
https://substantic.github.io/rain/docs/
MIT License
747 stars, 54 forks

Roadmap after v0.2 #26

spirali closed this issue 6 years ago

spirali commented 6 years ago

We have received a lot of feedback on our Reddit post (https://www.reddit.com/r/rust/comments/89yppv/rain_rust_based_computational_framework/). I think now is the time to recap our plans and maybe reconsider priorities to reflect the real needs of users. This is meant as a kind of brainstorming whiteboard; each individual task should end up with its own issue. Also, I would like to focus on a relatively short-term roadmap (let us say things that could be finished within 3 months); maybe we can create a similar post for our long-term plans.

EDIT (gavento): Moved @spirali's TODO to a post below.

gavento commented 6 years ago

Based on the feedback from the mentioned Reddit discussion, our long-term goals and internal discussion, this is the list of issues to work on, with their [priority], (assignee) and sub-tasks.

Prioritized enhancements

Custom tasks (subworkers) in more languages

Requested by several people in the discussion, and it seems like a good idea anyway. For now, this would use capnp.

Easier deployment in the cloud

Packaging for easier deployment

Multiple options, priorities may vary. (@spirali)

Fix current bugs

Improve Python API

Pythonize the client API.

Improve testing infrastructure

Client-side protocols

Replace capnp RPC and the current monitoring dashboard HTTP API with a common protocol. Part of #11 (more discussion there) but specific to the public API.

Improve the dashboard with more information and post-mortem analysis

More real-world code examples

Lower priority, best based on real use-cases. Ideas: numpy subtasks, C++/Rust subworkers

Enhancements to revisit in the (not so distant) future

gavento commented 6 years ago

@spirali's original TODO notes

First, here is my TODO list as it looked before the Reddit post:

The list of items that were already among our long-term goals, but whose priority we should reconsider:

spirali commented 6 years ago

Is "Python subworker as a library" necessary? I have the feeling that in every environment where we can transfer a function from a client to a subworker in a reasonable (and portable) way, we should do it that way. The overhead of transferring a function is minimal (it is done only once) and the flexibility is huge. I consider building a "fixed" subworker a kind of side-step for cases where there is no such option (C++/Rust [?*]).
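To make the "transfer a function once, then reuse it" idea concrete, here is a minimal sketch using only the Python standard library. It is not Rain's API: `pack_function`, `unpack_function` and `word_count` are hypothetical names, and `marshal` is used purely as a stand-in for a real function serializer such as cloudpickle. `marshal` output is specific to the Python version, and this sketch does not carry closures, defaults or captured globals.

```python
import builtins
import marshal
import types


def pack_function(func):
    """Client side: serialize a plain function's name and code object to bytes."""
    return marshal.dumps((func.__name__, func.__code__))


def unpack_function(payload):
    """Subworker side: rebuild a callable from bytes made by pack_function."""
    name, code = marshal.loads(payload)
    # The rebuilt function only sees builtins; nothing else is transferred.
    return types.FunctionType(code, {"__builtins__": builtins}, name)


def word_count(text):
    return len(text.split())


# The function is packed and shipped once...
payload = pack_function(word_count)

# ...and the unpacked callable is reused for every task input.
remote_func = unpack_function(payload)
results = [remote_func(chunk) for chunk in ["a b c", "hello world", ""]]
print(results)
```

The one-time cost is the `pack_function`/`unpack_function` pair; each later call is an ordinary function call.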

gavento commented 6 years ago

I can imagine some scenarios where a Python worker could be useful:

Also, the built-in pytask subworker can be trivially implemented as one such subworker task (with a bit of unpacking logic) and so it is not much more work.
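A rough sketch of that last point, under stated assumptions: the registry `TASKS`, the `task` decorator and `run_task` are hypothetical and not part of Rain. The sketch shows a "fixed" subworker exposing named tasks, where a generic `pytask` entry is just one more registered task whose first input is a serialized function. `marshal` stands in for a real serializer here and only handles plain functions within the same Python version.

```python
import builtins
import marshal
import types

TASKS = {}  # hypothetical registry: task name -> callable


def task(name):
    """Register a function as a named task of this fixed subworker."""
    def register(func):
        TASKS[name] = func
        return func
    return register


@task("upper")
def upper(data):
    # A fixed task, callable by name from a client in any language.
    return data.upper()


@task("pytask")
def pytask(packed_code, *args):
    # The "unpacking logic": rebuild the shipped function, then call it.
    code = marshal.loads(packed_code)
    func = types.FunctionType(code, {"__builtins__": builtins})
    return func(*args)


def run_task(name, *inputs):
    """Dispatch a task request by its registered name."""
    return TASKS[name](*inputs)


def add(a, b):
    return a + b


fixed = run_task("upper", "rain")
shipped = run_task("pytask", marshal.dumps(add.__code__), 2, 3)
print(fixed, shipped)
```

In this arrangement a client that cannot serialize Python functions would only ever request tasks like `upper` by name, while a Python client could still go through `pytask`.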

spirali commented 6 years ago

However, I see now that it can be useful to define tasks that may be called, e.g., from a Java client where cloudpickle is not easily accessible.

spirali commented 6 years ago

PR #40 implements the replacement of the DataStore API with direct calls.

vojtechcima commented 6 years ago

PR #52 implements Exoscale deployment scripts.

yingfeng commented 6 years ago

These are the things I think would be useful in the future:

gavento commented 6 years ago

Transitioned to #64 after the v0.3 release.