hydro-project / hydroflow

Hydro's low-level dataflow runtime
https://hydro.run/docs/hydroflow/
Apache License 2.0

Hydroflow 1.0 Roadmap #1074

Closed by MingweiSamuel 3 weeks ago

MingweiSamuel commented 6 months ago

P0

  1. Singleton syntax/usability

P1

  1. State externalization (dataflow) (#1059)
  2. Hydroflow+ polish (documentation)
    • Prune Hydroflow operator set
  3. Standard library for Hydroflow+
    • distributed async/await (futures/promises/etc.)
    • actors
    • distributed protocols, CRDTs/[semi]rings/groups, KVS, transaction manager, BFT, etc.
  4. Deployment (hydro-deploy)
  5. Performant KVS
  6. Debugging/diagnostics/telemetry
  7. Decent Ops support
    • cleaner k8s integration?
    • Version management
  8. Lattices/properties (semantics)
    • Singletons, flows
    • ticks/deltas
    • David Chu optimization preconditions
    • Groups, rings?
  9. Dynamic/Auto-Ops
    • Live reconfiguration
    • Auto-elasticity

P2

  1. Choose benchmarks (maybe a subset of the below)
  2. Networking?
    • Backpressure?
    • Reconnection?
    • Shared memory performance
      • No serialization overhead
  3. Integrations

P3

  1. Fault tolerance specs (on Hydroflow functions?)
  2. State externalization (for replication)
    • Checkpointing (#1049)
  3. Sequence operators (windowing)
    • (check out StreamIt)
    • Caleb Stanford ordered streams
  4. Extended Performance
    • Cache locality
    • Vectorization / Columnar join
    • Instruction-level benchmarking (VTune, etc.)
  5. Dataflow Algebra Optimizations

MingweiSamuel commented 5 months ago

Moved here from #930

Performance

Expressivity