laysakura / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Task]: Pipeline construction helper macro #47

Open laysakura opened 1 year ago

laysakura commented 1 year ago

What needs to happen?

We are writing a pipeline like:

In main:

#[tokio::main]
async fn main() {
    DirectRunner::new()
        .run(|root| {
            root.apply(...)
        })
        .await;
}

In tests:

https://github.com/laysakura/beam/blob/54b4d233b5c5a07c3f9d406b723d198f60d3c8ef/sdks/rust/tests/primitives_test.rs#L91-L103


As you can see, both cases repeat the same boilerplate code.

We may want to write something like:

#[apache_beam::main]
fn main(root: PValue) {
  root.apply(...);
}
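
To make the intended expansion concrete, here is a minimal sketch of what such a macro could generate. It uses a declarative `macro_rules!` macro as a stand-in for the proposed attribute macro, because a real `#[apache_beam::main]` would need its own proc-macro crate. Everything here is an assumption for illustration: the `PValue` and `DirectRunner` types are simplified, synchronous mocks (not the SDK's actual types), and `beam_pipeline!` and `word_count` are hypothetical names.

```rust
// Hypothetical stand-ins for the SDK types. The real `PValue` and
// `DirectRunner` are async and have different signatures.
#[derive(Debug, Clone, PartialEq)]
pub struct PValue {
    pub steps: Vec<String>,
}

impl PValue {
    /// Records a transform name and returns the extended pipeline.
    pub fn apply(&self, transform: &str) -> PValue {
        let mut steps = self.steps.clone();
        steps.push(transform.to_string());
        PValue { steps }
    }
}

pub struct DirectRunner;

impl DirectRunner {
    pub fn new() -> Self {
        DirectRunner
    }

    /// Builds the pipeline by handing a fresh root `PValue` to the closure.
    pub fn run<F>(&self, construct: F) -> PValue
    where
        F: FnOnce(PValue) -> PValue,
    {
        construct(PValue { steps: Vec::new() })
    }
}

/// Declarative stand-in for the proposed attribute macro: it wraps the
/// user-written body in the runner boilerplate.
macro_rules! beam_pipeline {
    (fn $name:ident($root:ident: PValue) $body:block) => {
        pub fn $name() -> PValue {
            DirectRunner::new().run(|$root: PValue| $body)
        }
    };
}

// The user only writes the pipeline body; the macro supplies the rest.
beam_pipeline! {
    fn word_count(root: PValue) {
        root.apply("read").apply("count")
    }
}
```

Calling `word_count()` runs the body through the mock runner and returns the constructed pipeline. A real attribute macro would additionally have to generate the async runtime setup (for example the `#[tokio::main]` part) and decide how the runner is selected; both are open questions for the design document.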

The task should proceed as follows:

  1. First, survey other Beam SDKs and popular crates that provide this kind of macro, then write a design document (in Google Docs, and add it to the wiki page).
  2. Request a review from other contributors (@laysakura and/or @sjvanrossum would be happy to review it).
  3. Implement it.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

Kelvinyu1117 commented 1 year ago

Let me try to work on this task. One idea that immediately comes to mind: in the future we may have a different PipelineOption for each Runner, so the macro should be able to take either the PipelineOption or its individual data members as parameters.

Something like this:

#[apache_beam::DirectRunner(DirectRunnerOptions)]
fn main(root: PValue) {
    root.apply(...);
}

laysakura commented 1 year ago

@Kelvinyu1117 Thanks! Since this feature would significantly affect the developer experience, could you start with a design doc (which should be listed here) and request reviews on Discord?

Kelvinyu1117 commented 1 year ago

Sure, let me create a document, and we can move the discussion to Discord.