numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0

update SDK build-push script to dynamically determine the example image path list #1833

Open KeranYang opened 1 month ago

KeranYang commented 1 month ago

Summary

Currently, the paths to the example images are hardcoded in the build-push workflow and passed to the script. This means that when adding a new example or updating an existing one, a developer needs to make changes in two places: the example folder and the workflow YAML file. We can't guarantee that both always get updated, e.g. https://github.com/numaproj/numaflow-python/pull/186

If we update the script to construct the list dynamically, the problem goes away. The script can search the example folder for Dockerfiles to determine the image path list. This can be done for Go, Python, and Rust. Java is a bit different, but we can investigate it as well.
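
A minimal sketch of the Dockerfile search described above. The function name `discover_dockerfile_paths` is hypothetical, and the sketch assumes every buildable example directory contains a file named exactly `Dockerfile`; the real layout may differ per SDK.

```shell
#!/usr/bin/env bash
# Sketch: derive the example image path list from the directory tree.
# ASSUMPTION: each buildable example directory holds a file named "Dockerfile".
# The function name is hypothetical, not part of the existing script.

# Print the directory of every Dockerfile under the given root,
# one per line, sorted so the result is stable between runs.
discover_dockerfile_paths() {
  local root="${1:-.}"
  find "$root" -type f -name "Dockerfile" -exec dirname {} \; | sort
}
```

Calling `discover_dockerfile_paths examples` from the SDK root would then print one image path per line, which the script could consume instead of the hardcoded list from the workflow YAML.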



KeranYang commented 1 month ago

@ayildirim21 wdyt?

kohlisid commented 1 month ago

As discussed with @KeranYang, we might also want to add a --verify/--check flag that sanity-checks the build. If required, it could even do everything except the final push to the repository.
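
A rough sketch of what such a check mode could look like. Only the flag names come from the comment above; `parse_flags`, `build_and_maybe_push`, `check_only`, and the `DOCKER` override are hypothetical names introduced for illustration, not part of the existing script.

```shell
#!/usr/bin/env bash
# Sketch of a check mode. ASSUMPTIONS: all function and variable names
# below are hypothetical; only the --check/--verify flag names come from
# the discussion.

check_only=false

# Set check_only=true when --check or --verify is among the arguments.
parse_flags() {
  check_only=false
  local arg
  for arg in "$@"; do
    case "$arg" in
      --check|--verify) check_only=true ;;
    esac
  done
}

# Build the image for one example directory, then push it unless the
# script runs in check mode. DOCKER defaults to the docker CLI and can be
# overridden (for example with "echo") to dry-run the commands.
build_and_maybe_push() {
  local dir="$1" tag="$2"
  "${DOCKER:-docker}" build -t "$tag" "$dir" || return 1
  if [ "$check_only" = true ]; then
    echo "check mode: skipping push of $tag"
  else
    "${DOCKER:-docker}" push "$tag"
  fi
}
```

With this shape, a CI job on a pull request could run the script with `--check` to confirm that every example still builds, while the release workflow runs it without the flag.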

ayildirim21 commented 1 month ago

The script (at least for the Go SDK) already has a function that dynamically determines all example paths:

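# Run each command passed as an argument inside every example directory.
function traverse_examples () {
  # Every directory that contains a go.mod is treated as one example.
  find pkg -name "go.mod" | while read -r line;
  do
    dir="$(dirname "${line}")"
    echo "$dir"
    cd "$dir" || exit

    for command in "$@"
    do
      # $command is left unquoted so "cmd arg" splits into words.
      if ! $command; then
        echo "Error: failed $command in $dir" >&2
        exit 1
      fi
    done

    # ~- expands to $OLDPWD, i.e. back to where the loop started.
    cd ~- || exit
  done
}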

This will produce (if we were to print out dir):

pkg/reducestreamer/examples/sum
pkg/reducestreamer/examples/counter
pkg/sideinput/examples/sideinput_function
pkg/sideinput/examples/sink_sideinput
pkg/sideinput/examples/map_sideinput
pkg/sideinput/examples/map_sideinput/udf
pkg/sideinput/examples/simple_sideinput
pkg/sideinput/examples/simple_sideinput/udf
pkg/sideinput/examples/reduce_sideinput
pkg/sideinput/examples/reduce_sideinput/udf
pkg/sideinput/examples/simple_source_with_sideinput
pkg/sessionreducer/examples/sum
pkg/sessionreducer/examples/counter
pkg/mapper/examples/retry
pkg/mapper/examples/forward_message
pkg/mapper/examples/flatmap
pkg/mapper/examples/tickgen
pkg/mapper/examples/even_odd
pkg/sourcetransformer/examples/event_time_filter
pkg/sourcetransformer/examples/assign_event_time
pkg/reducer/examples/sum
pkg/reducer/examples/counter
pkg/sinker/examples/redis-sink
pkg/sinker/examples/fallback
pkg/sinker/examples/log
pkg/mapstreamer/examples/flatmap_stream
pkg/sourcer/examples/simple_source

So I think we can invoke a flag to run this method and then dynamically populate the dockerfile_paths variable. I will investigate this further. @KeranYang @kohlisid
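
One possible shape for that, as a sketch only. It assumes `dockerfile_paths` is a bash array and that an example is buildable when its go.mod sits next to a Dockerfile; neither is confirmed here, and `populate_dockerfile_paths` is a hypothetical name. It reads from process substitution because a `find ... | while read` pipeline runs the loop in a subshell, so an array filled there would be lost.

```shell
#!/usr/bin/env bash
# Sketch only: fills dockerfile_paths from the directory tree.
# ASSUMPTIONS: dockerfile_paths is a bash array, and an example is
# buildable when its go.mod sits next to a Dockerfile.
# populate_dockerfile_paths is a hypothetical name.

dockerfile_paths=()

populate_dockerfile_paths() {
  local root="${1:-pkg}" gomod dir
  dockerfile_paths=()
  # Process substitution keeps the loop in the current shell, so the
  # array is still populated after the loop ends.
  while IFS= read -r gomod; do
    dir="$(dirname "$gomod")"
    if [ -f "$dir/Dockerfile" ]; then
      dockerfile_paths+=("$dir")
    fi
  done < <(find "$root" -name "go.mod" | sort)
}
```

After `populate_dockerfile_paths pkg`, the rest of the script could loop over `"${dockerfile_paths[@]}"` exactly as it would over a hardcoded list.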