klay-music / klay-beam

Our Apache Beam Transforms and Pipelines

Klay Beam 1.0 Refactor #33

Open CharlesHolbrow opened 1 year ago

CharlesHolbrow commented 1 year ago

Getting to Klay Beam v1.0

Right now, the main function of the repo is to demonstrate HOW to do things in Beam. Now that we know how Beam works, let's refactor the core functionality into a package, and develop reusable operational processes for creating and running data pipelines that depend on that package.

v1.0 should properly separate the package logic from the operational logic.

The Klay Beam Package

The core functionality that we tend to reuse in many jobs (Is this list missing anything?):

This behavior could be moved into a lightweight package. Additional job-specific Transforms should live in a dedicated package or launch script.

The Klay Beam Operational Processes

The examples in this repo show how to do a lot of complex jobs. In isolation, the job configuration options below are simple enough, but we've found that different jobs tend to need very precise combinations of them.

We'd like to be able to easily use the examples in this repo to create, test, and run jobs with various combinations of these configuration options.
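One way to make such combinations reproducible might be to describe each job as a small config object and derive the launch arguments from it. The sketch below is only an illustration: `JobConfig`, its fields, and the example bucket and image values are hypothetical. The option names (`--runner`, `--temp_location`, `--sdk_container_image`, `--setup_file`) are standard Beam pipeline options, picked here as an assumption about what a job in this repo might need.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JobConfig:
    """Hypothetical per-job settings; field names are illustrative only."""

    runner: str = "DirectRunner"
    temp_location: Optional[str] = None
    sdk_container_image: Optional[str] = None
    setup_file: Optional[str] = None

    def to_pipeline_args(self) -> List[str]:
        """Render the config as command-line style Beam pipeline arguments."""
        args = [f"--runner={self.runner}"]
        # Only emit options that were actually set, so local runs stay minimal.
        for name in ("temp_location", "sdk_container_image", "setup_file"):
            value = getattr(self, name)
            if value is not None:
                args.append(f"--{name}={value}")
        return args


# A local test run and a remote run of the same job differ only in config.
local = JobConfig()
remote = JobConfig(
    runner="DataflowRunner",
    temp_location="gs://example-bucket/tmp",  # placeholder bucket
    sdk_container_image="example/klay-beam-job:0.1.0",  # placeholder image
)
```

A launch script could pass the resulting list to Beam's `PipelineOptions`, which would keep the job's Transforms unaware of how and where they are being run.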

Message Passing Conventions

Moving this to a dedicated issue:

Job packages

As Max proposed in #18, let's clean up the Dockerfiles.

Each job should have its own environment and pin a Docker image. If it helps, we can make a parent Dockerfile for large dependencies and have job-specific Docker images build 'FROM' the parent.
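If pinning is adopted, a launch script could refuse to start a job whose image reference is not pinned. The helper below is a minimal sketch of that check, not an existing part of this repo: `is_pinned` is a hypothetical name, and it assumes "pinned" means either a `sha256` digest or an explicit tag other than `latest`.

```python
def is_pinned(image: str) -> bool:
    """Return True if a Docker image reference has a digest or a non-'latest' tag."""
    if "@sha256:" in image:
        return True
    # Inspect only the last path segment, so a registry port
    # (e.g. "localhost:5000/job") is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    return bool(sep) and tag not in ("", "latest")
```

For example, `is_pinned("example/job:1.2.3")` is true, while `is_pinned("example/job")` and `is_pinned("example/job:latest")` are both false.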

GPU Processing Support

Moving this to a dedicated issue:

Related Issues

v1.0 should resolve these issues

mxkrn commented 1 year ago

Generally this is already a great starting point. I'd like to make a few smaller suggestions that I think would be nice to work into this refactor.

CharlesHolbrow commented 1 year ago

👍 Adding these issues to the 1.0 Milestone. https://github.com/klay-music/klay-beam/milestone/1

CharlesHolbrow commented 10 months ago

This is close to ready. Some lingering loose ends: