klay-music / klay-beam

Our Apache Beam Transforms and Pipelines

Document GCP Dataflow Setup and Permissions #53

Closed CharlesHolbrow closed 11 months ago

CharlesHolbrow commented 1 year ago

Right now, GCP setup, permissions, and service accounts are not documented.

We want klay_beam to be documented so that it is agnostic of any specific GCP project (e.g. klay-training or klay-beam-tests).

Job packages should not be open sourced. As a result, it is okay for job package READMEs to include things like the example invocation below, which names our project klay-beam-tests and our service account dataset-dataflow-worker@klay-beam-tests.iam.gserviceaccount.com.

# This kind of documentation is OK in job packages. It is not OK in the klay_beam package. 
python bin/run_job_extract_chroma.py \
    --project klay-beam-tests \
    --service_account_email dataset-dataflow-worker@klay-beam-tests.iam.gserviceaccount.com \
    --machine_type n1-standard-8 \
    --region us-central1 \
    --max_num_workers 50 \
    --autoscaling_algorithm THROUGHPUT_BASED \
    --runner DataflowRunner \
    --experiments use_runner_v2 \
    --sdk_location container \
    --setup_file ./setup.py \
    --temp_location gs://klay-dataflow-test-000/tmp/extract_chroma/ \
    --source_audio_path 'gs://klay-dataflow-test-000/glucose-karaoke/' \
    --job_name 'extract-chroma-test-000'
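
For the klay_beam package itself, one possible way to keep the documentation project-agnostic is to build the project-specific flags from environment variables instead of hard-coding them. The sketch below is only an illustration: the helper `dataflow_args` and the variable names `GCP_PROJECT`, `DATAFLOW_SERVICE_ACCOUNT`, `DATAFLOW_TEMP_LOCATION`, and `DATAFLOW_REGION` are hypothetical and do not exist in klay_beam. The flag names are the ones used in the invocation above, and the `us-central1` default is simply copied from that example.

```python
import os

# Hypothetical environment variable names (assumptions, not part of klay_beam).
REQUIRED_VARS = (
    "GCP_PROJECT",
    "DATAFLOW_SERVICE_ACCOUNT",
    "DATAFLOW_TEMP_LOCATION",
)


def dataflow_args(env=None):
    """Return the project-specific Dataflow flags as a list of strings.

    Values come from `env` (a mapping), or from os.environ when `env` is None,
    so no project name or service account needs to appear in the docs.
    """
    env = os.environ if env is None else env

    # Fail early with a clear message if a required value is absent or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))

    return [
        "--runner", "DataflowRunner",
        "--project", env["GCP_PROJECT"],
        "--service_account_email", env["DATAFLOW_SERVICE_ACCOUNT"],
        # Assumed default, taken from the example invocation above.
        "--region", env.get("DATAFLOW_REGION", "us-central1"),
        "--temp_location", env["DATAFLOW_TEMP_LOCATION"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_env = {
        "GCP_PROJECT": "my-project",
        "DATAFLOW_SERVICE_ACCOUNT": "worker@my-project.iam.gserviceaccount.com",
        "DATAFLOW_TEMP_LOCATION": "gs://my-bucket/tmp/",
    }
    print(" ".join(dataflow_args(example_env)))
```

With something like this, klay_beam docs could show only placeholder values, while job package READMEs keep the concrete project and service account.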