klay-music / klay-beam

Our Apache Beam Transforms and Pipelines

Errors in Dataflow when extracting DAC tokens #47

Closed: CharlesHolbrow closed this issue 9 months ago

CharlesHolbrow commented 10 months ago

The following invocation leads to a surprising number and variety of errors, visible in the Dataflow console (see the "Diagnostics" tab of the Logs panel).

python bin/run_job_extract_nac.py \
    --runner DataflowRunner \
    --project klay-beam-tests \
    --service_account_email dataset-dataflow-worker@klay-beam-tests.iam.gserviceaccount.com \
    --region us-central1 \
    --max_num_workers 10 \
    --autoscaling_algorithm THROUGHPUT_BASED \
    --experiments use_runner_v2 \
    --sdk_location container \
    --temp_location gs://klay-dataflow-test-000/tmp/nac-test/ \
    --setup_file ./setup.py \
    --source_audio_path 'gs://klay-dataflow-test-000/glucose-karaoke/' \
    --nac_name dac \
    --nac_input_sr 44100 \
    --audio_suffix .wav \
    --machine_type n1-standard-16 \
    --number_of_worker_harness_threads=4 \
    --job_name 'extract-nac-test-dac-44100'
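
The command above sets --number_of_worker_harness_threads=4, so each Python SDK process may run several ExtractDAC instances at the same time. If each instance loads its own copy of the model, memory use multiplies (which could be related to the out-of-memory kill reported below), and concurrent reads and writes of the same weights file could race. This thread does not confirm either cause. A minimal sketch of loading a model at most once per process, assuming a hypothetical `load_fn` that performs whatever load ExtractDAC does today:

```python
import threading
from typing import Any, Callable, Dict

# Hypothetical process-wide cache: one entry per model key (e.g. "dac-44khz").
_MODEL_CACHE: Dict[str, Any] = {}
_MODEL_LOCK = threading.Lock()


def get_shared_model(key: str, load_fn: Callable[[], Any]) -> Any:
    """Return the cached model for `key`, calling `load_fn` at most once per process.

    The lock serialises the check-and-load step: if several harness threads
    call this at the same time, only the first one runs `load_fn`; the others
    wait for it to finish and then reuse the same object.
    """
    with _MODEL_LOCK:
        if key not in _MODEL_CACHE:
            _MODEL_CACHE[key] = load_fn()
        return _MODEL_CACHE[key]
```

A DoFn's setup() could call get_shared_model("dac", load_fn) instead of loading the model directly. Beam also provides apache_beam.utils.shared.Shared for sharing one object across threads in a worker process, which may be preferable to a hand-rolled cache like this one.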

The DAC tokens still appear to have been extracted successfully, but I would like to look into these errors more closely before running this job at scale.

klay_beam: v0.12.1
Docker image: us-docker.pkg.dev/klay-home/klay-docker/klay-beam:0.12.1-py3.9-beam2.51.0-torch2.0

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory [while running 'ExtractNAC/ParDo(ExtractDAC)-ptransform-71']
.__init__ ( /env/lib/python3.9/site-packages/torch/serialization.py:283 ) 

RuntimeError: PytorchStreamReader failed reading file data/192: file read failed
.load_tensor ( /env/lib/python3.9/site-packages/torch/serialization.py:1112 ) 

Out of memory: Killed process 2410 (python) total-vm:23895384kB, anon-rss:18693524kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:38072kB oom_score_adj:900 
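
Both PytorchStreamReader errors above are the kind torch.load typically raises when a checkpoint file is incomplete or changes while it is being read. Assuming the DAC weights are read from a local file on the worker, one way to test this is to validate the file before loading it. The helper below is a hypothetical sketch that uses only the standard library and does not import torch:

```python
import os
import zipfile


def is_intact_torch_checkpoint(path: str) -> bool:
    """Heuristic check that `path` is a complete zip-format PyTorch checkpoint.

    torch.save writes a zip archive by default (PyTorch >= 1.6). A zip's central
    directory sits at the end of the file, so a truncated or partially written
    checkpoint usually fails zipfile.is_zipfile(), which corresponds to the
    "failed finding central directory" error from torch.load.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member (checking CRCs) and returns the
            # name of the first bad member, or None if all are readable.
            return zf.testzip() is None
    except (zipfile.BadZipFile, OSError, EOFError):
        return False
```

For example, a worker could call this on the weights path before torch.load and re-fetch the file if it returns False.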

mxkrn commented 10 months ago

Looks funky; I don't remember seeing this when I developed it. One thing I would like to point out, though, is that we currently have no intention of switching from Encodec to DAC. If anything, I see us training our own Encodec.

CharlesHolbrow commented 9 months ago

Closing as wontfix, since we do not intend to return to DAC.