google / deepsomatic

DeepSomatic is an analysis pipeline that uses a deep neural network to call somatic variants from tumor-normal and tumor-only sequencing data.
BSD 3-Clause "New" or "Revised" License

DeepSomatic/1.7.0 make_examples_somatic GNU parallel failure #28

Open skchronicles opened 1 month ago

skchronicles commented 1 month ago

Hello there,

Thank you for creating and maintaining this amazing tool. The latest version of deepsomatic looks awesome!

I have been testing the latest docker image of deepsomatic (docker://google/deepsomatic:1.7.0) against the SEQC2 tumor-normal pair to evaluate the tool's precision/recall against a truth set. While running the tool, I ran into an error during the make_examples_somatic step for one of the intermediate shards.

Here is the relevant traceback from right before the tool errors out:

I1028 19:10:28.773802 23456244236288 make_examples_core.py:321] Task 22/24: Writing example info to /lscratch/39497410/tmp.0onV2HbkFb/make_examples_somatic.tfrecord-00022-of-00024.gz.example_info.json
I1028 19:10:28.774132 23456244236288 make_examples_core.py:3189] example_shape = [200, 221, 7]
I1028 19:10:28.774754 23456244236288 make_examples_core.py:3190] example_channels = [1, 2, 3, 4, 5, 6, 19]
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1730157028.776144 1500549 make_examples_native.h:179] Received shard finished.
I1028 19:10:28.776247 23456244236288 make_examples_core.py:321] Task 22/24: Found 1305562 candidate variants
I1028 19:10:28.776323 23456244236288 make_examples_core.py:321] Task 22/24: Created 1389857 examples
I1028 19:10:28.776376 23456244236288 make_examples_core.py:321] Task 22/24: Created 0 small model examples
I1028 19:10:28.776423 23456244236288 make_examples_core.py:321] Task 22/24: Small Model called 0 candidates
I1028 19:10:29.265839 23456244236288 make_examples_core.py:321] Task 17/24: 852009 candidates (904082 examples) [64.94s elapsed]
parallel: This job failed:
/opt/deepvariant/bin/make_examples_somatic --mode calling --ref /path/to/refs//Homo_sapiens_assembly38.fasta --reads_tumor /path/to/outputs/BAM/WGS_NCI_T_1_1ug_S1.sorted.bam --reads_normal /path/to/outputs/BAM/WGS_NCI_N_1_1ug_S2.sorted.bam --examples '/lscratch/39497410/tmp.0onV2HbkFb/make_examples_somatic.tfrecord@24.gz' --checkpoint /opt/models/deepsomatic/wgs --sample_name_normal WGS_NCI_N_1_1ug_S2 --sample_name_tumor WGS_NCI_T_1_1ug_S1 --vsc_max_fraction_indels_for_non_target_sample 0.5 --vsc_max_fraction_snps_for_non_target_sample 0.5 --vsc_min_fraction_indels 0.05 --vsc_min_fraction_snps 0.029 --task 22

real    436m36.209s
user    1280m57.712s
sys 10m21.758s
450G    /lscratch/39497410/tmp.0onV2HbkFb

I am also attaching the full log file (slurm-12345.txt). Please let me know what you think, and have a great day!

Here is the deepsomatic command that was run:

# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "/lscratch/$SLURM_JOBID/" ]; then mkdir -p "/lscratch/$SLURM_JOBID/"; fi
tmp=$(mktemp -d -p "/lscratch/$SLURM_JOBID/")
trap 'rm -rf "${tmp}"' EXIT
run_deepsomatic \
    --model_type=WGS \
    --ref=/path/to/refs/Homo_sapiens_assembly38.fasta \
    --reads_tumor=/path/to/output/BAM/WGS_NCI_T_1_1ug_S1.sorted.bam --reads_normal=/path/to/output/BAM/WGS_NCI_N_1_1ug_S2.sorted.bam \
    --sample_name_tumor=WGS_NCI_T_1_1ug_S1 --sample_name_normal=WGS_NCI_N_1_1ug_S2 \
    --output_vcf=/path/to/output/deepsomatic/somatic/WGS_NCI_T_1_1ug_S1.deepsomatic.vcf \
    --num_shards=24 \
    --intermediate_results_dir=${tmp} 

du -sh "${tmp}"

Best regards, @skchronicles

kishwarshafin commented 1 month ago

I am not super familiar with SLURM, but it seems like task 22 failed right before it was going to write the examples. Any chance you are removing the intermediate_results_dir during program termination? Can you save it somewhere outside of tmp and see if it works? Also, you can try out the quickstart, which should be super quick to run: https://github.com/google/deepsomatic/blob/r1.7/docs/deepsomatic-quick-start.md to see if the issue is with DeepSomatic or with your SLURM setup.
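
For reference, running the quick start through Docker looks roughly like the sketch below. The input/output directories and test-data filenames here are placeholders, not the real quick-start data; take the exact commands from the linked doc.

# Rough sketch only -- paths and filenames are placeholders; the quick-start
# doc linked above has the exact commands and test data.
INPUT_DIR=/path/to/quickstart-testdata
OUTPUT_DIR=/path/to/quickstart-output
mkdir -p "${OUTPUT_DIR}"

docker run \
    -v "${INPUT_DIR}":"${INPUT_DIR}" \
    -v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
    google/deepsomatic:1.7.0 \
    run_deepsomatic \
        --model_type=WGS \
        --ref="${INPUT_DIR}"/reference.fasta \
        --reads_tumor="${INPUT_DIR}"/tumor.bam \
        --reads_normal="${INPUT_DIR}"/normal.bam \
        --sample_name_tumor=tumor \
        --sample_name_normal=normal \
        --output_vcf="${OUTPUT_DIR}"/quickstart.vcf.gz \
        --num_shards=4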

skchronicles commented 1 month ago

Okay, that sounds good. My first thought was also that it might be running out of disk space for the --intermediate_results_dir. That points to local node storage allocated through the job scheduler. I was allocating 1200 GB of space and had added some extra commands to check disk usage on failure or exit. After reviewing the logs, it appears to have used only around 400-500 GB, but I will point it to another output directory instead. I will keep you in the loop and let you know if that resolves the issue.

I have a colleague who was able to run the same command on another SLURM cluster. One interesting thing in the log file I attached is a set of warnings related to thread creation. Have you run into this issue before? (screenshot of the thread-creation warnings attached)

skchronicles commented 1 month ago

Hey @kishwarshafin,

I hope you are having a great day! I just wanted to provide an update as I was able to test two different things on my side.

  1. I ran the same deepsomatic command using the same input BAM files, but changed the --intermediate_results_dir option to point to more persistent storage with an excess of space (20 TB).

    run_deepsomatic \
    --model_type=WGS \
    --ref=/path/to/refs/Homo_sapiens_assembly38.fasta \
    --reads_tumor=/path/to/output/BAM/WGS_NCI_T_1_1ug_S1.sorted.bam --reads_normal=/path/to/output/BAM/WGS_NCI_N_1_1ug_S2.sorted.bam \
    --sample_name_tumor=WGS_NCI_T_1_1ug_S1 --sample_name_normal=WGS_NCI_N_1_1ug_S2 \
    --output_vcf=/path/to/output/deepsomatic/somatic/WGS_NCI_T_1_1ug_S1.deepsomatic.vcf \
    --num_shards=24 \
    --intermediate_results_dir=/path/to/output/deepsomatic/somatic/WGS_NCI_T_1_1ug_S1_tmp

After updating the --intermediate_results_dir to point to another location (persistent disk with an over-abundance of storage), I am still running into the same error. The make_examples_somatic command is failing for another chunk (now it is chunk/task 6).

  2. I ran the same deepsomatic command as before with the same input BAM files, but prior to running deepsomatic, I exported an OpenBLAS environment variable.

# NEW: export OpenBLAS variable to
# prevent issues related to pthread init
export OPENBLAS_NUM_THREADS=1

# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "/lscratch/$SLURM_JOBID/" ]; then mkdir -p "/lscratch/$SLURM_JOBID/"; fi
tmp=$(mktemp -d -p "/lscratch/$SLURM_JOBID/")
trap 'rm -rf "${tmp}"' EXIT
run_deepsomatic \
    --model_type=WGS \
    --ref=/path/to/refs/Homo_sapiens_assembly38.fasta \
    --reads_tumor=/path/to/output/BAM/WGS_NCI_T_1_1ug_S1.sorted.bam --reads_normal=/path/to/output/BAM/WGS_NCI_N_1_1ug_S2.sorted.bam \
    --sample_name_tumor=WGS_NCI_T_1_1ug_S1 --sample_name_normal=WGS_NCI_N_1_1ug_S2 \
    --output_vcf=/path/to/output/deepsomatic/somatic/WGS_NCI_T_1_1ug_S1.deepsomatic.vcf \
    --num_shards=24 \
    --intermediate_results_dir=${tmp} 

This appears to have resolved the issue. make_examples_somatic ran without any errors, and the warning messages related to pthread creation were no longer in the log file. With that being said, I think the issue is somehow related to pthread creation: I was able to run deepsomatic without any problems simply by exporting OPENBLAS_NUM_THREADS=1 beforehand. This may be system-specific, though, as the HPC system I am using has hyperthreading enabled.
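
In case it helps anyone else: the variable can also be passed into the container explicitly rather than exported in the wrapper script. The invocations below are illustrative only (my actual run goes through a site-specific wrapper, and --help just stands in for the full command):

# Option A: export the variable before invoking the wrapper (what I did above).
export OPENBLAS_NUM_THREADS=1

# Option B (illustrative): pass the variable directly to the container runtime.
# --env requires Singularity >= 3.6 / Apptainer; on older versions, the
# SINGULARITYENV_OPENBLAS_NUM_THREADS=1 prefix mechanism works instead.
singularity exec --env OPENBLAS_NUM_THREADS=1 \
    docker://google/deepsomatic:1.7.0 run_deepsomatic --help

docker run -e OPENBLAS_NUM_THREADS=1 \
    google/deepsomatic:1.7.0 run_deepsomatic --help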

Please let me know what you think.

Best regards, @skchronicles

kishwarshafin commented 1 month ago

@skchronicles ,

I am not experienced with SLURM or your environment, so I can't tell you exactly whether that's the case. But it seems like you figured it out correctly, and it possibly is a system-specific issue. If you have a SLURM manager/contact, I think it'd be best to run it by them, as they would know more about the environment.

skchronicles commented 1 month ago

Hey @kishwarshafin,

I just checked up on a running job, and it appears that once call_variants starts running, CPU usage spikes well above the 24 threads/shards that were allocated.

Here was the command that was run:

# NEW: export OpenBLAS variable to
# prevent issues related to pthread init
export OPENBLAS_NUM_THREADS=1

# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "/lscratch/$SLURM_JOBID/" ]; then mkdir -p "/lscratch/$SLURM_JOBID/"; fi
tmp=$(mktemp -d -p "/lscratch/$SLURM_JOBID/")
trap 'rm -rf "${tmp}"' EXIT
run_deepsomatic \
    --model_type=WGS \
    --ref=/path/to/refs/Homo_sapiens_assembly38.fasta \
    --reads_tumor=/path/to/output/BAM/WGS_NCI_T_1_1ug_S1.sorted.bam --reads_normal=/path/to/output/BAM/WGS_NCI_N_1_1ug_S2.sorted.bam \
    --sample_name_tumor=WGS_NCI_T_1_1ug_S1 --sample_name_normal=WGS_NCI_N_1_1ug_S2 \
    --output_vcf=/path/to/output/deepsomatic/somatic/WGS_NCI_T_1_1ug_S1.deepsomatic.vcf \
    --num_shards=24 \
    --intermediate_results_dir=${tmp} 

The call_variants step starts running around 5 AM according to the log file, and that is also when CPU usage for the running job spikes. (screenshots of the log file and the job dashboard attached)

I was hoping that setting that environment variable would prevent any nested parallelism, but it appears that may still be happening with call_variants. Looking at the CPU spikes, the peaks appear to be roughly 2*num_shards. Have you observed this before, and does --num_shards=24 roughly correlate with the maximum number of threads/processes spawned by deepsomatic?

Please let me know what you think.

Best regards, @skchronicles

kishwarshafin commented 1 month ago

@skchronicles, num_shards only controls the number of CPUs used for the make_examples and postprocess_variants stages. For call_variants, the TensorFlow API tries to use all available resources, as that is most optimal. If you are using Docker, you can add --cpus 32 to limit CPU usage. For SLURM, please set the variable that limits the number of CPUs for your job.
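
For example (illustrative only; pick a CPU count that matches what you want call_variants to be able to use, and fill in the remaining flags from your command above):

# Illustrative only: cap the CPUs available to the container/job.

# Docker: limit the whole container (and therefore call_variants) to 32 CPUs.
docker run --cpus 32 google/deepsomatic:1.7.0 run_deepsomatic [...flags as above...]

# SLURM: request a fixed CPU count so the scheduler constrains the job,
# for example in the sbatch script:
#   #SBATCH --cpus-per-task=32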

skchronicles commented 4 weeks ago

Hey @kishwarshafin,

I hope you had a great weekend. Thank you for the quick and insightful response!

For call_variants the tensorflow API tries to use all resources available as that is most optimal.

Would it make sense to limit the number of threads spawned by the TensorFlow API within the call_variants command? It appears this may be possible (even with environment variables); however, I have never tested it: https://www.tensorflow.org/api_docs/python/tf/config/threading
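
For concreteness, this is roughly what I had in mind (completely untested; TF_NUM_INTRAOP_THREADS and TF_NUM_INTEROP_THREADS are the environment variables TensorFlow consults for its thread pools, and whether call_variants actually honors them is an assumption on my part):

# Untested sketch: ask TensorFlow to cap its thread pools before call_variants
# runs. Whether DeepSomatic's call_variants picks these up is an assumption.
export TF_NUM_INTRAOP_THREADS=24   # threads used within a single op
export TF_NUM_INTEROP_THREADS=2    # threads used across independent ops
run_deepsomatic --model_type=WGS --num_shards=24 [...same flags as above...]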

Please let me know what you think.

Best regards, @skchronicles