google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Fatal Python error: Segmentation fault #794

Closed: baozg closed this issue 5 months ago

baozg commented 6 months ago

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6.1/docs/FAQ.md: Yes

Describe the issue:

`make_examples` fails with `Fatal Python error: Segmentation fault`.

Setup

Steps to reproduce:

```bash
chr=$3
indir="01.mapping"
outdir="02.snps"
sif="dv-1.6.0.sif"
# ${sample} and ${threads} are set elsewhere in the script (not shown in the report).

singularity exec -B "${indir}:/input" -B "${outdir}:/output" "${sif}" /bin/bash -c "/opt/deepvariant/bin/run_deepvariant \
  --model_type PACBIO --ref /input/ref.fa --reads /input/${sample}.sorted.bam --regions chr${chr} \
  --output_vcf=/output/${sample}.chr${chr}.vcf.gz --output_gvcf=/output/${sample}.chr${chr}.g.vcf.gz \
  --intermediate_results_dir=/output/${sample}_chr${chr} --num_shards=${threads} --sample_name=${sample}"
rm -rf "${outdir}/${sample}_chr${chr}"
```

Error trace:

```bash
Warning: The alignment path of one pair of sequences may miss a small part. [ssw.c ssw_align]
Warning: The alignment path of one pair of sequences may miss a small part. [ssw.c ssw_align]
Warning: The alignment path of one pair of sequences may miss a small part. [ssw.c ssw_align]
I0325 17:32:25.437496 47491250571072 make_examples_core.py:301] Task 0/48: 3061 candidates (3283 examples) [15.51s elapsed]
I0325 17:32:25.481451 47092596426560 make_examples_core.py:301] Task 3/48: 3479 candidates (3686 examples) [15.88s elapsed]
I0325 17:32:25.287480 47393598515008 make_examples_core.py:301] Task 1/48: 2217 candidates (2340 examples) [4.86s elapsed]
I0325 17:32:27.143459 47041007318848 make_examples_core.py:301] Task 44/48: 2558 candidates (2674 examples) [8.39s elapsed]
I0325 17:32:26.490880 46937528883008 make_examples_core.py:301] Task 32/48: 1393 candidates (1485 examples) [15.67s elapsed]
I0325 17:32:28.232726 47276001879872 make_examples_core.py:301] Task 28/48: 2137 candidates (2164 examples) [14.84s elapsed]
I0325 17:32:28.692107 47600061708096 make_examples_core.py:301] Task 36/48: 2092 candidates (2216 examples) [8.68s elapsed]
I0325 17:32:29.860347 47429922613056 make_examples_core.py:301] Task 33/48: 1802 candidates (2012 examples) [20.26s elapsed]
I0325 17:32:30.215045 47061051381568 make_examples_core.py:301] Task 12/48: 2012 candidates (2098 examples) [18.23s elapsed]
I0325 17:32:31.474687 47820084045632 make_examples_core.py:301] Task 15/48: 2297 candidates (2355 examples) [18.02s elapsed]
I0325 17:32:31.041316 47138252293952 make_examples_core.py:301] Task 13/48: 2101 candidates (2177 examples) [16.25s elapsed]
I0325 17:32:30.926676 47920224905024 make_examples_core.py:301] Task 4/48: 4191 candidates (4462 examples) [19.74s elapsed]
I0325 17:32:31.590019 47326413494080 make_examples_core.py:301] Task 10/48: 2048 candidates (2228 examples) [11.23s elapsed]
I0325 17:32:31.841506 47803793921856 make_examples_core.py:301] Task 27/48: 2098 candidates (2159 examples) [13.08s elapsed]
I0325 17:32:32.226495 47219228682048 make_examples_core.py:301] Task 18/48: 2173 candidates (2201 examples) [19.05s elapsed]
I0325 17:32:32.442558 47801829132096 make_examples_core.py:301] Task 2/48: 2877 candidates (2943 examples) [13.69s elapsed]
I0325 17:32:31.799957 47073207797568 make_examples_core.py:301] Task 6/48: 2254 candidates (2303 examples) [21.11s elapsed]
I0325 17:32:33.605938 47905241290560 make_examples_core.py:301] Task 34/48: 2396 candidates (2437 examples) [21.09s elapsed]
I0325 17:32:32.662033 47092596426560 make_examples_core.py:301] Task 3/48: 4231 candidates (4460 examples) [7.18s elapsed]
I0325 17:32:33.789848 46958121469760 make_examples_core.py:301] Task 7/48: 2480 candidates (2559 examples) [21.58s elapsed]
I0325 17:32:34.471721 47732710369088 make_examples_core.py:301] Task 45/48: 2297 candidates (2372 examples) [15.72s elapsed]
I0325 17:32:35.672929 47838957700928 make_examples_core.py:301] Task 21/48: 2294 candidates (2386 examples) [24.58s elapsed]
I0325 17:32:35.698938 47081923413824 make_examples_core.py:301] Task 25/48: 2581 candidates (2621 examples) [21.07s elapsed]
I0325 17:32:35.423843 47520100497216 make_examples_core.py:301] Task 46/48: 2664 candidates (2727 examples) [19.43s elapsed]
I0325 17:32:35.792955 47465615198016 make_examples_core.py:301] Task 26/48: 2413 candidates (2469 examples) [17.04s elapsed]
I0325 17:32:35.948103 47356236363584 make_examples_core.py:301] Task 16/48: 2434 candidates (2466 examples) [16.59s elapsed]
I0325 17:32:36.297509 46937528883008 make_examples_core.py:301] Task 32/48: 2295 candidates (2412 examples) [9.81s elapsed]
I0325 17:32:35.932620 47741062362944 make_examples_core.py:301] Task 11/48: 2105 candidates (2213 examples) [22.11s elapsed]
I0325 17:32:36.747299 47925058234176 make_examples_core.py:301] Task 24/48: 2282 candidates (2359 examples) [17.99s elapsed]
I0325 17:32:36.638497 46983220393792 make_examples_core.py:301] Task 37/48: 2171 candidates (2205 examples) [16.76s elapsed]
I0325 17:32:36.020885 47850531092288 make_examples_core.py:301] Task 5/48: 2523 candidates (2585 examples) [23.06s elapsed]
I0325 17:32:36.623855 47207681709888 make_examples_core.py:301] Task 20/48: 2171 candidates (2234 examples) [19.72s elapsed]
I0325 17:32:37.044504 47765069588288 make_examples_core.py:301] Task 14/48: 2153 candidates (2205 examples) [22.11s elapsed]
I0325 17:32:37.566646 47877669951296 make_examples_core.py:301] Task 17/48: 2618 candidates (2699 examples) [22.47s elapsed]
I0325 17:32:37.904620 47429922613056 make_examples_core.py:301] Task 33/48: 2488 candidates (2710 examples) [8.04s elapsed]
I0325 17:32:36.906824 46960045643584 make_examples_core.py:301] Task 43/48: 2532 candidates (2565 examples) [20.50s elapsed]
I0325 17:32:37.470283 47513251723072 make_examples_core.py:301] Task 29/48: 2293 candidates (2507 examples) [26.50s elapsed]
I0325 17:32:37.412583 47744955930432 make_examples_core.py:301] Task 22/48: 2906 candidates (3073 examples) [23.43s elapsed]
I0325 17:32:37.994984 47351217665856 make_examples_core.py:301] Task 42/48: 2130 candidates (2231 examples) [22.76s elapsed]
I0325 17:32:39.727152 47544041367360 make_examples_core.py:301] Task 40/48: 2378 candidates (2456 examples) [20.22s elapsed]
I0325 17:32:39.198227 47547672803136 make_examples_core.py:301] Task 41/48: 2742 candidates (2831 examples) [23.96s elapsed]
I0325 17:32:39.481850 47195669731136 make_examples_core.py:301] Task 47/48: 2478 candidates (2570 examples) [20.73s elapsed]
I0325 17:32:40.129924 47491250571072 make_examples_core.py:301] Task 0/48: 4098 candidates (4341 examples) [14.69s elapsed]
I0325 17:32:40.477149 47084640413504 make_examples_core.py:301] Task 39/48: 2025 candidates (2069 examples) [21.72s elapsed]
I0325 17:32:41.819239 47487413593920 make_examples_core.py:301] Task 38/48: 2581 candidates (2679 examples) [27.90s elapsed]
I0325 17:32:42.723748 47424450524992 make_examples_core.py:301] Task 19/48: 2809 candidates (2881 examples) [27.12s elapsed]
I0325 17:32:43.357604 47806383535936 make_examples_core.py:301] Task 23/48: 2286 candidates (2401 examples) [28.70s elapsed]
I0325 17:32:43.931203 47985428563776 make_examples_core.py:301] Task 35/48: 3282 candidates (3387 examples) [31.57s elapsed]
I0325 17:32:44.979849 47999988315968 make_examples_core.py:301] Task 31/48: 2600 candidates (2699 examples) [23.92s elapsed]
I0325 17:32:44.729335 47653137950528 make_examples_core.py:301] Task 30/48: 2895 candidates (3016 examples) [25.97s elapsed]
I0325 17:32:47.486382 47801829132096 make_examples_core.py:301] Task 2/48: 4049 candidates (4139 examples) [15.04s elapsed]
I0325 17:32:48.146358 47041007318848 make_examples_core.py:301] Task 44/48: 4691 candidates (4897 examples) [21.00s elapsed]
I0325 17:32:48.127754 47600061708096 make_examples_core.py:301] Task 36/48: 4081 candidates (4253 examples) [19.43s elapsed]
Fatal Python error: Segmentation fault

Current thread 0x00002b8260148740 (most recent call first):
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/realigner/realigner.py", line 882 in align_to_haplotype
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 2250 in align_to_all_haplotypes
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 2322 in <listcomp>
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 2321 in create_pileup_examples
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 1566 in writes_examples_in_region
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 2847 in make_examples_runner
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 224 in main
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/absl_py/absl/app.py", line 258 in _run_main
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/absl_py/absl/app.py", line 312 in run
  File "/tmp/Bazel.runfiles_30v6ynlb/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 234 in <module>
I0325 17:32:49.865826 47092596426560 make_examples_core.py:301] Task 3/48: 6125 candidates (6410 examples) [17.20s elapsed]
parallel: This job failed:
```

Does the quick start test work on your system? Please test with https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md. Is there any way to reproduce the issue by using the quick start?

Any additional context:

lucasbrambrink commented 6 months ago

Hi,

Thanks for bringing up this issue. It'll be a bit tricky to debug this without access to the files. Would it be possible to share the input files so we can try to reproduce this? Thanks!

baozg commented 6 months ago

Could you give me an email address so I can send you a link to the data for this chromosome?

lucasbrambrink commented 6 months ago

Sure thing! You can send me the files at lucasbrambrink@google.com

Additionally, segfaults can sometimes be caused by running out of memory (OOM). Do you have the memory specs of the instance you are running this on? Thanks!

baozg commented 6 months ago

It was run on a node with 256 GB of RAM, and all other samples finished on nodes with the same amount of RAM. I will send you the data later.


From: Lucas Brambrink. Sent: Tuesday, March 26, 2024 6:46:34 PM. To: google/deepvariant. Cc: Zhigui Bao; Author. Subject: Re: [google/deepvariant] Fatal Python error: Segmentation fault (Issue #794)


yangxin-9 commented 6 months ago

I also encountered the same problem. Has it been solved, and if so, how?

kishwarshafin commented 6 months ago

@baozg and @yangxin-9 ,

In addition to sending the BAM files, can you please also check that the files are not truncated? You can run the following command to check whether the files are OK:

```bash
samtools quickcheck -v *.bam > bad_bams.fofn && echo 'all ok' || echo 'some files failed check, see bad_bams.fofn'
```

yangxin-9 commented 6 months ago

OK. I'll try that. Thank you for your reply.

yangxin-9 commented 6 months ago

I checked my BAM file with the command you gave, and it printed 'all ok', so the error is probably not caused by the BAM file.

lucasbrambrink commented 5 months ago

@baozg

After carefully bisecting your BAM file, I found that the region that triggers the error is chr12:7721068-7735636.
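One way to narrow a crash down to a small region is to bisect the interval: run on each half, keep the half that still fails, and repeat. The thread does not say exactly how the BAM was bisected, so this is only a sketch under that assumption. `region_fails` and `bisect_region` are hypothetical helpers, and the stand-in predicate simply pretends that any interval overlapping a made-up position (7,730,000) crashes; in a real run it would launch DeepVariant with `--regions=chrom:start-end` and succeed only if that run crashed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical predicate: exit status 0 means "processing chrom:start-end crashes".
# Stand-in only: pretends anything overlapping a made-up trigger at 7,730,000 crashes.
# A real version might run DeepVariant on --regions="${1}:${2}-${3}" and check its exit status.
region_fails() {
  local start="$2" end="$3"
  (( start <= 7730000 && 7730000 <= end ))
}

# Halve a failing 1-based, inclusive interval until it is at most min_size bp long,
# or until neither half fails on its own (e.g. the trigger spans the midpoint).
bisect_region() {
  local chrom="$1" start="$2" end="$3" min_size="${4:-10000}"
  while (( end - start + 1 > min_size )); do
    local mid=$(( (start + end) / 2 ))
    if region_fails "$chrom" "$start" "$mid"; then
      end=$mid
    elif region_fails "$chrom" $(( mid + 1 )) "$end"; then
      start=$(( mid + 1 ))
    else
      break
    fi
  done
  echo "${chrom}:${start}-${end}"
}

bisect_region chr12 1 20000000
```

Because each step halves the interval, narrowing 20 Mb down to 10 kb takes about 11 halving steps, each needing one or two runs.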

Looking at the pileup, there are 5 large (~11 kb) deletions of 3 different lengths in that region (pileup screenshot not included):

One is 11,843 bp long, two are 11,844 bp, and two are 11,845 bp. The trouble appears to come from trying to represent and realign the INDEL candidates that have only 2 supporting reads each. DeepVariant can't actually call deletions that long.
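A simplified model of DeepVariant's minimum indel read count (the `vsc_min_count_indels` make_examples flag) makes the arithmetic concrete. Treating each distinct deletion length as its own allele, the 11,844 bp and 11,845 bp deletions each have 2 supporting reads and the 11,843 bp deletion has 1, so a minimum count of 2 admits two candidates while a minimum of 3 admits none. This sketch assumes the threshold is a plain per-allele read count; DeepVariant's real candidate generation also applies other filters (for example a minimum allele fraction) that are ignored here. `passing_alleles` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Deletion lengths (bp) of the 5 reads in the pileup described above, one entry per read.
deletion_lengths=(11843 11844 11844 11845 11845)

# Simplified model: count reads per distinct deletion length and print the
# lengths supported by at least min_count reads (the candidate alleles).
passing_alleles() {
  local min_count="$1"
  printf '%s\n' "${deletion_lengths[@]}" | sort -n | uniq -c |
    awk -v min="$min_count" '$1 >= min { print $2 }'
}

echo "min_count=2: $(passing_alleles 2 | tr '\n' ' ')"
echo "min_count=3: $(passing_alleles 3 | tr '\n' ' ')"
```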

If you set `vsc_min_count_indels` to 3, the problem goes away. So adding `--make_examples_extra_args=vsc_min_count_indels=3` should fix the issue. If desired, you can run DeepVariant on just that region with `--regions=chr12:7721068-7735636`.
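A sketch of the rerun suggested above, restricted to the failing region and with the extra `make_examples` argument. It reuses the reporter's container layout; the sample name, thread count, and output file names are hypothetical placeholders. The command is only printed by default (set `DRY_RUN=0` to execute it), so the exact invocation can be checked before launching.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: substitute your own sample name and thread count.
sample="sample1"
threads=8
indir="01.mapping"
outdir="02.snps"
sif="dv-1.6.0.sif"
region="chr12:7721068-7735636"   # region reported to trigger the segfault

# vsc_min_count_indels=3: an indel allele needs at least 3 supporting reads
# before make_examples turns it into a candidate.
cmd=(
  singularity exec -B "${indir}:/input" -B "${outdir}:/output" "${sif}"
  /opt/deepvariant/bin/run_deepvariant
  --model_type=PACBIO
  --ref=/input/ref.fa
  "--reads=/input/${sample}.sorted.bam"
  "--regions=${region}"
  "--output_vcf=/output/${sample}.test_region.vcf.gz"
  "--intermediate_results_dir=/output/${sample}_test_region"
  "--num_shards=${threads}"
  "--sample_name=${sample}"
  --make_examples_extra_args=vsc_min_count_indels=3
)

# Print the command by default; run with DRY_RUN=0 to execute it.
if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%q ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"
fi
```

Building the command as a bash array keeps each argument intact without nested quoting, unlike the single `/bin/bash -c "..."` string in the original report.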

We will work on fixing this on our end as well in our next release.

@yangxin-9 To avoid mixing issues that may or may not be related, please create a new issue that shows the command you ran and the output. Also, if possible, please send us the input files so we can try to reproduce the issue ourselves.

baozg commented 5 months ago

Thanks for the careful examination. Divergent regions like this are quite common in outcrossing plants, where mapping noise is mixed with true variants. Is it possible for DeepVariant to report the region or reads when realignment fails? Or do I need to exclude such regions before running DeepVariant?

lucasbrambrink commented 5 months ago

Right now DeepVariant does not have the ability to report such a region by itself and skip it. You will need to exclude the problematic regions before running DeepVariant, or use vsc_min_count_indels to avoid candidate generation in these cases.
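To exclude known problem regions, one option is to build a BED file of everything except those regions and pass it to `--regions`, which in `run_deepvariant` accepts BED files as well as region literals. This is a sketch under stated assumptions, not a DeepVariant feature: `complement_bed` is a hypothetical helper that does with `awk` roughly what `bedtools complement -i exclude.bed -g genome.txt` does, the exclusion intervals are assumed not to overlap, and the toy chr12 length is made up. BED is 0-based and half-open, so the 1-based inclusive region chr12:7721068-7735636 becomes `chr12 7721067 7735636`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the parts of each chromosome in a FASTA index (.fai) that are NOT covered
# by the intervals in an exclusion BED. Assumes exclusions do not overlap; they are
# sorted here before use.
complement_bed() {
  local fai="$1" exclude_bed="$2"
  awk -v OFS='\t' '
    phase == 1 {                      # exclusion BED: store intervals per chromosome
      k = ++n[$1]; s[$1, k] = $2; e[$1, k] = $3; next
    }
    {                                 # .fai: column 1 = name, column 2 = length
      chrom = $1; len = $2; pos = 0; m = n[chrom] + 0
      for (i = 1; i <= m; i++) {
        if (s[chrom, i] > pos) print chrom, pos, s[chrom, i]
        if (e[chrom, i] > pos) pos = e[chrom, i]
      }
      if (pos < len) print chrom, pos, len
    }
  ' phase=1 <(sort -k1,1 -k2,2n "$exclude_bed") phase=2 "$fai"
}

# Toy inputs for illustration only (the chr12 length is made up; a real .fai
# comes from `samtools faidx ref.fa`).
printf 'chr12\t20000000\n' > toy.fa.fai
printf 'chr12\t7721067\t7735636\n' > exclude.bed   # chr12:7721068-7735636 as 0-based BED
complement_bed toy.fa.fai exclude.bed > keep.bed
cat keep.bed
```

With the per-chromosome setup from the original report, `keep.bed` could be filtered to the chromosome being processed (for example with `awk '$1 == "chr12"'`), placed under the mounted input directory, and passed as `--regions /input/keep.bed`.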

baozg commented 5 months ago

Thank you so much. Now this sample runs smoothly.