google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Error about expected cur_seq.size() < MAX_READ_LEN #798

OZTaekOppa closed this issue 5 months ago.

OZTaekOppa commented 6 months ago

Hi,

Thank you for the great program. I followed the steps in the DeepVariant PacBio model case study (https://github.com/google/deepvariant/blob/r1.6.1/docs/deepvariant-pacbio-model-case-study.md) and generated a BAM file as below.

minimap2 -t 8 -ax map-hifi --secondary=no -Y -R '@RG\tID:N006942-GRCh38\tSM:ELD144989A-D01\tPU:ELD144989A-D01-CCS' -r7k --MD /data/GRCh38_no_alt_analysis_set.fasta /data/N006942-20231016.fq.gz | samtools sort -@ 8 -O BAM -o ./N006942-20231016.bam
samtools index ./N006942-20231016.bam
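The map-hifi preset above means these are PacBio HiFi reads, which are typically many kilobases long, far longer than Illumina-style short reads. A quick way to confirm the read lengths in the input is a small bash/awk sketch like the one below. `fastq_read_stats` is a hypothetical helper (not part of DeepVariant, minimap2 or Parabricks), and it assumes standard four-line FASTQ records without wrapped sequence lines.

```shell
#!/usr/bin/env bash
# Sketch: report the number of reads and the longest read in a FASTQ file.
# Assumes standard 4-line FASTQ records (no line-wrapped sequences).
# Files ending in .gz are decompressed with gzip -dc; others are read as-is.
set -euo pipefail

# fastq_read_stats is a hypothetical helper written for this example.
fastq_read_stats() {
  local fq="$1"
  local reader=(cat)
  if [[ "$fq" == *.gz ]]; then
    reader=(gzip -dc)
  fi
  "${reader[@]}" "$fq" | awk '
    NR % 4 == 2 {                  # line 2 of each record is the sequence
      n++
      len = length($0)
      if (len > max) max = len
    }
    END { printf "reads=%d max_len=%d\n", n, max }
  '
}
```

On the input from this post it would be called as `fastq_read_stats /data/N006942-20231016.fq.gz`, printing one line of the form `reads=... max_len=...`.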

However, when I tried the “Run DeepVariant on chromosome 20 alignments” step as described, some parameters were not recognized, so I modified the command for PBS Pro as follows.

#PBS -q gpuvolta
#PBS -l ncpus=48
#PBS -l ngpus=4
#PBS -l mem=384GB

module load singularity
module load parabricks/4.2.1

REF_FA='/data/GRCh38_no_alt_analysis_set.fasta'
INPUT_BAM='/data/N006942-20231016.bam'
OUTPUT_VCF='/data/test_output.vcf'

# Run DeepVariant
ulimit -u 100000
singularity run /apps/parabricks/4.2.1/image/clara-parabricks_4.2.1-1.sif pbrun deepvariant \
    --ref ${REF_FA} \
    --in-bam ${INPUT_BAM} \
    --out-variants ${OUTPUT_VCF} \
    --run-partition \
    --num-cpu-threads-per-stream 12 \
    --gpu-num-per-partition 1

I'm now getting the following error about cur_seq.size() < MAX_READ_LEN:

[PB Error 2024-Mar-30 15:00:13][src/region.cpp:3442] Too many sequences - 17549 (max 512, expected cur_seq.size() < MAX_READ_LEN, exiting.
[PB Error 2024-Mar-30 15:00:13][src/ssw_gpu.cu:439] cudaSafeCall() failed: driver shutting down, exiting

Could you provide any insights or suggestions on this issue?

Cheers,

lucasbrambrink commented 6 months ago

It looks like you are using PacBio data but not setting the Parabricks (pbrun deepvariant) --mode flag to pacbio (docs). The default for that flag is shortread, which will not work with the long reads in PacBio data.
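Applied to the command from the original post, the fix amounts to adding `--mode pacbio`. The bash sketch below shows this. Only `--mode pacbio` is new; the paths and other flags are copied from the post, and the PBS directives, `module load` lines and `ulimit` call are unchanged and left out. The `DRY_RUN` switch (on by default, printing one argument per line instead of running anything) is a hypothetical addition for inspecting the command before submitting the job.

```shell
#!/usr/bin/env bash
# Sketch: the post's pbrun deepvariant call with --mode pacbio added.
# DRY_RUN is a hypothetical switch: 1 (default) prints the arguments,
# any other value executes the command.
set -euo pipefail

REF_FA='/data/GRCh38_no_alt_analysis_set.fasta'
INPUT_BAM='/data/N006942-20231016.bam'
OUTPUT_VCF='/data/test_output.vcf'
SIF='/apps/parabricks/4.2.1/image/clara-parabricks_4.2.1-1.sif'

deepvariant_cmd=(
  singularity run "$SIF" pbrun deepvariant
  --ref "$REF_FA"
  --in-bam "$INPUT_BAM"
  --out-variants "$OUTPUT_VCF"
  --mode pacbio                 # default is shortread, which cannot handle long reads
  --run-partition
  --num-cpu-threads-per-stream 12
  --gpu-num-per-partition 1
)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%s\n' "${deepvariant_cmd[@]}"   # one argument per line
else
  "${deepvariant_cmd[@]}"
fi
```

Building the command as a bash array keeps each argument intact even if a path contains spaces, and the same array is used for both the dry run and the real run.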

OZTaekOppa commented 5 months ago

Thanks. It worked well.