hall-lab / speedseq

A flexible framework for rapid genome analysis and interpretation
MIT License

SVTyper fails to genotype with hg38 alignments. --split_bam (-S) is deprecated #117

Open dantaki opened 6 years ago

dantaki commented 6 years ago

I am using speedseq version 0.1.2 and svtyper version v0.1.4.

I have three BAM files aligned with bwa mem to the hg38 reference. This is my speedseq command:

speedseq sv \
-B NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam,NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam,NA19240.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam \
-D NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_disc.bam,NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_disc.bam,NA19240.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_disc.bam \
-S NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam,NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam,NA19240.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam \
-R GRCh38_full_analysis_set_plus_decoy_hla.fa \
-o hg38_yri \
-x GRCh38-centromere-gaps-segdups-combined.bed \
-g \
-t 8
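The -B, -D and -S options take parallel comma-separated lists, so each list has to name the samples in the same order. A minimal Python sketch that builds all three lists from one list of sample IDs, assuming the file naming pattern used in the command above (the helper name `build_bam_lists` and the `PREFIX` constant are hypothetical, not part of speedseq):

```python
# Hypothetical helper: builds the comma-separated -B/-D/-S arguments for
# `speedseq sv` so that all three lists share the same sample order.
PREFIX = ".alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved"


def build_bam_lists(samples, prefix=PREFIX):
    """Return a dict mapping speedseq sv flags to comma-joined file lists."""
    suffixes = {"-B": ".bam", "-D": "_disc.bam", "-S": "_splt.bam"}
    return {
        flag: ",".join(f"{sample}{prefix}{suffix}" for sample in samples)
        for flag, suffix in suffixes.items()
    }


if __name__ == "__main__":
    args = build_bam_lists(["NA19238", "NA19239", "NA19240"])
    for flag, value in args.items():
        print(flag, value)
```

Deriving all three lists from a single sample list keeps them aligned if samples are added, removed or reordered.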

I generated the discordant and split-read files with the following commands:

samtools view -bh -@ 8 -F 1294 $BAM | samtools sort -@ 8 -o $DISC_BAM

samtools view -@ 8 -h $BAM | /home/usr/bin/speedseq/src/lumpy-sv/scripts/extractSplitReads_BwaMem -i stdin | samtools view -@ 8 -b | samtools sort -@ 8 -o $SPLIT_BAM
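The `-F 1294` filter in the discordant-read command excludes any read that has at least one of the SAM flag bits in 1294 set. A small Python sketch that decodes a flag value into the bit names from the SAM format specification (the dictionary and function names are illustrative):

```python
# SAM FLAG bits as defined in the SAM format specification,
# named as in `samtools flags`.
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(value):
    """Return the names of the SAM flag bits set in `value`, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


if __name__ == "__main__":
    print(decode_flag(1294))
```

Since 1294 = 0x2 + 0x4 + 0x8 + 0x100 + 0x400, the filter drops properly paired, unmapped, mate-unmapped, secondary and duplicate reads, which leaves the discordantly paired alignments.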

I've run the speedseq command twice, each time with a different exclusion file. The exclusion file above combines hg38 centromeres, assembly gaps, and segmental duplications.

The second exclusion file was an hg38 liftover of the hg19 LUMPY exclusion file packaged with speedseq.
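When several region sets (centromeres, assembly gaps, segmental duplications) are combined into one exclusion file, overlapping intervals are typically merged first. A minimal Python sketch of that merge step, similar in spirit to `bedtools merge` but not a replacement for it. It assumes 0-based, half-open BED coordinates and merges intervals that overlap or touch; `merge_bed` is a hypothetical name:

```python
from collections import defaultdict


def merge_bed(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals per chromosome.

    Coordinates are assumed to be 0-based, half-open, as in BED files.
    Chromosomes are emitted in plain string-sorted order.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is not None and start <= current_end:
                # Overlaps or abuts the current block: extend it.
                current_end = max(current_end, end)
            else:
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)]
    print(merge_bed(regions))
```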

I get the same error for both jobs.

Here is the error message:

Warning: --split_bam (-S) is deprecated. Ignoring NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam.
Calculating library metrics from NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam... done
slurmstepd: *** JOB 11382902 ON XXXX CANCELLED AT 2017-10-01T11:44:24 DUE TO TIME LIMIT ***

Note that the wall time limit was 48 hours.

Here's the STDOUT:

LUMPY Express done
# genotype structural variants
python2.7 /home/usr/bin/speedseq/bin/svtyper -q -i hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.vcf -B NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam -S NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam > hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.gt.vcf ; mv hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.gt.vcf hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.vcf
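It can help to check which BAM files the logged SVTyper command actually received. A Python sketch that tokenizes a command line with the standard `shlex` module and collects the values passed to a given flag, splitting comma-separated lists (the helper name `flag_values` is hypothetical, and the file names in the example are abbreviated):

```python
import shlex


def flag_values(command, flag):
    """Return every value that follows `flag` in a shell command string.

    Values may be comma-separated lists (as speedseq uses for -B/-S),
    so each one is split on commas.
    """
    tokens = shlex.split(command)
    values = []
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            values.extend(tokens[i + 1].split(","))
    return values


if __name__ == "__main__":
    logged = (
        "python2.7 svtyper -q -i hg38_yri.sv.vcf "
        "-B NA19238.bam -S NA19238_splt.bam > hg38_yri.sv.gt.vcf"
    )
    print(flag_values(logged, "-B"))
```

Applied to the STDOUT line above, `-B` yields only the NA19238 BAM. The excerpt does not show whether speedseq issues further SVTyper calls for the other two samples.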

Here's an entry from the SVTyper output:

chr1    20712560        105     N       <DEL>   9.08    .       SVTYPE=DEL;SVLEN=-179;END=20712739;STRANDS=+-:5;CIPOS=-10,9;CIEND=-10,9;CIPOS95=0,0;CIEND95=0,0;SU=5;PE=0;SR=5  GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB  0/1:4:0:4:9:9.08:-14,-13,-59:78:69:8:69:8:69:5:2:0:0:0.1        ./.:0:0:0:.:.:.:.:.:.:.:.:.:.:.:.:.:.   ./.:1:0:1:.:.:.:.:.:.:.:.:.:.:.:.:.:.

Note that only the first sample is genotyped, and this is the case for every variant in the VCF.
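To confirm that pattern across a whole file rather than by eye, a short Python sketch can read the GT value of every sample column and count called genotypes per sample. It assumes a standard VCF with FORMAT in column 9 and treats `./.` and `.` as uncalled; whitespace splitting is used because the record above was pasted with spaces. The function names are hypothetical:

```python
import sys
from collections import Counter


def sample_genotypes(vcf_line):
    """Return the GT value of every sample column in one VCF data line."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    return [sample.split(":")[gt_index] for sample in fields[9:]]


def called_counts(lines):
    """Count, per 0-based sample index, the records with a called genotype."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        for i, gt in enumerate(sample_genotypes(line)):
            if gt not in ("./.", "."):
                counts[i] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        print(dict(called_counts(vcf)))
```

If only index 0 ever appears in the counts, only the first sample received genotypes.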

I have used these commands successfully in the past with hg19-aligned genomes. I'm not sure what is causing the error messages I'm receiving.

Thank you for your time

Sithara85 commented 6 years ago

Hi, we are trying to use speedseq sv for SV calling in our whole-genome analysis pipeline. We processed the HapMap sample NA12877. When I run speedseq sv on the BAM outputs from speedseq align, I get the same warning: "Warning: --split_bam (-S) is deprecated. Ignoring NA12877_10X.splitters.bam." Is this a known issue?

Thank you, Sithara

Jia21 commented 4 years ago

Hi,

I ran into the same warning. I am wondering whether it is a serious problem for CNV calling.

Thanks!

Elaine