hall-lab / svtyper

Bayesian genotyper for structural variants
MIT License
126 stars 55 forks

How to genotype many samples #62

Open 54tuifeimo opened 7 years ago

54tuifeimo commented 7 years ago

I have about 100 samples. I ran lumpyexpress with -P on each one and got sample1.vcf, sample2.vcf, sample3.vcf, etc.:

lumpyexpress -B sample1.bam -S sample1.splitters.bam -D sample1.discordants.bam -P -o sample1.vcf

Then I used l_sort.py and l_merge.py to merge all the VCF files into samples.sorted.merge.vcf:

python l_sort.py sample1.vcf, sample2.vcf, sample3.vcf > samples.sorted.vcf
python l_merge.py -i samples.sorted.vcf > samples.sorted.merge.vcf

Then I tried to run SVTyper to genotype each sample:

svtyper -B sample1.bam -S sample1.splitters.bam -i samples.sorted.merge.vcf -M > sample1.gt.vcf

This failed with the error below. Should I extract each sample's VCF from samples.sorted.merge.vcf again and then run SVTyper on each sample? Thanks in advance for your help!

Traceback (most recent call last):
  File "/WORK/app/bcbio//bin/svtyper", line 1413, in <module>
    sys.exit(main())
  File "/WORK/app/bcbio//bin/svtyper", line 1400, in main
    args.debug)
  File "/WORK/app/bcbio//bin/svtyper", line 1234, in sv_genotype
    out_bam)
  File "/WORK/app/bcbio//bin/svtyper", line 574, in count_pairedend
    mate_mapq = get_mate_mapq(sample.bam, read) # move this for speed
  File "/WORK/app/bcbio//bin/svtyper", line 367, in get_mate_mapq
    mq = bam.mate(read).mapq
  File "pysam/calignmentfile.pyx", line 1007, in pysam.calignmentfile.AlignmentFile.mate (pysam/calignmentfile.c:12008)
ValueError: mate not found

dvanderleest commented 6 years ago

First of all, the file names passed to the -B option need to be comma-separated (","), not separated by a comma and a space (", "). You might want to use something like:

svtyper -B $(ls sample*.bam | paste -sd",") -i lumpy.raw.out.vcf > project.gt.vcf

paste -sd"," joins the lines it reads from stdin into a single line, using a comma as the delimiter.
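If you would rather build that argument in a script, here is a rough Python sketch of the same joining step. The helper name `comma_joined_bams` is made up for illustration, and the `.splitters.bam` / `.discordants.bam` suffixes are assumed from the file names in this thread. A plain `sample*.bam` glob would also match those auxiliary files, so the sketch filters them out.

```python
from pathlib import Path

# Assumed naming convention, taken from the file names used in this thread.
AUX_SUFFIXES = (".splitters.bam", ".discordants.bam")


def comma_joined_bams(names):
    """Join the main BAM file names with "," and no spaces (hypothetical helper)."""
    main_bams = sorted(
        name
        for name in names
        if name.endswith(".bam") and not name.endswith(AUX_SUFFIXES)
    )
    return ",".join(main_bams)


if __name__ == "__main__":
    # Print a value that could be passed to -B for the BAMs in the current directory.
    print(comma_joined_bams(p.name for p in Path(".").glob("sample*.bam")))
```

The printed string has no spaces, so it can be passed to -B as a single argument.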

But I don't think this is what caused your error. The message says "ValueError: mate not found", and it is raised while getting the mate's mapping quality. I think at least one read in your sample.bam files lacks a mate, whereas lumpy requires paired-end read data to determine some of the SV types.
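One way to test that guess is to look for paired reads whose mate record is absent from the BAM. Below is a minimal sketch that works on SAM text (for example the output of `samtools view sample1.bam`) and uses only the standard library. It is not SVTyper's or pysam's logic, only a name-based approximation: the function name `reads_missing_mate` is hypothetical, and it assumes a read is an orphan when it is flagged as paired with a mapped mate but its name shows up in only one primary record.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_missing_mate(sam_lines):
    """Return sorted names of mapped, paired reads seen without their mate record."""
    seen = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue  # single-end read: no mate expected
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count primary records only
        if flag & (UNMAPPED | MATE_UNMAPPED):
            continue  # pairs with an unmapped end are ignored in this sketch
        seen[name] += 1
    # A complete pair contributes two primary records with the same name.
    return sorted(name for name, count in seen.items() if count < 2)
```

Calling `reads_missing_mate(sys.stdin)` in a script fed by `samtools view sample1.bam` would list the suspect read names. Note that it keeps every paired read name in memory, so it may be heavy on large BAM files.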