mcfrith / tandem-genotypes

GNU General Public License v3.0
45 stars 7 forks source link

"the alignments are in the wrong order" before & after maf-sort #17

Closed davidlougheed closed 2 years ago

davidlougheed commented 2 years ago

I'm getting the error tandem-genotypes: error: the alignments are in the wrong order when I run tandem-genotypes. I've ran the .maf file through maf-sort but the error persists. Any ideas on what could cause this?

mcfrith commented 2 years ago

maf-sort sorts them into a wrong order: it sorts the alignments by their location in the genome, whereas they should be sorted in query-sequence order.

They should be in the right order to start with, I have no idea why not... how exactly did you get the .maf?

davidlougheed commented 2 years ago

thanks for the prompt response!

the maf is generated with something like this:

bedtools bamtofastq -i "${bam}" -fq "${scratch}/${fq}"
last-train -P32 -Q0 hg38db "${scratch}/${fq}" > "${scratch}/${par}"
lastal -P32 -p "${scratch}/${par}" hg38db "${scratch}/${fq}" | last-split > "${scratch}/${maf}"

where bam/fq/par/maf are variables pointing to different files

the weird part of this workflow I guess is that I'm converting CCS reads from a BAM back to a fastq file; I've been able to do this with some individuals from the 1000Genomes dataset (which worked) and now I'm trying it on HG002 from Genome in a Bottle.

mcfrith commented 2 years ago

That seems like it should work fine, and the alignments should be in the right order. Scratching my head...

By the way, this should sort them into the right order (though if 2 reads have the same name, we have bigger problems):

maf-swap myfile.maf | maf-sort | maf-swap > out.maf
davidlougheed commented 2 years ago

lastdb was ran like this:

lastdb -P32 -uNEAR hg38db ./ref/hg38.analysisSet.fa

if the DB could cause it, it's possible that I need to regenerate it for GiaB's reference FASTA file – they are both hg38, but it might be slightly different vs. 1000 genomes I suppose.

I'll take a look into those other possibilities as well!

davidlougheed commented 2 years ago

ah I found the issue - there do seem to be duplicates of each read in the FASTQ file for some reason, so I'll mark the issue as closed. thanks for all the help!

edit - if anyone in the future comes across this, using Picard's SAM to FASTQ instead of bedtools seems to fix the issue

mcfrith commented 2 years ago

Thanks for sharing the solution!