husamia closed this issue 1 year ago
Hi @husamia, does the reference genome use 17 instead of chr17? The BAM file looks like it uses the chr17 convention, but maybe the reference does not? Check with head Homo_sapiens_assembly19_1000genomes_decoy.fasta.fai
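The comparison suggested above can be made explicit by listing the contig names on both sides. This is only a minimal sketch on toy stand-in files: header.sam and ref.fa.fai are hypothetical names, the offsets in the toy index are placeholders, and in practice the header text would come from samtools view -H on the real BAM.

```shell
export LC_ALL=C  # consistent sort order for comm

# Toy stand-ins (hypothetical). Real header: samtools view -H sample.bam > header.sam
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr17\tLN:81195210\n' > header.sam
printf '1\t249250621\t52\t100\t101\n17\t81195210\t1000\t100\t101\n' > ref.fa.fai

# Contig names from the SN: field of each @SQ header line
awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' header.sam | sort > bam_contigs.txt

# Contig names from the first column of the FASTA index
cut -f1 ref.fa.fai | sort > ref_contigs.txt

# Names present in the BAM header but absent from the reference
comm -23 bam_contigs.txt ref_contigs.txt > missing_in_ref.txt
cat missing_in_ref.txt
```

If missing_in_ref.txt is not empty, the BAM and the reference disagree on naming, which would be consistent with a key error on chr17.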
@kcleal I've tried both chr17 and just 17 in the bed file and it didn't help. I even used samtools view -b original.bam chr17 > chr17.bam and removed the --search option, and I am still getting the same error!
I think the bam file is consistent with chr17, as the rest of the pipeline works ok. I was wondering about the reference genome. The error message is saying that chr17 is not in the fasta file. The function ref_genome.get_reference_length is throwing the key error. Could you confirm this?
@kcleal here is the check
head /mnt/d/Research/Homo_sapiens_assembly19_1000genomes_decoy.fasta.fai
1 249250621 52 100 101
2 243199373 251743232 100 101
3 198022430 497374651 100 101
4 191154276 697377358 100 101
5 180915260 890443229 100 101
6 171115067 1073167694 100 101
7 159138663 1245993964 100 101
8 146364022 1406724066 100 101
9 141213431 1554551781 100 101
10 135534747 1697177401 100 101
Ah ok, that's the problem: the 'chr' prefix is missing from the chromosome names in the reference. You can add it using sed/awk, and then re-index the genome. The first index line should then start with chr1 249250621 (the byte-offset column will shift slightly after re-indexing). Alternatively, you should probably use the same reference genome that the sample was aligned to. You can normally check using samtools view -H and look at the command used during mapping along with the reference genome.
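For the sed route, something along these lines might work. It is a minimal sketch on a toy FASTA (toy.fa and toy.chr.fa are hypothetical file names), and it assumes every sequence name should simply get a chr prefix. The re-indexing step is skipped if samtools is not on the PATH.

```shell
# Toy FASTA with un-prefixed names (hypothetical stand-in for the real reference)
printf '>1 dna:chromosome\nACGTACGT\n>17 dna:chromosome\nGGCCGGCC\n' > toy.fa

# Prepend "chr" to every header line; sequence lines are left untouched
sed 's/^>/>chr/' toy.fa > toy.chr.fa
grep '^>' toy.chr.fa

# Rebuild the index so the .fai carries the new names
if command -v samtools >/dev/null 2>&1; then
    samtools faidx toy.chr.fa
    cut -f1 toy.chr.fa.fai
fi
```

Note that a blanket prefix also renames MT to chrMT (UCSC uses chrM) and prefixes any decoy or unplaced contigs, so the result may still not match a UCSC-style BAM exactly. Using the reference the sample was actually aligned to avoids that.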
@kcleal the reference used is the generic hg19.fasta. I have two copies of it, downloaded from different sources, and both are the same. Furthermore, I can open the file in IGV, and the chr17 file created with samtools opens in IGV as well.
I think IGV does automatic conversion between the two representations. This is not supported in dysgu, unfortunately.
@kcleal this is an issue.
The reference is the generic one named hg19.fasta, which is not provided with the data. I have used this reference with many other tools just fine. Can a feature be added to do the conversion? Or can you provide me the awk/sed command to add the chr prefix to the file?
It's a problem I have encountered before with other genomics analyses; the different representations can be a pain. However, doing automatic conversion can also cause problems. I recommend you download the hg19 with the chr representation from the UCSC Table Browser.
Actually from here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/
@kcleal I got it working after I downloaded the reference, uncompressed it, compressed it with bgzip and indexed it with samtools.
I am getting an error with the --search option that I can't figure out.
This is 30x WGS data. I want to get SVs from chromosome 17 only, so I provided a bed file and used the --search option (--search PATH: .bed file, limit search to regions).
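For reference, a whole-chromosome bed file can be derived from the FASTA index so that its name matches the reference exactly. This is a minimal sketch on a toy index (toy.fa.fai is a hypothetical name and its offset columns are placeholders). BED coordinates are 0-based and half-open, so the interval runs from 0 to the chromosome length.

```shell
# Toy .fai (hypothetical); columns: name, length, offset, bases per line, bytes per line
printf 'chr1\t249250621\t6\t50\t51\nchr17\t81195210\t1000\t50\t51\n' > toy.fa.fai

# Emit one BED line covering the whole chosen chromosome
awk -v chrom=chr17 'BEGIN { OFS = "\t" } $1 == chrom { print $1, 0, $2 }' toy.fa.fai > chr17.bed
cat chr17.bed
```

If chr17.bed comes out empty, the chosen name is not in the index at all, which points to the same naming mismatch between bed, BAM and reference.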