tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

Question HipSTR + reference genome hg19 + ancientDNA #86

Open arenvale opened 2 years ago

arenvale commented 2 years ago

Hi Thomas! Thanks for the tool, I think it will be very useful to me. I'm fairly new to this, so my questions may be a bit basic. I checked whether anyone had already asked this but didn't find anything.

I am trying to run HipSTR to get CODIS STRs from human whole genomes, but I am having some problems. I downloaded the hg19 reference genome. As far as I understand, the FASTA contains the chromosome sequences plus other regions of each chromosome (patches, alternate locus groups, unlocalized genomic contigs). When I first ran HipSTR with this reference FASTA, I got an error because the chromosome names in the .bed file did not match those in the FASTA, so I unified them. Now it runs, but I could not recover any STR from any chromosome:

./HipSTR --bams CO001.bam --fasta hg19_refgenome.fa --regions str_codis-chrY_hg19.bed --str-vcf str_calls.vcf.gz --bam-samps CO001 --bam-libs CO001
Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file str_codis-chrY_hg19.bed
Region file contains 30 regions

Processing region 11 2192317 2192345
0 reads overlapped region, of which 
    0 were hard clipped
    0 had an 'N' base call
    0 had low base quality scores
    0 did not have a unique mapping
    0 did not have a mate pair
    0 PASSED ALL FILTERS
Found 0 fully paired reads and 0 unpaired reads for downstream analyses
Removed 0 sets of PCR duplicate reads
Phased SNPs add info for 0 out of 0 reads and 0 out of 0 samples
Skipping locus with too few reads: TOTAL=0, MIN=100

I don't know whether I am using the wrong FASTA reference genome or whether the problem lies elsewhere. I hope I have explained this clearly.

I would also like to ask another question: do you know of any restrictions on using this tool with ancient whole genomes?

Thank you very much for your help!
Valeria
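
The chromosome-name mismatch described in the post can be checked before running the tool. Below is a minimal sketch, not part of HipSTR, that lists region-file chromosome names missing from a FASTA. The helper names (`fasta_contigs`, `bed_contigs`, `missing_contigs`) and the toy inputs are hypothetical and chosen for illustration only. It assumes a whitespace-delimited region file with the chromosome name in the first column, and FASTA headers whose sequence name is the first token after `>`.

```python
import io


def fasta_contigs(handle):
    """Return the set of sequence names found on FASTA header lines."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def bed_contigs(handle):
    """Return the set of chromosome names in the first column of a BED file."""
    names = set()
    for line in handle:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def missing_contigs(bed_handle, fasta_handle):
    """Return sorted BED chromosome names that are absent from the FASTA."""
    return sorted(bed_contigs(bed_handle) - fasta_contigs(fasta_handle))


if __name__ == "__main__":
    # Toy inputs (made up): the BED uses '11', the FASTA uses 'chr11'.
    bed = io.StringIO("11\t2192317\t2192345\n")
    fasta = io.StringIO(">chr11 example description\nACGT\n>chrY\nACGT\n")
    print(missing_contigs(bed, fasta))  # prints ['11']
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles. The same comparison could also be made against the sequence names in the BAM header (for example as listed by `samtools idxstats`). A name such as `11` in the region file alongside `chr11` in the alignments would be consistent with a log reporting 0 overlapping reads, although this sketch alone cannot confirm that this is the cause here.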