Closed liuqianhn closed 4 years ago
Hi @liuqianhn,
In theory, a BAM file can contain reads for many samples. So in default mode, HipSTR requires that each read has RG information, which it then uses to split up reads by samples prior to genotyping.
So unfortunately, even if you only have 1 BAM, it either must have i) RG information for each read or ii) you must use the --bam-samps and --bam-libs options. In your case, using the latter would look something like this:
HipSTR --bams HG001.GRCh38_full_plus_hs38d1_analysis_set_minus_alts.300x.bam --fasta hg38.fa --regions rep4.bed --str-vcf hip_4_NA.vcf.gz --log log/hip_4_NA.log --bam-samps HG001 --bam-libs HG001
If you have more than 1 sample, I'd recommend genotyping them all in the same command, as it'll improve the genotypes due to allele discovery and it'll make it easier to compare across samples.
Since you only have 1 sample, I'd also recommend using the default stutter model by specifying the flag --def-stutter-model
and you'll likely need to lower the minimum read requirement using something like --min-reads 30
See here for more details on these options
Best, Thomas
Hi @tfwillems , thank you very much for your reply.
May I know a little more about BPDIFFS
in the vcf output file? I knew it is "Base pair difference of each alternate allele from the reference allele". I thus guess, if there is no BPDIFFS
for some lines, it means that both alleles are same as the reference. Please correct me if I am wrong. Thank you.
If there is no BPDIFFS in the INFO field, it's because there are no ALT alleles detected and the ALT column should therefore be ".". This situation occurs if no variation is detected in your samples, which is far more likely to occur if you're only genotyping 1 sample at a time
Hi Developer, I am using your nice tool to detect repeats on short-read data. However, I have the following error. Could you please advise whether read groups/bam-samps is necessary? I only have one sample.
My command is