Closed HLHsieh closed 3 months ago
Hi Hsin,
It seems that your locus is skipped because all overlapping reads had low quality, you can decrease the threshold by specifying a value for --min-mean-qual, the default value is 30.
Best, Helia
Hi Helia,
Thank you for your quick response. I was trying the following command, but the same error message returned.
LongTR --bams C9ORF72_1_10R.sorted.bam --fasta $dir/genome.fa --regions $dir/test.bed --tr-vcf $dir/test.vcf.gz --phased-bam --bam-samps C9ORF72_1 --bam-libs C9ORF72_1 --min-mean-qual 0 --min-reads 1
Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed
Region file contains 1 regions
Processing region chr9 27573528 27573546
111 reads overlapped region, of which
0 were hard clipped
0 had an 'N' base call
0 had low MAPQ
111 had low base quality scores
0 did not span the STR
0 did not have a unique mapping
0 PASSED ALL FILTERS
Phased SNPs add info for 0 out of 0 reads and 0 out of 0 samples
Skipping locus with too few reads: TOTAL=0, MIN=1
------HipSTR Execution Summary------
Skipped 1 loci with too few reads for stutter model model training or genotyping.
If this is a sizeable portion of your loci, see the --min-reads command line option
Genotyping succeeded for 0/0 loci
Approximate timing breakdown
BAM seek time = 0.006804 seconds
Read filtering = 0.065036 seconds
SNP info extraction = 4.3e-05 seconds
Genotyping = 0 seconds
Trimming alignment = 0 seconds
Haplotype generation = 0 seconds
Haplotype alignment = 0 seconds
LongTR execution finished: Total runtime = 3.1865 sec
-----------------
Any suggestions would be appreciated.
Many thanks, Hsin
I have had to specify "-1" for that parameter to avoid removing all (nanopore) reads.
Hi,
Thanks. Specified "-1" for that parameter can include all reads, but no repeats were detected in the output file.
Log:
Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed
Region file contains 1 regions
Processing region chr9 27573528 27573546
111 reads overlapped region, of which
0 were hard clipped
0 had an 'N' base call
0 had low MAPQ
0 had low base quality scores
0 did not span the STR
0 did not have a unique mapping
111 PASSED ALL FILTERS
Phased SNPs add info for 0 out of 111 reads and 0 out of 1 samples
Trimming reads
Failed to trim align 6 out of 111 reads
Generating candidate haplotypes
GCCCCGGGCCCGCCCCCGGGCCCGCCCCGACCACG CCCCGGCCCCGGCCCCGGCCCC TAGCGCGCGACTCCTGAGTTCCAGAGCTTGCTACA
GCCCCGGCCCCGGCCCCGGCCCC
CCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCC
GCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGC...GCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCC
Added 2 inexact haplotypes generated by POA
Aborting genotyping of the locus as the sequence upstream of the repeat is too repetitive for accurate genotyping
Flanking sequence = GCCCCGGGCCCGCCCCCGGGCCCGCCCCGACCACG
Locus timing:
BAM seek time = 0.0069 seconds
Read filtering = 0.085952 seconds
SNP info extraction = 0.005669 seconds
Stutter estimation = 4e-06 seconds
Genotyping = 0.067338 seconds
Trim alignment = 0.057342 seconds
Haplotype generation = 0.009429 seconds
Haplotype alignment = 0 seconds
------HipSTR Execution Summary------
Genotyping succeeded for 0/1 loci
Approximate timing breakdown
BAM seek time = 0.0069 seconds
Read filtering = 0.085952 seconds
SNP info extraction = 0.005669 seconds
Genotyping = 0.067338 seconds
Trimming alignment = 0.057342 seconds
Haplotype generation = 0.009429 seconds
Haplotype alignment = 0 seconds
LongTR execution finished: Total runtime = 3.27426 sec
-----------------
VCF file
##fileformat=VCFv4.1
##command=LongTR-638942f-dirty --bams C9ORF72_1_10R_NanoSim_50x.sorted.bam --fasta /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/genome.fa --regions /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed --tr-vcf /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.vcf.gz --bam-samps C9ORF72_1 --bam-libs C9ORF72_1 --min-mean-qual -1 --min-reads 1
##reference=/scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/genome.fa
##contig=<ID=chrM,length=16569>
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##INFO=<ID=INEXACT_ALLELE,Number=A,Type=Integer,Description="Boolean showing if each alternate allele is exact or approximated by POA, 0 for exact 1 for approximated.">
##INFO=<ID=BPDIFFS,Number=A,Type=Integer,Description="Base pair difference of each alternate allele from the reference allele">
##INFO=<ID=START,Number=1,Type=Integer,Description="Inclusive start coodinate for the repetitive portion of the reference allele">
##INFO=<ID=END,Number=1,Type=Integer,Description="Inclusive end coordinate for the repetitive portion of the reference allele">
##INFO=<ID=PERIOD,Number=1,Type=Integer,Description="Length of STR motif">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=REFAC,Number=1,Type=Integer,Description="Reference allele count">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele counts">
##INFO=<ID=NSKIP,Number=1,Type=Integer,Description="Number of samples not genotyped due to various issues">
##INFO=<ID=NFILT,Number=1,Type=Integer,Description="Number of samples whose genotypes were filtered due to various issues">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of valid reads used to genotype all samples">
##INFO=<ID=DSNP,Number=1,Type=Integer,Description="Total number of reads with SNP phasing information">
##INFO=<ID=DFLANKINDEL,Number=1,Type=Integer,Description="Total number of reads with an indel in the regions flanking the STR">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GB,Number=1,Type=String,Description="Base pair differences of genotype from reference">
##FORMAT=<ID=Q,Number=1,Type=Float,Description="Posterior probability of unphased genotype">
##FORMAT=<ID=PQ,Number=1,Type=Float,Description="Posterior probability of phased genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of valid reads used for sample's genotype">
##FORMAT=<ID=DSNP,Number=1,Type=Integer,Description="Number of reads with SNP phasing information">
##FORMAT=<ID=PSNP,Number=1,Type=String,Description="Number of reads with SNPs supporting each haploid genotype">
##FORMAT=<ID=PDP,Number=1,Type=String,Description="Fractional reads supporting each haploid genotype">
##FORMAT=<ID=GLDIFF,Number=1,Type=Float,Description="Difference in likelihood between the reported and next best genotypes">
##FORMAT=<ID=DFLANKINDEL,Number=1,Type=Integer,Description="Number of reads with an indel in the regions flanking the STR">
##FORMAT=<ID=AB,Number=1,Type=Float,Description="log10 of the allele bias pvalue, where 0 is no bias and more negative values are increasingly biased. 0 for all homozygous genotypes">
##FORMAT=<ID=FS,Number=1,Type=Float,Description="log10 of the strand bias pvalue from Fisher's exact test, where 0 is no bias and more negative values are increasingly biased. 0 for all homozygous genotypes">
##FORMAT=<ID=DAB,Number=1,Type=Integer,Description="Number of reads used in the AB and FS calculations">
##FORMAT=<ID=ALLREADS,Number=1,Type=String,Description="Base pair difference observed in each read's Needleman-Wunsch alignment">
##FORMAT=<ID=MALLREADS,Number=1,Type=String,Description="Maximum likelihood bp diff in each read based on haplotype alignments for reads that span the repeat region by at least 5 base pairs">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT C9ORF72_1
My test bed file:
chr9 27573529 27573546 6 3 C9orf72
Any suggestions would be appreciated.
Many thanks, Hsin
Hi Hsin,
You are getting this error because LongTR is trying to assemble the flanking sequence, which is unnecessary with long reads, we recommend adding --skip-assembly to the command when using long reads to avoid this step.
Best, Helia
Hi there,
After your suggestions, I can run LongTR on my data as follows:
Message showed:
Therefore, no repeats were detected. This data should contain some repeat regions. I am wondering if there is something wrong with the command or if you have any suggestions on how I can adjust it accordingly.
Many thanks, Hsin