Closed HLHsieh closed 6 months ago
Thanks for your interest in tandem-genotypes. Of course I've forgotten how it predicts alleles :confused:, but a code comment says it uses k-medoids clustering with k=2.
If there are just 3 reads with different repeat counts, I think it's impossible to predict 2 alleles reliably. It just picks 2 of them in that case.
tandem-genotypes-merge
assigns reads to alleles. It simply assigns each read to the allele with the closest repeat count.
Thank you for your quick response. I was trying to run tandem-genotypes-merge
as follows:
last-train -P8 -Q0 --postmask=0 ${mydb_dir}/mydb ${sample}.fastq > ${mydb_dir}/${sample}.par
lastal -P8 --split -p ${mydb_dir}/${sample}.par ${mydb_dir}/mydb ${sample}.fastq > ${mydb_dir}/${sample}.maf
tandem-genotypes -v -o2 --postmask=0 -g ${ref_dir}/refGene.txt $predefined ${mydb_dir}/${sample}.maf > tandem-genotypes/${sample}_allele.txt
tandem-genotypes-merge ${input_dir}/${sample}.fastq ${mydb_dir}/${sample}.maf tandem-genotypes/${sample}_allele.txt > tandem-genotypes/${sample}.merged.fa
The error message shows:
lamassemble: error: can't read the last-train data
lamassemble: error: can't read the last-train data
I have no idea why the last-train data can not be read. Any advice would be appreciated.
Best, Hsin
You need to give tandem-genotypes-merge
your .par
instead of .maf
.
Oh, sorry that I did not notice this.
After correcting to
tandem-genotypes-merge ${input_dir}/${sample}.fastq ${mydb_dir}/${sample}.par tandem-genotypes/${sample}_allele.txt > tandem-genotypes/${sample}.merged.fa
There is still error message:
lastal: can't interpret: 30g
Traceback (most recent call last):
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 756, in <module>
main(opts, *args)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 701, in main
assemble(opts, scale, probMatrix, gapProbs, sequences, tmpdir)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 595, in assemble
alignments = pairwiseAlignments(opts, *tmpFileNames)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 571, in pairwiseAlignments
addStrandAlignments(alignments, opts, db, seqFile, fwdMat, 1)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 567, in addStrandAlignments
raise subprocess.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '('lastal', '-s1', '-j4', '-D1e9', '-P1', '-m5', '-z30g', '-p', '/tmp/lamassembleC8MgEn/fwd.mat', '/tmp/lamassembleC8MgEn/db', '/tmp/lamassembleC8MgEn/x.fa')' returned non-zero exit status 1
lastal: can't interpret: 30g
Traceback (most recent call last):
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 756, in <module>
main(opts, *args)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 701, in main
assemble(opts, scale, probMatrix, gapProbs, sequences, tmpdir)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 595, in assemble
alignments = pairwiseAlignments(opts, *tmpFileNames)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 571, in pairwiseAlignments
addStrandAlignments(alignments, opts, db, seqFile, fwdMat, 1)
File "/home/hsinlun/miniconda3/envs/poretools/bin/lamassemble", line 567, in addStrandAlignments
raise subprocess.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '('lastal', '-s1', '-j4', '-D1e9', '-P1', '-m5', '-z30g', '-p', '/tmp/lamassemble8Ocppx/fwd.mat', '/tmp/lamassemble8Ocppx/db', '/tmp/lamassemble8Ocppx/x.fa')' returned non-zero exit status 1
I think it's because you have an old version of LAST. Try lastal --version
. It should be >= 935 according to https://gitlab.com/mcfrith/lamassemble
Hi,
I greatly appreciate this amazing tool. I am currently attempting allele prediction using the
-o2
option. I am curious about how this tool predicts alleles and how to interpret the output.Here are my results:
In this example, there are two alleles with repeat of 91 and 95. I am wondering which allele the read with a repeat length of 97 will be assigned to.
Similarly, in this example, I would like to know if this tool assigns the read to a specific allele.
Thank you.