Open mlosilla opened 3 years ago
Right, PCR could produce all combinations of the forward and backward primers. We should output them too.
yeah exactly.
In my case, the positions in the bed file would be enough. I would choose the amplicon I want and use the positions as trimming points for my fastq reads--but the sequences of the PCR products may be useful for other users.
Thank you for considering it!
In my case, the positions in the bed file would be enough.
Oh, you can use seqkit loate
first, which outputs BED format.
期待这个功能早日上线
Hi, I have the same issue here. Outputting the largest amplicon only was misleading in my case, as I was looking for an amplicon within repititive RNA genes of an eukaryotic genome. Instead of predicting a correct PCR amplicon of size around 400bp, seqkit amplicon
produced an amplicon of ~50 Kb which does not make any sense in my case.
As a solution, I see that outputting all valid amplicons is the best option, or you may add an option expected_amplicon_size
to give this length a priority or to make an upper limit for the amplicon prediction.
Here is a toy example
echo -ne ">seq\nAGTACCTTGGTAGGAGTTTCCTGCTAATGATAAGAATGATATTGGACTAAGTAATGTTGCAAATATAGAAACTGAT\n" | seqkit amplicon -F GGTAGG -R ATCAG
echo "AGTACCTTGGTAGGAGTTTCCTGCTAATGATAAGAATGATATTGGACTAAGTAATGTTGCAAATATAGAAACTGAT" > seq
echo -ne ">seq\n" > multi_seq.fa
for seq in {1..10}; do cat seq >> multi_seq.fa; done
cat multi_seq.fa | seqkit amplicon -F GGTAGG -R ATCAG | seqkit stats
Hi,
In version 0.15.0 seqkit amplicon only keeps one amplicon (the largest) per primer pair. I don't always care about the largest amplicon.
Would it be possible to implement the feature of keeping all valid amplicons? A bed file with the start and end bases (columns 2 and 3) of all valid pairs would be fantastic. That would permit me to inspect the amplicons and keep the one I want (in my case, the smallest amplicon over a certain threshold).
Thanks