Open starsyi opened 11 months ago
I believe that obtaining accurate insert sequences is crucial for genome assembly, sequence alignment, and the study of sequence features (especially for cfDNA, scRNA, etc.). I hope better optimizations and solutions can be found.
Hi @starsyi - thank you for the feedback and for the detailed analysis. I completely agree that we should improve the accuracy of our trimming. We have an ongoing effort to add adapter trimming as well, so we'll investigate ways to enhance the accuracy of the trim positions.
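To make the residual-base problem reported in this issue measurable, one option is to check how many trailing bases of the known barcode flank still sit at the 5' end of a trimmed read. The following is only a minimal sketch under stated assumptions, not how Dorado, Porechop, or Guppy_barcoder work internally: the function names are hypothetical, the flank `CAGCACCT` is an assumed example value chosen so that its tail matches the `GCACCT` residue reported in this issue, and the read is made up. It uses exact matching, so it ignores sequencing errors.

```python
def residual_flank_length(trimmed_read: str, flank: str) -> int:
    """Length of the longest suffix of `flank` that is also a prefix of
    `trimmed_read` (exact match, case-insensitive). 0 means no residue."""
    read = trimmed_read.upper()
    flank = flank.upper()
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(flank), len(read)), 0, -1):
        if read.startswith(flank[-k:]):
            return k
    return 0


def strip_residual_flank(trimmed_read: str, flank: str) -> str:
    """Remove the residual flank bases (if any) from the 5' end."""
    k = residual_flank_length(trimmed_read, flank)
    return trimmed_read[k:]


# Hypothetical example values, not taken from a real kit definition.
FLANK = "CAGCACCT"                 # assumed barcode flank
READ = "GCACCT" + "ACGTTGCA"       # made-up trimmed read with 6 residual bases

print(residual_flank_length(READ, FLANK))
print(strip_residual_flank(READ, FLANK))
```

Note that very short overlaps (1-2 bp) can match by chance, so a real check would probably need a minimum overlap length or an error-tolerant alignment. The same idea applies to the 3' end by comparing prefixes of the downstream flank against the end of the read.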
I don't know if the authors are aware of the issue with incomplete removal of sequencing adaptors and barcodes. Specifically, tools such as Dorado Demux, Porechop, and Guppy_barcoder do not completely remove adaptors and barcodes when trimming. Residual sequences of 1-15 bp often remain at the 5' end, and similar residues occur at the 3' end. Is it possible to optimize and solve this issue? E.g., sequencing raw data:
Among them, the adaptor, barcode, and flanking sequences on both sides of the barcode are separated by `^^^`.
Trimmed sequence:
The trimmed sequence contains a portion of the barcode flank sequence, `GCACCT`, and one base, 'T', was removed at the 3' end. The actual insert sequence should be as follows: