liuzhenyu-yyy / SCAN-seq2

Code for "High-throughput and high-sensitivity full-length single-cell RNA-seq analysis on third-generation sequencing platform" paper. Pipeline for SCAN-seq2 data processing.
https://www.nature.com/articles/s41421-022-00500-4
MIT License

remove gaps and extract 8bp #3

Open JingGuo1997 opened 1 year ago

JingGuo1997 commented 1 year ago

Hi SCAN-seq2 developers, the single-cell third-generation transcriptome sequencing method you have developed is extremely exciting. While reproducing your analysis I ran into two questions. 1. In the code below, my understanding is that reads shorter than 108 bp are removed, but I can't see where "remove gaps" happens in the code, or why gaps need to be removed.

read length < 100 and remove gaps

seqkit seq -m 108 -g ${cell}_full_length.fastq > ${cell}_full_length_filtered.fastq
rm -f ${cell}_full_length.fastq

2. In the code below, the "remove extra 8 bp" step cuts a further 8 bp off the poly-A end of reads whose UMI has already been removed. Why do you do this?

remove extra 8 bp

cutadapt -u -8 -o ${cell}_full_length_filtered.fastq ${cell}_full_length_filtered.extract.fastq
rm -f ${cell}_full_length_filtered.extract.fastq

Sincerely look forward to your reply!

liuzhenyu-yyy commented 1 year ago

Hi Jing,

Glad to see that you're interested in our work. The gaps in fastq sequences are removed with seqkit:

seqkit seq -m 108 -g ${cell}_full_length.fastq > ${cell}_full_length_filtered.fastq

where -m sets the minimum read length and -g removes gap letters. See the Usage and Examples page of seqkit for more details. In principle, removing gaps improves the mapping quality of ONT reads. In our pipeline it is simply a habit, and we have not tested how much it improves the pipeline's performance.
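To make the two flags concrete, here is a minimal awk sketch of what "remove gaps, then keep reads of at least N bases" means for a 4-line-per-record FASTQ. It is not seqkit and is not part of the SCAN-seq2 pipeline. The function name `filter_fastq` is hypothetical, the gap characters (`-`, `.`, space, tab) are meant to mirror seqkit's default gap-letter set (an assumption; check `seqkit seq --help`), and applying the length check after gap removal is also an assumption about ordering.

```shell
# Hypothetical helper, for illustration only (NOT seqkit itself).
# Usage: filter_fastq MIN_LEN < in.fastq > out.fastq
# Assumes 4-line FASTQ records and that gaps are stripped before the length check.
filter_fastq() {
  awk -v min="$1" '
    NR % 4 == 1 { header = $0 }   # @read_id line
    NR % 4 == 2 { seq = $0 }      # sequence line
    NR % 4 == 3 { plus = $0 }     # "+" separator line
    NR % 4 == 0 {
      qual = $0
      out_seq = ""; out_qual = ""
      # Drop gap characters and the quality values at the same positions.
      for (i = 1; i <= length(seq); i++) {
        c = substr(seq, i, 1)
        if (c == "-" || c == "." || c == " " || c == "\t") continue
        out_seq  = out_seq c
        out_qual = out_qual substr(qual, i, 1)
      }
      # Keep the record only if enough bases remain.
      if (length(out_seq) >= min) {
        print header; print out_seq; print plus; print out_qual
      }
    }
  '
}
```

For example, piping a read `AC-GT` and a read `AC` through `filter_fastq 4` keeps only the first one, with the gap removed. For real data the seqkit command quoted above is the better choice.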

An extra 8 bp is removed to deal with unexpected insertions or base errors caused by Nanopore sequencing, and to make sure the poly-A trimming step works properly. Since poly-A length was not included in our analysis, we figured it would be fine to remove a few bases from the end of the poly-A tail. This is not mandatory and should be adjusted according to your data.
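As an illustration of what trimming a fixed number of bases from the 3' end does to each record, the following awk sketch shortens both the sequence and the quality line by the same amount. It is a stand-in to show the effect, not cutadapt and not the pipeline's code. The function name `trim_tail` is hypothetical and 4-line FASTQ records are assumed.

```shell
# Hypothetical helper, for illustration only (NOT cutadapt itself).
# Usage: trim_tail N < in.fastq > out.fastq
# Removes the last N characters from the sequence and quality lines of
# every 4-line FASTQ record; header and "+" lines pass through unchanged.
trim_tail() {
  awk -v n="$1" '
    NR % 4 == 2 || NR % 4 == 0 {
      # A read shorter than N becomes an empty line.
      print substr($0, 1, length($0) - n)
      next
    }
    { print }
  '
}
```

With `trim_tail 8`, a 14-base read ending in ten A's keeps only its first 6 bases. Because the number of bases to remove is a parameter, it is easy to try smaller or larger values on a subset of reads before settling on one for your own data.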

Please let me know if you have any further questions.

JingGuo1997 commented 1 year ago

Hi Zhenyu, Thank you for your prompt response; your reply has been very helpful to me!