Open JingGuo1997 opened 1 year ago
Hi Jing,
Glad to see that you're interested in our work. The gaps in fastq sequences are removed with seqkit:
seqkit seq -m 108 -g ${cell}_full_length.fastq > ${cell}_full_length_filtered.fastq
where -m specifies the minimum read length and -g removes gap letters. See the Usage and Examples page of seqkit for more. In principle, removing gaps improves the mapping quality of ONT reads. In our pipeline it is just a habit, and we have not tested how much it improves the pipeline's performance.
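To make the effect of the two flags concrete, here is a rough awk sketch of the same idea on a toy FASTQ. It is an illustration only, not seqkit itself. The function name `filter_fastq` is hypothetical. The sketch assumes 4-line records, treats `-` and `.` as gap letters, and checks length before removing gaps; seqkit's own defaults and order of operations may differ.

```shell
# Toy stand-in for "seqkit seq -m MIN -g" (illustration only, not seqkit itself).
# Assumptions: 4-line FASTQ records; "-" and "." count as gap letters;
# the length check is applied to the sequence as read, before gap removal.
filter_fastq() {
  awk -v min="$1" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      if (length(seq) < min) next          # drop reads shorter than MIN
      out_seq = ""; out_qual = ""
      for (i = 1; i <= length(seq); i++) {
        c = substr(seq, i, 1)
        if (c == "-" || c == ".") continue # skip gap letters
        out_seq = out_seq c
        # keep the quality string aligned with the sequence
        out_qual = out_qual substr(qual, i, 1)
      }
      print header; print out_seq; print plus; print out_qual
    }'
}

# Example: r1 loses its gap letter, r2 is dropped for being shorter than 4.
printf '@r1\nAC-GT\n+\nIIIII\n@r2\nAC\n+\nII\n' | filter_fastq 4
```

For real data, the seqkit command above is the one to use; the sketch only shows what "minimum length" and "remove gaps" mean for a single record.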
An extra 8 bp is removed to deal with unexpected insertions or base errors caused by Nanopore sequencing and to guarantee that the poly-A trimming step works properly. Since poly-A length was not included in our analysis, we figured it would be OK to remove a few bases at the end of the poly-A. This is not mandatory and should be adjusted according to your data.
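As a small illustration of what trimming a fixed number of bases from the 3' end does to a read, here is an awk sketch on a toy record. It is not cutadapt itself. The function name `trim_3prime` is hypothetical and the sketch assumes 4-line FASTQ records.

```shell
# Toy stand-in for "cutadapt -u -N" (illustration only, not cutadapt itself).
# Assumption: 4-line FASTQ records; N is the number of 3' bases to drop.
trim_3prime() {
  awk -v n="$1" '
    NR % 4 == 2 || NR % 4 == 0 {
      # sequence and quality lines: keep everything except the last n characters
      keep = length($0) - n
      if (keep < 0) keep = 0
      print substr($0, 1, keep)
      next
    }
    { print }                              # header and "+" lines pass through
  '
}

# Example: an 18-base read ending in a 10-base poly-A keeps its first 10 bases.
printf '@r1\nACGTACGTAAAAAAAAAA\n+\nABCDEFGHIJKLMNOPQR\n' | trim_3prime 8
```

In this toy read only part of the poly-A tail is lost, which matches the point above that the tail length itself is not used downstream.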
Please let me know if you have any further questions.
Hi Zhenyu, Thank you for your prompt response; your reply has been very helpful to me!
Hi, SCAN-seq2 developer: The single-cell third-generation transcriptome sequencing method that you have developed is extremely exciting. While reproducing your analysis, I have some questions.
1. In the code below, my understanding is that reads shorter than 108 bp are removed, but I don't see where gap removal happens in the code, or why gaps should be removed.
# read length < 100 and remove gaps
seqkit seq -m 108 -g ${cell}_full_length.fastq > ${cell}_full_length_filtered.fastq
rm -f ${cell}_full_length.fastq
2. In the code below, removing an extra 8 bp will truncate the poly-A by 8 bp in reads where the UMI has already been removed. Why do you want to do this?
# remove extra 8 bp
cutadapt -u -8 -o ${cell}_full_length_filtered.fastq ${cell}_full_length_filtered.extract.fastq
rm -f ${cell}_full_length_filtered.extract.fastq
Sincerely look forward to your reply!