y9c / pseudoU-BIDseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc)
https://bidseq.chuan.science/
GNU General Public License v3.0

Detailed parameters of STAR #2

Closed by xiaohe0404 1 year ago

xiaohe0404 commented 1 year ago

I'm sorry to bother you. Could you share the detailed STAR parameters you use? I'm also still confused about how to set the barcode parameter. I have already removed the UMI sequences and barcodes from my reads, so do I still need to write "barcode: '-NNNNN'" in data.yaml? Looking forward to your reply; I would really appreciate it! Best wishes

y9c commented 1 year ago

Hi @xiaohe0404, do you want to start with trimmed fastq files rather than raw sequencing files? Extracting the UMI sequence in the cutadapt step is essential for the downstream analysis. I am not sure if your processed files still fit this pipeline. Could you provide more details about how you trim the fastq files?

xiaohe0404 commented 1 year ago

Thanks for your timely reply! Here are my detailed parameters:

  1. I removed the 5' and 3' SR adapters with cutadapt.
  2. I removed the 5' UMI (6 bp) + GGG (TSO) from Read1 and added this information to the query name using fastp with the following parameters: -A -Q -L -U --umi_loc=read1 --umi_len=6 --umi_prefix=UMI --umi_skip=3.
  3. I removed the 3' barcode (6 bp) from Read1 using seqkit subseq -r 1:-7. I then used the output as clean, trimmed fastq files for input to STAR.

y9c commented 1 year ago

fastp can trim the adapters in your samples, but its output format (UMI_NNNNNN) is not compatible with this pipeline.

If you are using template switching with a dual-UMI strategy for library construction, I highly recommend running this pipeline on the raw reads with the setting barcode: NNNNNNXXX-XXXNNNNNN[^1][^2]. No other settings need to be changed.

[^1]: The XXX after the - symbol trims the mismatched tail at the 3' end. From your description, you may be using the random RT method, which also creates mismatches at the 3' end of the reads.

[^2]: The NNNNNN at the end extracts the "3' barcode" you mentioned in step 3. If you do not need this sequence, replace NNNNNN with XXXXXX.
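To make the barcode pattern more concrete, here is a minimal Python sketch of how such a pattern could be read. It is not the pipeline's own code (the pipeline handles this in its cutadapt step). The function name `apply_barcode_pattern` and the example read are hypothetical. The interpretation is an assumption based on the footnotes: N marks a base that is extracted, X marks a base that is discarded, and the part before or after the - symbol applies to the 5' or 3' end of the read, respectively.

```python
def apply_barcode_pattern(seq, pattern):
    """Split a read according to a pattern such as 'NNNNNNXXX-XXXNNNNNN'.

    Assumed semantics (illustrative only, not the pipeline's implementation):
      N = base is extracted (UMI / barcode)
      X = base is discarded
      text left of '-' applies to the 5' end, text right of '-' to the 3' end
    Returns a tuple (insert, extracted_5p, extracted_3p).
    """
    left, _, right = pattern.partition("-")
    if set(left + right) - {"N", "X"}:
        raise ValueError("pattern may only contain N, X and a single '-'")
    if len(seq) < len(left) + len(right):
        raise ValueError("read is shorter than the barcode pattern")

    # Slice off the 5' and 3' segments covered by the pattern.
    cut3 = len(seq) - len(right)
    head, insert, tail = seq[:len(left)], seq[len(left):cut3], seq[cut3:]

    # Keep only the bases sitting under an 'N'; bases under 'X' are dropped.
    extracted_5p = "".join(b for b, p in zip(head, left) if p == "N")
    extracted_3p = "".join(b for b, p in zip(tail, right) if p == "N")
    return insert, extracted_5p, extracted_3p


# Hypothetical read: 6 nt UMI + GGG + insert + 3 nt mismatch tail + 6 nt barcode
read = "ACGTAC" + "GGG" + "TTTTCCCCAAAA" + "CAT" + "GATTAC"

result_keep = apply_barcode_pattern(read, "NNNNNNXXX-XXXNNNNNN")
result_drop = apply_barcode_pattern(read, "NNNNNNXXX-XXXXXXXXX")
print(result_keep)
print(result_drop)
```

Under these assumptions, the second call shows the variant from footnote 2: the same nine bases are removed from the 3' end, but none of them are kept as a barcode.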

xiaohe0404 commented 1 year ago

Thanks for your reply, this is really helpful!

y9c commented 1 year ago

You are welcome. If you have any questions about this pipeline, do not hesitate to open a new issue.