t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

strand-specific SLAM-seq analysis #143

Closed ZiggeyQi closed 6 months ago

ZiggeyQi commented 6 months ago

Hi there, I want to use 4sU-pulsed total RNA to construct a strand-specific library, following the SLAM-seq protocol except for the library construction step. In this setup rRNA is removed, but there is no polyA enrichment during library construction. With this strategy, can slamdunk help identify newly synthesized RNA for further analysis without 3'-UTR annotation being provided? I expect to get strand information for both nascent and steady-state RNAs. Do you have any advice on how to achieve this? I look forward to your reply.

Best, Ziggey

t-neumann commented 6 months ago

Hi Ziggey - so you are not using any 3'-end protocol, but doing full-length transcript sequencing instead?

ZiggeyQi commented 6 months ago

Yes, I plan to use the SLAM-seq sample to construct a standard strand-specific library to get the transcribed RNA strand information. There is no 3' polyA enrichment; I just follow the instructions of a commercial strand-specific library construction kit. The main steps are rRNA removal, fragmentation, reverse transcription, and NGS sequencing (PE150, not full length). Everything else is the same as in the SLAM-seq protocol; I just replace the “QuantSeq 3′mRNA-Seq Library Prep Kit” with the “KAPA Stranded RNA-Seq Library Preparation Kit” to get the strand-specific library. Most RNA types, including immature mRNAs, will be represented without bias in the strand-specific library. How can I use slamdunk to get the T>C conversions and the unlabeled reads in this context?
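To make concrete what strand awareness means for conversion counting: a 4sU-induced T>C conversion on a plus-strand transcript shows up as T>C against the forward reference, while the same event on a minus-strand transcript shows up as A>G. The sketch below is only an illustration of that idea, not slamdunk's implementation. The function name `count_conversions` is hypothetical, the alignment is assumed to be ungapped (no indels or clipping), and the transcript strand is assumed to be known already; how read orientation maps to transcript strand depends on the library kit.

```python
from typing import Tuple


def count_conversions(ref: str, read: str, transcript_strand: str) -> Tuple[int, int]:
    """Count labeling-type conversions in one ungapped alignment.

    ref and read are equal-length sequences, both given in forward-reference
    orientation. Returns (conversions, convertible_positions).
    Hypothetical helper for illustration only.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length (ungapped alignment assumed)")

    # On the forward reference, a plus-strand transcript shows T>C,
    # a minus-strand transcript shows the complementary change, A>G.
    if transcript_strand == "+":
        ref_base, read_base = "T", "C"
    elif transcript_strand == "-":
        ref_base, read_base = "A", "G"
    else:
        raise ValueError("transcript_strand must be '+' or '-'")

    conversions = 0
    convertible = 0
    for r, q in zip(ref.upper(), read.upper()):
        if r == ref_base:
            convertible += 1          # a position where labeling could be detected
            if q == read_base:
                conversions += 1      # mismatch of the expected conversion type
    return conversions, convertible
```

For example, `count_conversions("ATTGCT", "ATCGCC", "+")` inspects the three reference T positions and reports two of them as converted. In real data, SNPs and sequencing errors produce the same mismatches, which is why SNP calling and base-quality filtering come before counting.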

isaacvock commented 6 months ago

The SLAMDUNK developer (who initially responded to your question) will likely have more to say, but I will offer these two comments:

  1. Judging by other Issues posted on this repo, there is precedent for aligning reads from a total RNA library prep with a splice-aware aligner of your choice (I would suggest HISAT-3N, which is specifically mentioned in #104 and #99) and then using the other SLAMDUNK modules to filter, call SNPs, and count mutations in the resulting BAM files. I have even seen one published example where the authors went ahead and used SLAMDUNK's map module for total RNA data; this doesn't seem ideal in your case, as you mentioned wanting to disambiguate nascent and mature RNA, which is really only possible with a splice-aware aligner. To be clear, SLAMDUNK uses NextGenMap for alignment, which is not splice-aware. Thus, with NextGenMap, it's best to align total RNA-seq data to a transcriptome, which requires removing intronic regions from the FASTA file you align to. In any case, it might take some maneuvering to get SLAMDUNK to provide the output you need, but it seems possible, and @t-neumann can speak more on the details of how to do that.
  2. Our lab has been doing stranded total RNA "SLAM-seq" (we use a different nucleotide recoding chemistry and call it TimeLapse-seq) and thus has had a pipeline for processing this kind of data since the inception of the method. I have put a lot of work into making this pipeline easier to use via Snakemake plus lots of documentation. The pipeline is now called bam2bakR, and its output is a table with (among other things) columns labeled GF and XF. GF represents the gene that a given read overlapped, and XF represents the same thing, but only for reads aligning exclusively to exonic regions. If a read overlaps any intronic region, the value for XF will be NA. This allows you to easily disambiguate reads from pre-RNA and reads (most likely) originating from mature RNA. I also developed an R package called bakR to facilitate analyzing this data, in particular for comparative analyses that identify differences in synthesis and degradation rate constants between distinct experimental conditions. I am currently working on revamped versions of bam2bakR and bakR, designed to be more flexible and to support a wider array of neat and useful analyses, which I hope to beta release by late March.
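The GF/XF logic in point 2 can be sketched as follows. This is a minimal illustration, not part of bam2bakR or bakR: only the column names GF and XF and the NA convention come from the description above, while the tab-separated layout, the `read_id` column, the example rows, and the function name `count_by_gene` are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Values treated as "no assignment" (NA as described above; empty string is an assumption).
MISSING = {"NA", ""}


def count_by_gene(table):
    """Split reads per gene into exonic-only vs intron-overlapping.

    `table` is any file-like object holding a tab-separated table
    with at least the columns GF and XF.
    """
    counts = defaultdict(lambda: {"exonic_only": 0, "intron_overlapping": 0})
    for row in csv.DictReader(table, delimiter="\t"):
        gene = row["GF"]
        if gene in MISSING:
            continue  # read did not overlap any annotated gene
        # XF is NA when the read overlaps any intronic region.
        key = "intron_overlapping" if row["XF"] in MISSING else "exonic_only"
        counts[gene][key] += 1
    return dict(counts)


# Hypothetical example rows, one read per line.
EXAMPLE_TABLE = (
    "read_id\tGF\tXF\n"
    "r1\tgeneA\tgeneA\n"
    "r2\tgeneA\tNA\n"
    "r3\tgeneA\tgeneA\n"
    "r4\tgeneB\tNA\n"
    "r5\tNA\tNA\n"
)

result = count_by_gene(io.StringIO(EXAMPLE_TABLE))
for gene, c in sorted(result.items()):
    print(gene, c["exonic_only"], c["intron_overlapping"])
```

Reads with a gene in GF but NA in XF are the candidates for pre-RNA; reads with a gene in both are most likely from mature RNA, as described above.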

Best, Isaac

ZiggeyQi commented 6 months ago

Hi Isaac, thanks a lot for your helpful comments. So I can use HISAT-3N to align my trimmed reads to the reference genome and then use the BAM file as input to SLAMDUNK for the next steps, making sure that the tags generated by HISAT-3N match NextGenMap's format. Alternatively, I can just use the bam2bakR pipeline to get my expected outputs.

t-neumann commented 6 months ago

Hi @ZiggeyQi and @isaacvock

First of all, thanks @isaacvock for the comprehensive answer; I second everything you said.

As a third option, I could also refer you directly to the Ameres lab, which developed SLAMseq, because afaik they also have a full-length SLAMseq pipeline set up (I vaguely recall STAR + a custom quantification library). I could double-check with them and put you in contact (or add them here) if you are interested, @ZiggeyQi?

Best,

Tobi

ZiggeyQi commented 6 months ago

Hi Tobi, thank you for offering to put me in contact with the Ameres lab. For now, the standard SLAMseq protocol with fine-tuned library construction and analysis steps already meets my research goals. Thanks again for your helpful replies, @t-neumann, @isaacvock.

t-neumann commented 6 months ago

Hi @ZiggeyQi - in case you want to reach out to the contact, Niko Popitsch, I have tagged him here so you can follow up: @popitsch

ZiggeyQi commented 6 months ago

OK, thanks a lot.