twang15 / Long-read-RNA


Biology-4: RNA-editing #8

Closed twang15 closed 2 years ago

twang15 commented 2 years ago

Enzyme: ADAR (proofreads and corrects mistakes in RNA)

ADAR editing enzymes are found in all multicellular animals and are conserved in sequence and protein organization. The number of ADAR genes differs between animals, ranging from three in mammals to one in Drosophila. ADAR is also alternatively spliced to generate isoforms that can differ significantly in enzymatic activity. Therefore, to study the enzyme in vitro, it is essential to have an easy and reliable method for expressing and purifying recombinant ADAR protein. Adding to the complexity of RNA editing, the number of transcripts edited by ADARs differs between organisms.

  1. ADAR: an enzyme proofreading and correcting mistakes in RNA

Guide RNA and its role in RNA editing

Long-read RNA sequencing

Site-specific RNA editing can be quantified by sequencing clones derived from RT-PCR products.
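As an illustration of how clone sequencing yields a per-site number, here is a minimal sketch. It relies on the fact that ADAR converts adenosine to inosine, which reverse transcription reads as guanosine, so the editing level at a genomically encoded A can be estimated as G / (A + G) across clones. The function name `editing_level`, the assumption that clones are pre-aligned to the same window without indels, and the example sequences are all hypothetical.

```python
from collections import Counter


def editing_level(clone_seqs, site):
    """Fraction of clones showing G at a genomically encoded A position.

    Hypothetical helper. Assumes every clone sequence is already aligned to
    the same reference window (same length, no indels) and that `site` is
    the 0-based index of an A in the reference. Inosine created by ADAR is
    read as G after RT-PCR, so G / (A + G) estimates the editing level.
    """
    counts = Counter(seq[site] for seq in clone_seqs)
    a, g = counts["A"], counts["G"]
    if a + g == 0:
        raise ValueError("no A or G observed at this site")
    return g / (a + g)


# Made-up clones covering a 6-nt window; the candidate edited site is index 2.
clones = ["CCAGTT", "CCGGTT", "CCAGTT", "CCGGTT", "CCGGTT"]
print(f"{editing_level(clones, 2):.2f}")  # 0.60
```

Bases other than A or G at the site (for example sequencing or PCR errors) are ignored rather than counted as unedited.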

From Fereshteh:

  1. Linked Reads Genomics - 10X Genomics
  2. Unique Molecule Identifier
  3. UMI-count modeling and differential expression analysis for single-cell RNA sequencing
  4. UMI reveal a novel sequencing artefact with implications for RNA-seq based gene expression analysis

Barcode levels: well, single cell, single molecule

twang15 commented 2 years ago

UMI

  1. Unique molecular identifiers (UMIs), or molecular barcodes (MBC) are short sequences or molecular "tags" added to DNA fragments in some next generation sequencing library preparation protocols to identify the input DNA molecule. These tags are added before PCR amplification, and can be used to reduce errors and quantitative bias introduced by the amplification.
  2. scRNA-seq computational analysis workflow
    • The first steps (yellow) are general for any high-throughput sequencing data.
    • Later steps (orange) require a mix of existing RNA-seq analysis methods and novel methods to address the technical differences of scRNA-seq.
    • Finally, the biological interpretation (blue) should be performed with methods specifically developed for scRNA-seq.

[Figure: scRNA-seq computational analysis workflow]

  3. Experimental methods: two important aspects, quantification and capture
    • For quantification, there are two types of protocol: full-length and tag-based. The former tries to achieve uniform read coverage of each transcript. By contrast, tag-based protocols capture only the 5’- or 3’-end of each RNA.
    • The capture strategy determines throughput, how cells can be selected, and what additional information besides the sequence can be obtained.
    • The three most widely used options are microwell-, microfluidic-, and droplet-based.
    • In droplet-based methods, each cell is captured in a droplet together with a bead loaded with the enzymes required to construct the library. Each bead carries a unique barcode, which is attached to all of the reads originating from that cell. Thus, all of the droplets can be pooled and sequenced together, and the reads can subsequently be assigned to their cell of origin based on the barcodes.
  4. UMIs can only be used with tag-based protocols, and they facilitate gene-level quantification.
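The two barcode layers described above (a cell barcode from the bead, a UMI per molecule) combine into a gene-by-cell count table. The sketch below assumes reads have already been parsed into hypothetical `(cell_barcode, umi, gene)` tuples and counts distinct UMIs per cell and gene. Real pipelines also correct sequencing errors in barcodes and UMIs; this sketch does not.

```python
from collections import defaultdict


def umi_count_matrix(reads):
    """Count molecules per (cell, gene) as the number of distinct UMIs.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, assumed to be
    already extracted from the reads and assigned to genes. Reads that share
    cell, UMI, and gene are treated as PCR copies of a single molecule.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(found) for key, found in umis.items()}


# Made-up reads for illustration only.
reads = [
    ("AAAC", "TTGCA", "GAPDH"),  # molecule 1, sequenced twice after PCR
    ("AAAC", "TTGCA", "GAPDH"),
    ("AAAC", "CGATA", "GAPDH"),  # molecule 2 of the same gene in the same cell
    ("GGTT", "TTGCA", "GAPDH"),  # same UMI in a different cell: separate molecule
    ("GGTT", "ACGTA", "ACTB"),
]
counts = umi_count_matrix(reads)
print(counts[("AAAC", "GAPDH")], counts[("GGTT", "GAPDH")], counts[("GGTT", "ACTB")])  # 2 1 1
```

Keying on the cell barcode as well as the UMI is what lets pooled droplets be split back into per-cell counts.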
twang15 commented 2 years ago

UMI, also known as Unique Molecular Tags (UMTs), Random Molecular Tags (RMTs), or Molecular Barcodes

  1. UMIs, also known as Molecular Barcodes or Random Barcodes, consist of short random nucleotide sequences which are added to each molecule in a sample as a unique tag.
    • The UMIs are introduced during library generation, before the final library fragment is amplified in the PCR step.
    • These barcodes are copied along with the molecule during PCR. Downstream data analysis can then deduplicate the copies, revealing the original ratio of molecules in the sample and eliminating amplification bias.
  2. To estimate the number of genes or transcripts expressed in a single cell, UMIs are crucial.
    • The primary advantage of including UMIs in a sequencing experiment is that they enable accurate bioinformatic identification of PCR duplicates. Without this capacity, PCR duplicates can have a detrimental impact on downstream data analysis, especially when amplification is biased.
    • UMIs therefore ultimately act as tags that allow accurate identification and subsequent removal of PCR duplicates in sequencing data.
    • UMIs may be used in any sequencing method where duplicates cannot be confidently identified by alignment coordinates alone, or where accurate quantification is required. The UMI method can be applied to count all types of molecules or particles, such as viruses and proteins, and in methods like ChIP-seq, karyotyping, and others.
    • Variants or mutations are considered “true” when they are identical within the individual reads carrying the same UMI and between reads with different UMIs.
  3. Some Applications for UMI RNA-Seq
    • UMIs for transcripts or gene quantification
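The rule for “true” variants listed above can be sketched per position. In this minimal sketch (hypothetical function and data), reads at one aligned position are grouped into UMI families; a family supports a base only if all of its reads agree, and a non-reference base is reported only when at least `min_families` independent families support it. The threshold of 2 is an assumption, not a value from the source.

```python
from collections import defaultdict


def true_variants(reads, ref, min_families=2):
    """Return non-reference bases supported by unanimous UMI families.

    `reads` is an iterable of (umi, base) pairs observed at one aligned
    position. Sequencing errors usually hit only some reads of a family, so
    disagreeing families are ignored. Requiring several families guards
    against errors that affect one whole family (e.g. an early PCR error or
    a single-read family). `min_families` is an assumed threshold.
    """
    families = defaultdict(list)
    for umi, base in reads:
        families[umi].append(base)
    support = defaultdict(int)
    for bases in families.values():
        if len(set(bases)) == 1:  # every read in the family agrees
            support[bases[0]] += 1
    return {b for b, n in support.items() if b != ref and n >= min_families}


reads = [
    ("U1", "T"), ("U1", "T"), ("U1", "T"),  # family 1: unanimous T
    ("U2", "T"), ("U2", "T"),               # family 2: unanimous T
    ("U3", "C"), ("U3", "G"),               # family 3: disagrees, ignored
    ("U4", "C"), ("U4", "C"),               # family 4: reference base
    ("U5", "A"),                            # single-read family: too little support
]
print(sorted(true_variants(reads, ref="C")))  # ['T']
```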
  4. How many different UMIs are needed?

    • UMIs will reflect molecule counts only if the number of available distinct tags is substantially larger than the typical number of identical molecules. Under that condition, the random sequence composition of the UMIs makes it very likely that every library fragment-UMI combination is unique.
  5. A fundamental assumption in RNA-seq has been that library fragments sharing a UMI sequence and read-mapping locus were derived from the same initial input molecule.

  6. UMI-tagged NGS data allow users to

    • accurately quantify the expression levels of genes in different cells in single-cell RNA-seq experiments (differential expression of the transcriptome at the cellular rather than the tissue level, to study cell-to-cell heterogeneity).
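To make “how many different UMIs are needed?” concrete, a standard occupancy calculation helps. With random UMIs of length L there are M = 4^L possible tags; if n identical molecules each draw one tag uniformly at random, the expected number of distinct UMIs observed is M(1 − (1 − 1/M)^n). When this falls noticeably below n, collisions make UMI counting underestimate the true number of molecules. The sketch assumes uniform, independent UMI usage and no sequencing errors; real UMIs often have biased base composition, so actual collision rates can be higher.

```python
def expected_distinct_umis(umi_length, n_molecules):
    """Expected number of distinct UMIs when n molecules each draw one UMI.

    Assumes all 4**umi_length sequences are equally likely and drawn
    independently, and ignores sequencing errors. Formula:
    M * (1 - (1 - 1/M) ** n), with M = 4 ** umi_length.
    """
    m = 4 ** umi_length
    return m * (1 - (1 - 1 / m) ** n_molecules)


# Expected UMI count for 1000 identical molecules at several UMI lengths.
for length in (6, 8, 10, 12):
    est = expected_distinct_umis(length, 1000)
    print(f"{length}-nt UMI: ~{est:.1f} distinct UMIs for 1000 molecules")
```

With a 6-nt UMI (4,096 tags), roughly 887 of 1,000 molecules would be counted, while a 10-nt UMI brings the expected count above 999.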

PCR duplicate

PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-genome-guided alignment. However, identical molecules can be independently generated during library preparation and can have distinct cellular origins. Thus, falsely identifying these molecules as PCR duplicates can lead to erroneous analysis and interpretation of NGS data.
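The difference between coordinate-only and UMI-aware duplicate marking can be shown with a toy example. In this sketch, the `Read` record and both helper functions are hypothetical; `pos` stands for the read's 5' alignment coordinate. Coordinate-only deduplication collapses every read at the same locus into one molecule, whereas keying on locus plus UMI keeps independently generated molecules apart. Dedicated tools such as UMI-tools also merge UMIs that differ only by sequencing errors; this sketch does not.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    pos: int     # 5' alignment coordinate of the read (assumed)
    strand: str
    umi: str


def molecules_by_position(reads):
    """Coordinate-only deduplication: reads at the same locus count once."""
    return len({(r.chrom, r.pos, r.strand) for r in reads})


def molecules_by_position_and_umi(reads):
    """UMI-aware deduplication: reads collapse only if locus and UMI both match."""
    return len({(r.chrom, r.pos, r.strand, r.umi) for r in reads})


reads = [
    Read("chr1", 1000, "+", "ACGT"),
    Read("chr1", 1000, "+", "ACGT"),  # PCR copy of the read above
    Read("chr1", 1000, "+", "TTAG"),  # independent molecule, same fragment ends
    Read("chr1", 1000, "+", "GGCA"),  # another independent molecule
]
print(molecules_by_position(reads), molecules_by_position_and_umi(reads))  # 1 3
```

Coordinate-only marking would discard two genuine molecules here, which is exactly the false duplicate identification described above.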

NGS and Precision Medicine

Next Generation Sequencing (NGS) technologies have revolutionized medical and genomics research. Steady cost reductions and high throughput at molecular resolution have helped NGS methodologies gain acceptance in labs and clinics worldwide. Third-generation sequencing technologies are now giving further impetus to the goal of preventive, predictive, personalized, and precision (P4) medicine.

At the core of NGS technologies lie finely tuned, optimised, sensitive molecular biology and chemistry protocols, which help to accurately snapshot the response of cells at molecular resolution under varying genotypic conditions and environmental influences. To understand genotype-phenotype relationships, accurate quantification of sequenced reads is essential before drawing conclusions and deriving actionable insights from NGS data.

Source: Ultra-sensitive variant calling and transcript quantification using Unique Molecular Identifiers (StrandNGS blog)