Biology-4: RNA-editing - Githubissues

twang15 commented 3 years ago

Enzyme, ADAR (proof-read and correct mistakes in RNA)

ADAR editing enzymes are found in all multicellular animals and are conserved in sequence and protein organization. The number of ADAR genes differs between animals, ranging from three in mammals to one in Drosophila. ADAR is also alternatively spliced to generate isoforms that can differ significantly in enzymatic activity. Therefore, to study the enzyme in vitro, it is essential to have an easy and reliable method of expressing and purifying recombinant ADAR protein. To add to the complexity of RNA editing, the number of transcripts that are edited by ADARs differs in different organisms.

ADAR: an enzyme proofreading and correcting mistakes in RNA

guide RNA, and its role in RNA editing

Long-read RNA sequencing

It is possible to quantify site‐specific RNA editing by sequencing of clones derived from RT‐PCR products.
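The clone-counting idea above can be sketched in a few lines. ADAR deaminates adenosine to inosine, and inosine is read as guanosine after reverse transcription, so an edited site shows up as G where the genome has A. The function name `editing_level` and the example clone calls below are hypothetical and made up for illustration; this is a minimal sketch, not a published protocol.

```python
from collections import Counter

def editing_level(clone_bases):
    """Fraction of clones reading G at a site whose genomic base is A.

    clone_bases: one base call per sequenced clone at the site of interest.
    Calls other than A or G are ignored as uninformative.
    """
    counts = Counter(base.upper() for base in clone_bases)
    informative = counts["A"] + counts["G"]
    if informative == 0:
        return None  # no usable clones at this site
    return counts["G"] / informative

# Hypothetical base calls from ten RT-PCR clones at one editing site.
clones = ["A", "G", "G", "A", "G", "G", "A", "G", "G", "G"]
level = editing_level(clones)
print(f"editing level: {level:.2f}")
```

With only ten clones the estimate is coarse; the resolution improves with the number of clones sequenced.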

From Fereshteh:

Barcode

Well

single cell

single molecule

twang15 commented 3 years ago

UMI

Unique molecular identifiers (UMIs), or molecular barcodes (MBC) are short sequences or molecular "tags" added to DNA fragments in some next generation sequencing library preparation protocols to identify the input DNA molecule. These tags are added before PCR amplification, and can be used to reduce errors and quantitative bias introduced by the amplification.
scRNA-seq computational analysis workflow
- The first steps (yellow) are general for any high-throughput sequencing data.
- Later steps (orange) require a mix of existing RNASeq analysis methods and novel methods to address the technical differences of scRNASeq.
- Finally, the biological interpretation (blue) should be analyzed with methods specifically developed for scRNASeq.

[Figure: RNA-Seq_workflow-5, the scRNA-seq analysis workflow with steps colored yellow, orange, and blue]

Experimental methods: 2 important aspects -> quantification and capture
- For quantification, there are two types, full-length and tag-based. The former tries to achieve a uniform read coverage of each transcript. By contrast, tag-based protocols only capture either the 5’- or 3’-end of each RNA.
- The strategy used for capture determines the throughput, how the cells can be selected, and what additional information besides the sequencing can be obtained.
- The three most widely used options are microwell-, microfluidic- and droplet- based.
- In droplet-based methods, each cell is encapsulated in a droplet together with a bead that is loaded with the enzymes required to construct the library. Each bead carries a unique barcode, which is attached to all of the reads originating from that cell. Thus, all of the droplets can be pooled and sequenced together, and the reads can subsequently be assigned to their cell of origin based on the barcodes.
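The barcode-based assignment described in the last bullet can be sketched as follows. This assumes, purely for illustration, that the cell barcode is the first `barcode_len` bases of each pooled read and that the set of valid barcodes is known in advance; the name `demultiplex` and the toy sequences are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, known_barcodes, barcode_len):
    """Assign pooled reads to cells using a barcode prefix.

    Returns (reads grouped by barcode, reads with an unknown barcode).
    """
    by_cell = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        if barcode in known_barcodes:
            by_cell[barcode].append(insert)
        else:
            unassigned.append(read)
    return dict(by_cell), unassigned

# Toy pooled reads: 4-base cell barcode followed by transcript sequence.
pooled = ["ACGTTTGACC", "ACGTGGCATA", "TTAGCCATGA", "NNNNACGTAC"]
cells, leftovers = demultiplex(pooled, {"ACGT", "TTAG"}, barcode_len=4)
print(cells)
print(leftovers)
```

Exact matching is the simplest choice here; real pipelines usually also tolerate a small number of sequencing errors in the barcode.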

UMIs can only be used with tag-based protocols, where they facilitate gene-level quantification.
- Cell barcode and UMI are at different levels.
- Unique Molecular Identifiers are short (4-10 bp) random barcodes added to transcripts during reverse transcription. They enable sequencing reads to be assigned to individual transcript molecules and thus the removal of amplification noise and biases from scRNASeq data.
- UMI in RNA-seq: RNA-Seq methods work with small starting amounts of RNA, which require PCR amplification to generate libraries large enough to sequence. To assess the effects of amplification, reads that originated from the same RNA molecule (PCR duplicates) need to be identified.
- UMI in DNA sequencing
- UMI sequencing typically consists of paired-end reads, where one read from each pair captures the cell and UMI barcodes while the other read consists of exonic sequence from the transcript.
- In theory, every unique UMI-transcript pair should represent all reads originating from a single RNA molecule.
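The counting logic in the bullets above can be sketched like this. The read layout (cell barcode first, then UMI, on the barcode read) and the barcode lengths are assumptions for illustration, as are the function names and the gene labels, which stand in for the result of aligning the second read of each pair.

```python
from collections import defaultdict

def split_barcode_read(read1, cb_len, umi_len):
    """Split the barcode read into (cell barcode, UMI)."""
    return read1[:cb_len], read1[cb_len:cb_len + umi_len]

def count_molecules(tagged_reads, cb_len, umi_len):
    """Count distinct UMIs per (cell, gene), collapsing PCR duplicates."""
    umis = defaultdict(set)
    for read1, gene in tagged_reads:
        cell, umi = split_barcode_read(read1, cb_len, umi_len)
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

# Each entry: (barcode read, gene assigned from the paired transcript read).
reads = [
    ("AAAACCGT", "GeneA"),
    ("AAAACCGT", "GeneA"),  # same cell, UMI, and gene: a PCR duplicate
    ("AAAAGGTA", "GeneA"),  # same cell and gene, different UMI
    ("CCCCCCGT", "GeneA"),  # different cell
    ("AAAACCGT", "GeneB"),  # same cell and UMI, different gene
]
counts = count_molecules(reads, cb_len=4, umi_len=4)
print(counts)
```

Five reads collapse to four molecules here, because the two identical reads in cell `AAAA` are counted once.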

twang15 commented 3 years ago

UMI: Unique Molecular Tags (UMTs), Random Molecular Tags (RMTs), Molecular Barcode

UMIs, also known as Molecular Barcodes or Random Barcodes, consist of short random nucleotide sequences which are added to each molecule in a sample as a unique tag.
- The UMIs are introduced during library generation, before the final library fragment is amplified in the PCR step.
- These barcodes are copied along with the molecule in the PCR step. Downstream data analysis can then deduplicate the copies, revealing the original ratio of molecules in the sample and eliminating amplification bias.
In any case, to estimate the number of genes or transcripts expressed in a single cell, UMIs are crucial.
- The primary advantage of including UMIs in a sequencing experiment is to enable the accurate bioinformatic identification of PCR duplicates. Without this capacity, PCR duplicates can have a detrimental impact on downstream data analysis, especially when amplification biases occur.
- UMIs therefore ultimately act as tags that allow the accurate identification and subsequent removal of PCR duplicates in sequencing data.
- UMIs may be utilized in any sequencing method where confident identification of duplicates by alignment coordinate is not possible or where accurate quantification is required. The UMI method could be applied to count all types of molecules or particles, such as viruses and proteins, and in methods like ChIP-Seq, karyotyping, and others.
- Variants or mutations are considered “true” when they are identical within the individual reads carrying the same UMI and between reads with different UMIs.
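The rule in the last bullet can be sketched as a two-level check: reads sharing a UMI must agree with each other, and the same non-reference base must then be seen in more than one UMI family. The function names, the `min_families` threshold, and the toy observations are hypothetical choices for illustration, not a description of any particular variant caller.

```python
from collections import defaultdict

def family_consensus(bases):
    """Return the base if all reads in a UMI family agree, else None."""
    return bases[0] if len(set(bases)) == 1 else None

def supported_variant(observations, ref_base, min_families=2):
    """Find non-reference bases confirmed within and across UMI families.

    observations: (umi, base) pairs for reads covering one position.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)
    support = defaultdict(int)
    for bases in families.values():
        call = family_consensus(bases)
        if call is not None and call != ref_base:
            support[call] += 1
    return {base: n for base, n in support.items() if n >= min_families}

obs = [
    ("AACG", "T"), ("AACG", "T"),  # family agrees on T
    ("GGTC", "T"), ("GGTC", "T"),  # second family agrees on T
    ("TTAC", "C"), ("TTAC", "T"),  # family disagrees: likely an error
    ("CCGA", "C"),                 # reference base
]
print(supported_variant(obs, ref_base="C"))
```

A base seen in only one read of a family is treated as a probable PCR or sequencing error, which is why the `TTAC` family contributes no support.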
Some Applications for UMI RNA-Seq
- UMIs for transcripts or gene quantification

UMIs for targeted Sequencing Approach
UMIs in single-cell sequencing: A typical single mammalian cell contains approximately 10^5 – 10^6 mRNA molecules and the human cell atlas determined ~11,000 detectable genes in various human cell lines. As genes can be expressed by multiple transcript isoforms differing in their transcriptional start and end sites, exon / intron composition, and expression level, the quantification of transcripts in single cells is particularly challenging.

How many different UMIs are needed?
- UMIs will reflect molecule counts only if the number of available distinct tags is substantially larger than the typical number of identical molecules. The random sequence composition of the UMIs ensures that every library fragment-UMI combination is unique.
A fundamental assumption in RNA-Seq has been that library fragments sharing a UMI sequence and read mapping locus were derived from the same initial input molecule.
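To get a feel for how many distinct UMIs are enough, the sketch below computes the size of the UMI space (4^L for a random UMI of length L) and the expected number of distinct UMIs observed when identical molecules each draw a tag. It assumes tags are drawn uniformly and independently, which is an idealization; the function names and the figure of 1,000 identical molecules are illustrative only.

```python
def umi_space(length):
    """Number of possible random UMIs of a given length (4 bases)."""
    return 4 ** length

def expected_distinct_umis(n_molecules, length):
    """Expected distinct UMIs when n identical molecules each draw a tag.

    Assumes uniform, independent draws from the full UMI space.
    """
    n_tags = umi_space(length)
    return n_tags * (1 - (1 - 1 / n_tags) ** n_molecules)

# How well do UMI counts track 1,000 identical input molecules?
for length in (4, 6, 8, 10):
    observed = expected_distinct_umis(1000, length)
    print(f"L={length:2d}  tags={umi_space(length):8d}  "
          f"expected distinct UMIs={observed:8.1f}")
```

With a 4-base UMI the count saturates near the 256 available tags, so it badly undercounts 1,000 molecules, while a 10-base UMI recovers almost all of them.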
The UMI-tagged NGS data allow users to
- 1) accurately quantify the expression levels of genes in different cells using single-cell RNA-Seq experiments (differential expression of the transcriptome at the cellular level, rather than the tissue level, to study cell-to-cell heterogeneity), and
- 2) detect low-frequency variants with better sensitivity and specificity using UMI-based DNA-Seq experiments.

PCR duplicate

PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference genome guided alignment. However, identical molecules can be independently generated during library preparation and can have unique cellular origins. Thus, false identification of these molecules as PCR duplicates can lead to erroneous analysis and interpretation of NGS data.
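The difference between coordinate-only and UMI-aware duplicate detection can be sketched as below. The read records are hypothetical dictionaries standing in for aligned reads, and the field names (`chrom`, `pos`, `strand`, `umi`) are assumptions for illustration rather than any tool's format.

```python
def dedup_by_position(reads):
    """Count molecules, treating reads at the same coordinates as duplicates."""
    return len({(r["chrom"], r["pos"], r["strand"]) for r in reads})

def dedup_by_position_and_umi(reads):
    """Count molecules using coordinates plus UMI."""
    return len({(r["chrom"], r["pos"], r["strand"], r["umi"]) for r in reads})

reads = [
    {"chrom": "chr1", "pos": 1000, "strand": "+", "umi": "AACGT"},
    {"chrom": "chr1", "pos": 1000, "strand": "+", "umi": "AACGT"},
    {"chrom": "chr1", "pos": 1000, "strand": "+", "umi": "AACGT"},
    {"chrom": "chr1", "pos": 1000, "strand": "+", "umi": "GTTCA"},
    {"chrom": "chr1", "pos": 1000, "strand": "+", "umi": "GTTCA"},
    {"chrom": "chr1", "pos": 2500, "strand": "-", "umi": "AACGT"},
]
print("position only:", dedup_by_position(reads))
print("position + UMI:", dedup_by_position_and_umi(reads))
```

Coordinate-only deduplication merges the two independent molecules at position 1000 into one, which is the false duplicate call described above; adding the UMI keeps them apart.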

NGS and Precision Medicine

Next Generation Sequencing (NGS) technologies have revolutionized medical and genomics research. Incremental cost reductions and growing throughput at molecular resolution have helped NGS methodologies gain acceptance in labs and clinics worldwide. The third-generation wave of NGS technologies is now giving impetus to the dream of preventive, predictive, personalized, and precision (P4) medicine.

At the core of NGS technologies lie fine-tuned, optimised, sensitive molecular biology and chemistry protocols, which help to capture an accurate snapshot of the response of cells at molecular resolution under varying genotypic conditions and environmental impacts. To enable the understanding of genotype-phenotype relationships, accurate quantification of sequenced reads plays a key role before arriving at conclusions and deriving actionable insights from the NGS data.

Ultra-sensitive variant calling and transcript quantification using Unique Molecular Identifiers | StrandNGS blog.pdf

twang15 / Long-read-RNA

Biology-4: RNA-editing #8
