twang15 / Long-read-RNA

Biology-3 #6

Closed twang15 closed 2 years ago

twang15 commented 3 years ago

In addition to experimenting with the source code, I also asked the Samtools community for help:

https://github.com/samtools/samtools/issues/1406
https://github.com/samtools/samtools/issues/1407
https://github.com/samtools/samtools/issues/1409

twang15 commented 3 years ago

pileup format

  1. http://samtools.sourceforge.net/pileup.shtml
  2. https://en.wikipedia.org/wiki/Pileup_format
    • </> (less-/greater-than sign) denotes a reference skip. This occurs, for example, if a base in the reference genome is intronic and a read maps to two flanking exons. If quality scores are given in a sixth column, they refer to the quality of the read and not the specific base.
    • ^ (caret) marks the start of a read segment; the ASCII code of the character following `^' minus 33 gives the mapping quality.
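
The mapping-quality rule above can be sketched in a few lines. This is only an illustration: the function name `read_start_mapq` and the sample string are made up here, and the input is assumed to be the read-bases column (column 5) of a single pileup line.

```python
def read_start_mapq(bases):
    """Return the mapping quality of every read that starts in this pileup column."""
    mapqs = []
    i = 0
    while i < len(bases):
        if bases[i] == "^" and i + 1 < len(bases):
            # The character right after '^' encodes the mapping quality (ASCII - 33).
            mapqs.append(ord(bases[i + 1]) - 33)
            i += 2  # skip both '^' and the quality character
        else:
            i += 1
    return mapqs


# Hypothetical column: two reads start here, with qualities 'I' and ']'.
print(read_start_mapq("^I.,^]A$"))
```

Skipping two characters at a time matters because the quality character can itself be `^` or `$`, which would otherwise be misread as another marker.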
twang15 commented 3 years ago

Questions

  1. What other features are we looking for?
    • to decide the programming language
    • work load
  2. What do the +/- strands in our output file mean?
  3. How do we decide which strand (+/-) an RNA sequence should be mapped to?
twang15 commented 3 years ago
  1. If more than a threshold fraction (somewhere between 20% and 50%) of the read bases are ^ (beginning of a read) or $ (end of a read), we could discard the read completely.
  2. Given an mRNA, we know whether it is on the forward or the reverse strand. We can incorporate this information into our analysis if necessary.
  3. More Features
    • ignore empty lines
    • parse as defined in requirement
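
A rough sketch of the filtering idea above, under several assumptions that are not settled in this thread: the input is tab-separated `samtools mpileup` output with the read bases in column 5, "discard" is interpreted as dropping the whole pileup line, and the cutoff `max_fraction=0.5` is simply the upper end of the 20%-50% range. The names `boundary_fraction` and `keep_lines` are hypothetical.

```python
import re


def boundary_fraction(bases):
    """Number of ^ and $ markers divided by the number of read bases in one column."""
    starts = ends = reads = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":      # read start; the next character is the mapping quality
            starts += 1
            i += 2
        elif c == "$":    # read end
            ends += 1
            i += 1
        elif c in "+-":   # indel: sign, length digits, then that many bases
            m = re.match(r"\d+", bases[i + 1:])
            i += 1 + (len(m.group()) + int(m.group()) if m else 0)
        else:             # one base call (., ACGTN, *, <, > and so on)
            reads += 1
            i += 1
    return (starts + ends) / reads if reads else 0.0


def keep_lines(lines, max_fraction=0.5):
    """Yield pileup lines whose boundary fraction is at or below the cutoff."""
    for line in lines:
        if not line.strip():   # ignore empty lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:    # not a complete pileup record
            continue
        if boundary_fraction(fields[4]) <= max_fraction:
            yield line
```

A read that both starts and ends at the same position contributes two markers for one base, so the ratio can exceed 1; whether that is the desired behaviour depends on how the requirement is finally worded.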
twang15 commented 3 years ago

Pileup format

  1. caret (hat, ^): If this is the first position covered by the read, a “^” character followed by the alignment's mapping quality encoded as an ASCII character.
  2. $: If this is the last position covered by the read, a “$” character.
  3. *: Deleted bases are shown as “*” on both strands unless --reverse-del is used, in which case they are shown as “#” on the reverse strand.

| Forward | Reverse | Meaning |
| --- | --- | --- |
| . (dot) | , (comma) | Base matches the reference base |
| ACGTN | acgtn | Base is a mismatch to the reference base |
| > | < | Reference skip (due to CIGAR “N”) |
| * | */# | Deletion of the reference base (CIGAR “D”) |

  1. Reference:
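
The forward/reverse symbols listed above can be turned into per-strand counts with a small tokenizer. This is a sketch, not the parser used in this project: `classify_bases` and the sample string are hypothetical, and a bare `*` is counted with strand `?` because, without --reverse-del, it does not say which strand the deletion is on.

```python
import re
from collections import Counter

FORWARD = {".": "match", ">": "skip"}
REVERSE = {",": "match", "<": "skip", "#": "deletion"}


def classify_bases(bases):
    """Count (strand, meaning) pairs in the read-bases column of one pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":     # read start marker plus its mapping-quality character
            i += 2
            continue
        if c == "$":     # read end marker
            i += 1
            continue
        if c in "+-":    # indel: sign, length digits, then that many bases
            m = re.match(r"\d+", bases[i + 1:])
            i += 1 + (len(m.group()) + int(m.group()) if m else 0)
            continue
        if c in FORWARD:
            counts[("+", FORWARD[c])] += 1
        elif c in REVERSE:
            counts[("-", REVERSE[c])] += 1
        elif c == "*":   # strand unknown unless --reverse-del was used
            counts[("?", "deletion")] += 1
        elif c in "ACGTN":
            counts[("+", "mismatch")] += 1
        elif c in "acgtn":
            counts[("-", "mismatch")] += 1
        i += 1
    return counts


# Hypothetical column mixing every symbol from the table.
print(classify_bases("^I.,,AcT>><*#$+2AG."))
```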
twang15 commented 2 years ago

MicroRNAs (miRNAs)

miRNAs are a class of non-coding RNAs that play important roles in regulating gene expression. The majority of miRNAs are transcribed from DNA sequences into primary miRNAs and processed into precursor miRNAs, and finally mature miRNAs. In most cases, miRNAs interact with the 3′ untranslated region (3′ UTR) of target mRNAs to induce mRNA degradation and translational repression. However, interaction of miRNAs with other regions, including the 5′ UTR, coding sequence, and gene promoters, has also been reported. Under certain conditions, miRNAs can also activate translation or regulate transcription. The interaction of miRNAs with their target genes is dynamic and dependent on many factors, such as the subcellular location of miRNAs, the abundance of miRNAs and target mRNAs, and the affinity of miRNA-mRNA interactions. miRNAs can be secreted into extracellular fluids and transported to target cells via vesicles, such as exosomes, or by binding to proteins, including Argonautes. Extracellular miRNAs function as chemical messengers to mediate cell-cell communication. In this review, we provide an update on canonical and non-canonical miRNA biogenesis pathways and various mechanisms underlying miRNA-mediated gene regulation. We also summarize the current knowledge of the dynamics of miRNA action and of the secretion, transfer, and uptake of extracellular miRNAs.

twang15 commented 2 years ago

RNA editing (SNP, either C -> U or A -> G)

RNA editing is a process through which the nucleotide sequence specified in the genomic template is modified to produce a different nucleotide sequence in the transcript. RNA editing is an important mechanism of genetic regulation that amplifies genetic plasticity by allowing the production of alternative protein products from a single gene. There are two generic classes of RNA editing in nuclei, involving enzymatic deamination of either C-to-U or A-to-I nucleotides. The best characterized example of C-to-U RNA editing is that of apolipoprotein B (apoB), which is mediated by a holoenzyme that contains a minimal core composed of an RNA-specific cytidine deaminase apobec-1, and its cofactor apobec-1 complementation factor (ACF). C-to-U editing of apoB RNA generates two different isoforms—apoB100 and apoB48—from a single transcript. Both are important regulators of lipid transport and metabolism, and are functionally distinct. C-to-U apoB RNA editing is regulated by a range of factors including developmental, nutritional, environmental, and metabolic stimuli.

In apoB mRNA, RNA editing changes a CAA codon (glutamine) to UAA (a stop codon) by deaminating the cytidine: the amino group is replaced with a carbonyl oxygen from H2O and NH3 is released (see the following figures). As a result, only part of the transcript is translated into protein (partial expression). The deamination event is specific to the intestine (tissue-specific), so a truncated protein is formed there.

Screen Shot 2021-10-20 at 9.25.11 AM.pdf

Screen Shot 2021-10-20 at 9.13.01 AM.pdf

Screen Shot 2021-10-20 at 9.31.47 AM.pdf

Screen Shot 2021-10-20 at 9.33.48 AM.pdf
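
The effect of the CAA-to-UAA edit described above can be illustrated with a toy translation. The sequence below is made up for the example (it is not the apoB transcript), and the helper names `translate` and `edit_c_to_u` are hypothetical; only the standard genetic code is taken as given.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def edit_c_to_u(mrna, pos):
    """Return the mRNA with the C at 0-based position pos deaminated to U."""
    if mrna[pos] != "C":
        raise ValueError("C-to-U editing needs a C at the edited position")
    return mrna[:pos] + "U" + mrna[pos + 1:]


toy = "AUGGAUCAAGGUUAA"          # hypothetical transcript: AUG GAU CAA GGU UAA
edited = edit_c_to_u(toy, 6)     # CAA -> UAA, a premature stop codon
print(translate(toy), translate(edited))
```

The unedited toy transcript yields the full peptide, while the edited one stops at the new UAA, mirroring how one transcript can give a full-length and a truncated isoform.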

RNA editing (insertion of U)

Screen Shot 2021-10-20 at 9.37.51 AM.pdf

twang15 commented 2 years ago

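RT-PCR (reverse transcription polymerase chain reaction) is used to amplify RNA targets: the RNA is first reverse-transcribed into complementary DNA (cDNA), which is then amplified by PCR.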

PCR

Sometimes called "molecular photocopying," the polymerase chain reaction (PCR) is a fast and inexpensive technique used to "amplify" - copy - small segments of DNA. Because significant amounts of a sample of DNA are necessary for molecular and genetic analyses, studies of isolated pieces of DNA are nearly impossible without PCR amplification.

PCR Primer

pcr

Basic steps:

  1. Denaturation (96°C): Heat the reaction strongly to separate, or denature, the DNA strands. This provides single-stranded template for the next step.
  2. Annealing (55-65°C): Cool the reaction so the primers can bind to their complementary sequences on the single-stranded template DNA.
  3. Extension (72°C): Raise the reaction temperature so Taq polymerase extends the primers, synthesizing new strands of DNA.

This cycle repeats 25-35 times in a typical PCR, which generally takes 2-4 hours, depending on the length of the DNA region being copied. If the reaction is efficient, the target region can go from just one or a few copies to billions. There are many copies of the primers and many molecules of Taq polymerase in the reaction, so the number of DNA molecules can roughly double in each round of cycling. This pattern of exponential growth is shown in the images below.

[Figures: pcr-3, pcr-4]
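
The doubling per cycle described above can be written as copies = initial × (1 + efficiency)^cycles, where an efficiency of 1.0 means perfect doubling. The sketch below uses that idealized model; the function names are made up for illustration, and real reactions plateau in late cycles, which this model ignores.

```python
import math


def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of cycles (efficiency 1.0 = doubling)."""
    return initial_copies * (1 + efficiency) ** cycles


def cycles_to_reach(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles needed to reach target_copies in this model."""
    return math.ceil(math.log(target_copies / initial_copies) / math.log(1 + efficiency))


# One template molecule, 30 cycles of perfect doubling gives 2**30 copies.
print(pcr_copies(1, 30))
# Cycles needed to go from one copy to a billion under the same assumption.
print(cycles_to_reach(1, 1e9))
```

With perfect doubling, a single template passes one billion copies at cycle 30, which is consistent with the 25-35 cycle range quoted above.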

twang15 commented 2 years ago

Using gel electrophoresis to visualize the results of PCR

Plasmid