Biology-3 - Githubissues

twang15 commented 3 years ago

In addition to playing with the source code, I also asked the Samtools community for help:

https://github.com/samtools/samtools/issues/1406
https://github.com/samtools/samtools/issues/1407
https://github.com/samtools/samtools/issues/1409

twang15 commented 3 years ago

pileup format

http://samtools.sourceforge.net/pileup.shtml
https://en.wikipedia.org/wiki/Pileup_format
- </> (less-/greater-than sign) denotes a reference skip. This occurs, for example, if a base in the reference genome is intronic and a read maps to two flanking exons. If quality scores are given in a sixth column, they refer to the quality of the read and not the specific base.
- ^ (caret) marks the start of a read segment, and the ASCII code of the character following `^` minus 33 gives the mapping quality.
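
The caret rule above can be sketched in a few lines of Python. This is only an illustration: the function name `mapq_from_caret` and the sample string are made up for this note and are not part of Samtools.

```python
from typing import List


def mapq_from_caret(read_bases: str) -> List[int]:
    """Return the mapping quality encoded after each '^' in a pileup bases string."""
    quals = []
    i = 0
    while i < len(read_bases):
        if read_bases[i] == "^" and i + 1 < len(read_bases):
            # The character right after '^' encodes the mapping quality as ASCII - 33.
            quals.append(ord(read_bases[i + 1]) - 33)
            i += 2  # skip the quality character so it is not mistaken for a base
        else:
            i += 1
    return quals


print(mapq_from_caret("^I.,^!A$"))  # [40, 0]
```

Skipping two characters at each `^` matters because the quality character can itself be a symbol such as `$` or `^`.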

twang15 commented 3 years ago

Questions

What other features are we looking for?
- to decide the programming language
- work load
What are the +/- strands in our output file?
How do we decide which strand (+/-) an RNA sequence should be mapped to?

twang15 commented 3 years ago

If more than 20%-50% of the read bases are marked with ^ (start of a read) or $ (end of a read), we could discard the read completely.
Given an mRNA, we know whether it is on the forward or reverse strand. We can incorporate this information into our analysis if necessary.
More Features
- ignore empty lines
- parse as defined in requirement
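
A rough sketch of these features, under one possible reading of the ^/$ rule: compute, per pileup line, the fraction of reads that start or end at that position. The names `PileupRecord`, `parse_pileup` and `boundary_fraction`, and the 0.2 cutoff in the usage lines, are placeholders chosen for this note, not an agreed design.

```python
from typing import Iterable, Iterator, NamedTuple


class PileupRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    depth: int
    bases: str


def parse_pileup(lines: Iterable[str]) -> Iterator[PileupRecord]:
    """Yield one record per non-empty pileup line (tab-separated columns)."""
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns, got {len(fields)}")
        chrom, pos, ref, depth, bases = fields[:5]
        yield PileupRecord(chrom, int(pos), ref, int(depth), bases)


def boundary_fraction(rec: PileupRecord) -> float:
    """Fraction of reads at this position flagged with '^' (start) or '$' (end)."""
    starts = ends = 0
    i = 0
    while i < len(rec.bases):
        c = rec.bases[i]
        if c == "^":
            starts += 1
            i += 2  # '^' is followed by one mapping-quality character
        else:
            if c == "$":
                ends += 1
            i += 1
    return (starts + ends) / rec.depth if rec.depth else 0.0


sample = ["", "chr1\t100\tA\t4\t^I.,.$A\tIIII\n", "\n"]
for rec in parse_pileup(sample):
    flagged = boundary_fraction(rec) > 0.2  # hypothetical cutoff from the 20%-50% range
    print(rec.chrom, rec.pos, boundary_fraction(rec), flagged)
```

A read that both starts and ends at the same position is counted twice here; whether that is acceptable depends on how the rule is finally defined.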

twang15 commented 3 years ago

Pileup format

caret (hat, ^): If this is the first position covered by the read, a “^” character followed by the alignment's mapping quality encoded as an ASCII character.
$: If this is the last position covered by the read, a “$” character.
*: Deleted bases are shown as “*” on both strands unless --reverse-del is used, in which case they are shown as “#” on the reverse strand.

Forward	Reverse	Meaning
. (dot)	, (comma)	Base matches the reference base
ACGTN	acgtn	Base is a mismatch to the reference base
>	<	Reference skip (due to CIGAR “N”)
*	*/#	Deletion of the reference base (CIGAR “D”)
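
As a sketch of how the table could be applied, the function below walks the read-bases column and tallies each symbol by meaning and strand. The category names (`match_fwd` and so on) are invented for this note. It also assumes the usual pileup conventions that `^` is followed by one quality character and that indels are written as `+` or `-`, a decimal length, and that many bases.

```python
from collections import Counter


def count_pileup_symbols(bases: str) -> Counter:
    """Tally the symbols of a pileup read-bases string by meaning and strand."""
    counts = Counter()
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            i += 2  # read start marker plus its mapping-quality character
            continue
        if c == "$":
            i += 1  # read end marker, not a base
            continue
        if c in "+-":
            # Indel annotation: sign, decimal length, then that many bases.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j]) if j > i + 1 else 0
            i = j + length
            continue
        if c == ".":
            counts["match_fwd"] += 1
        elif c == ",":
            counts["match_rev"] += 1
        elif c in "ACGTN":
            counts["mismatch_fwd"] += 1
        elif c in "acgtn":
            counts["mismatch_rev"] += 1
        elif c == ">":
            counts["skip_fwd"] += 1
        elif c == "<":
            counts["skip_rev"] += 1
        elif c == "*":
            counts["deletion"] += 1  # strand unknown unless --reverse-del is used
        elif c == "#":
            counts["deletion_rev"] += 1
        i += 1
    return counts


print(count_pileup_symbols(".,^I.A$a>><*+2AG,-1t#"))
```

Indel sequences are skipped rather than counted, so an inserted `A` is never confused with a mismatch `A`.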

Reference:

twang15 commented 2 years ago

MicroRNAs (miRNAs)

are a class of non-coding RNAs that play important roles in regulating gene expression. The majority of miRNAs are transcribed from DNA sequences into primary miRNAs and processed into precursor miRNAs, and finally mature miRNAs. In most cases, miRNAs interact with the 3′ untranslated region (3′ UTR) of target mRNAs to induce mRNA degradation and translational repression. However, interaction of miRNAs with other regions, including the 5′ UTR, coding sequence, and gene promoters, has also been reported. Under certain conditions, miRNAs can also activate translation or regulate transcription. The interaction of miRNAs with their target genes is dynamic and dependent on many factors, such as subcellular location of miRNAs, the abundance of miRNAs and target mRNAs, and the affinity of miRNA-mRNA interactions. miRNAs can be secreted into extracellular fluids and transported to target cells via vesicles, such as exosomes, or by binding to proteins, including Argonautes. Extracellular miRNAs function as chemical messengers to mediate cell-cell communication. In this review, we provide an update on canonical and non-canonical miRNA biogenesis pathways and various mechanisms underlying miRNA-mediated gene regulation. We also summarize the current knowledge of the dynamics of miRNA action and of the secretion, transfer, and uptake of extracellular miRNAs.

twang15 commented 2 years ago

RNA editing (SNP, either C -> U or A -> G)

RNA editing is a process through which the nucleotide sequence specified in the genomic template is modified to produce a different nucleotide sequence in the transcript. RNA editing is an important mechanism of genetic regulation that amplifies genetic plasticity by allowing the production of alternative protein products from a single gene. There are two generic classes of RNA editing in nuclei, involving enzymatic deamination of either C-to-U or A-to-I nucleotides. The best characterized example of C-to-U RNA editing is that of apolipoprotein B (apoB), which is mediated by a holoenzyme that contains a minimal core composed of an RNA-specific cytidine deaminase apobec-1, and its cofactor apobec-1 complementation factor (ACF). C-to-U editing of apoB RNA generates two different isoforms—apoB100 and apoB48—from a single transcript. Both are important regulators of lipid transport and metabolism, and are functionally distinct. C-to-U apoB RNA editing is regulated by a range of factors including developmental, nutritional, environmental, and metabolic stimuli.

A -> I appears as A -> G on the + DNA strand in long-read RNA sequencing (inosine is read as guanosine).
C -> U appears as C -> T on the - DNA strand in long-read RNA sequencing.

In apoB, RNA editing changes the codon CAA to UAA (a stop codon) by deamination: the amino group of cytidine is replaced by a carbonyl oxygen from water, and ammonia (NH3) is released (see the following figures). As a result, only part of the gene is translated into protein (partial expression). The deamination event is specific to the intestine (site-specific), so a truncated protein is formed in the intestine.
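
The effect of the CAA to UAA edit can be illustrated with a toy example. The sequence below is invented for illustration and is not the real apoB transcript; `c_to_u_edit` and `translated_codons` are hypothetical helper names.

```python
from typing import List

STOP_CODONS = {"UAA", "UAG", "UGA"}


def c_to_u_edit(rna: str, pos: int) -> str:
    """Return the RNA with the C at 0-based position pos deaminated to U."""
    if rna[pos] != "C":
        raise ValueError(f"position {pos} is {rna[pos]!r}, not C")
    return rna[:pos] + "U" + rna[pos + 1:]


def translated_codons(rna: str) -> List[str]:
    """Return the codons read in frame 0 up to, but not including, the first stop codon."""
    codons = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


toy = "AUGGCACAAGGUUAA"          # made-up transcript: AUG GCA CAA GGU UAA
edited = c_to_u_edit(toy, 6)     # edit the C of the CAA codon

print(translated_codons(toy))     # ['AUG', 'GCA', 'CAA', 'GGU']
print(translated_codons(edited))  # ['AUG', 'GCA']
```

The edited transcript stops two codons early, which is the toy equivalent of the truncated protein described above.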

Screen Shot 2021-10-20 at 9.25.11 AM.pdf

Screen Shot 2021-10-20 at 9.13.01 AM.pdf

Screen Shot 2021-10-20 at 9.31.47 AM.pdf

Screen Shot 2021-10-20 at 9.33.48 AM.pdf

RNA editing (insertion of U)

Screen Shot 2021-10-20 at 9.37.51 AM.pdf

twang15 commented 2 years ago

RT-PCR (REVERSE TRANSCRIPTION–POLYMERASE CHAIN REACTION) is used to amplify RNA targets.

RT-PCR uses RNA as starting material for in vitro nucleic acid amplification. The discovery of retroviral reverse transcriptase in the early 1970s ultimately made RT-PCR possible. Reverse transcriptase is an RNA-dependent DNA polymerase, catalyzing DNA synthesis using RNA as the template. The end product is known as complementary DNA (cDNA). cDNA is not subject to RNase degradation, making it more stable than RNA. In RT-PCR, the starting RNA is subsequently degraded, dsDNA is produced, and PCR amplification proceeds in the usual manner. RNA extraction kits for both manual and automated RNA purification exist and, when combined with RT-PCR, make RNA analysis in the clinical laboratory virtually as rapid and equally sensitive as PCR-based DNA amplification.
Reverse transcription (RT)-PCR is used to amplify RNA targets. The RNA template is converted into complementary (c)DNA by the enzyme reverse transcriptase. The cDNA serves later as a template for exponential amplification using PCR. RT-PCR can be undertaken in one or two steps. One-step RT-PCR combines the RT reaction and PCR reaction in the same tube. Only sequence-specific primers may be used. During two-step RT-PCR, the synthesized cDNA is transferred into a second tube for PCR.
RT-PCR is commonly used in the diagnosis and quantification of RNA virus infections.
Gene expression profiling is likely to have a major impact on molecular diagnostics in the coming years and will depend on RNA analysis using RT-PCR and possibly high-density arrays.
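
At the sequence level, the reverse transcription step amounts to writing the DNA strand complementary to the RNA template. A minimal sketch, with `first_strand_cdna` as a made-up name and no modeling of primers or enzyme behavior:

```python
# RNA base -> complementary DNA base
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA of an RNA template, written 5' to 3'."""
    # Complement each base, then reverse, because the two strands are antiparallel.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


print(first_strand_cdna("AUGGCA"))  # TGCCAT
```

The result is a DNA string, so it can be passed directly to any downstream code that expects a PCR template.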

PCR

Sometimes called "molecular photocopying," the polymerase chain reaction (PCR) is a fast and inexpensive technique used to "amplify" - copy - small segments of DNA. Because significant amounts of a sample of DNA are necessary for molecular and genetic analyses, studies of isolated pieces of DNA are nearly impossible without PCR amplification.

Taq polymerase: Like DNA replication in an organism, PCR requires a DNA polymerase enzyme that makes new strands of DNA, using existing strands as templates. The DNA polymerase typically used in PCR is called Taq polymerase, after the heat-tolerant bacterium from which it was isolated (Thermus aquaticus).

PCR Primer

Like other DNA polymerases, Taq polymerase can only make DNA if it's given a primer, a short sequence of nucleotides that provides a starting point for DNA synthesis. In a PCR reaction, the experimenter determines the region of DNA that will be copied, or amplified, by the primers she or he chooses. PCR primers are short pieces of single-stranded DNA, usually around 20 nucleotides in length.
Two primers are used in each PCR reaction, and they are designed so that they flank the target region (region that should be copied). That is, they are given sequences that will make them bind to opposite strands of the template DNA, just at the edges of the region to be copied. The primers bind to the template by complementary base pairing.

pcr

Both primers, when bound, point “inward” – that is, in the 5’ to 3’ direction towards the region to be copied. Like other DNA polymerases, Taq polymerase can only synthesize DNA in the 5’ to 3’ direction. When the primers are extended, the region that lies between them will thus be copied.
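
The flanking-primer idea can be sketched as follows. This only picks sequences from the edges of a target region; real primer design also checks melting temperature, GC content and self-complementarity, none of which is modeled here. The function names, the template and the primer length of 6 are invented for the example (real primers are around 20 nucleotides, as noted above).

```python
from typing import Tuple

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def flanking_primers(template: str, start: int, end: int, length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers for template[start:end], both written 5' to 3'."""
    if end - start < 2 * length:
        raise ValueError("target region is too short for two primers of this length")
    # The forward primer matches the top strand at the left edge,
    # so it binds the bottom strand and points into the target.
    forward = template[start:start + length]
    # The reverse primer is the reverse complement of the right edge,
    # so it binds the top strand and points back into the target.
    reverse = reverse_complement(template[end - length:end])
    return forward, reverse


template = "GGATCCATGAAACCCGGGTTTTAGGAATTC"  # made-up 30-nt top strand
fwd, rev = flanking_primers(template, 6, 24, length=6)
print(fwd, rev)  # ATGAAA CTAAAA
```

Both primers come out written 5' to 3', which matches the direction in which Taq polymerase extends them.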

Basic steps:

Denaturation (96°C): Heat the reaction strongly to separate, or denature, the DNA strands. This provides single-stranded template for the next step.
Annealing (55-65°C): Cool the reaction so the primers can bind to their complementary sequences on the single-stranded template DNA.
Extension (72°C): Raise the reaction temperature so Taq polymerase extends the primers, synthesizing new strands of DNA.

This cycle repeats 25-35 times in a typical PCR reaction, which generally takes 2-4 hours, depending on the length of the DNA region being copied. If the reaction is efficient (works well), the target region can go from just one or a few copies to billions. There are many copies of the primers and many molecules of Taq polymerase floating around in the reaction, so the number of DNA molecules can roughly double in each round of cycling. This pattern of exponential growth is shown in the images below.

pcr-3

pcr-4
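
The doubling arithmetic can be checked with a short calculation. The `efficiency` parameter is an assumption added for this note: 1.0 means perfect doubling every cycle, and lower values model the "roughly double" case.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the copy number after a number of PCR cycles.

    efficiency is the fraction of molecules copied per cycle (1.0 = perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0
```

Starting from a single molecule, 30 perfect cycles give about one billion copies, consistent with the "one or a few copies to billions" statement above.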

twang15 commented 2 years ago

Using gel electrophoresis to visualize the results of PCR

The results of a PCR reaction are usually visualized (made visible) using gel electrophoresis. Gel electrophoresis is a technique in which fragments of DNA are pulled through a gel matrix by an electric current, and it separates DNA fragments according to size. A standard, or DNA ladder, is typically included so that the size of the fragments in the PCR sample can be determined.
DNA fragments of the same length form a "band" on the gel, which can be seen by eye if the gel is stained with a DNA-binding dye. For example, a PCR reaction producing a 400 base pair (bp) fragment would look like this on a gel:
A DNA band contains many, many copies of the target DNA region, not just one or a few copies. Because DNA is microscopic, lots of copies of it must be present before we can see it by eye. This is a big part of why PCR is an important tool: it produces enough copies of a DNA sequence that we can see or manipulate that region of DNA.
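
Reading a fragment size off the DNA ladder can be approximated numerically. The sketch below assumes the common rule of thumb that migration distance is roughly linear in the logarithm of fragment size, and interpolates between the two ladder bands that flank the sample band. The ladder values and the name `estimate_size_bp` are made up for illustration.

```python
import math
from typing import List, Tuple


def estimate_size_bp(ladder: List[Tuple[float, float]], distance_mm: float) -> float:
    """Estimate fragment size from migration distance.

    ladder is a list of (size_bp, distance_mm) pairs for the standard bands.
    Interpolation is linear in log10(size) between the two flanking bands.
    """
    points = sorted(ladder, key=lambda p: p[1])  # order bands by distance travelled
    for (size_a, d_a), (size_b, d_b) in zip(points, points[1:]):
        if d_a <= distance_mm <= d_b:
            frac = (distance_mm - d_a) / (d_b - d_a)
            log_size = math.log10(size_a) + frac * (math.log10(size_b) - math.log10(size_a))
            return 10 ** log_size
    raise ValueError("distance is outside the range covered by the ladder")


# Hypothetical ladder: larger fragments travel a shorter distance.
ladder = [(1000, 10.0), (500, 18.0), (100, 40.0)]
print(round(estimate_size_bp(ladder, 21.0)))
```

The estimate is only as good as the log-linear assumption, which tends to break down for very large or very small fragments.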

Plasmid

A plasmid is a small, extrachromosomal DNA molecule within a cell that is physically separated from chromosomal DNA and can replicate independently. They are most commonly found as small circular, double-stranded DNA molecules in bacteria; however, plasmids are sometimes present in archaea and eukaryotic organisms.
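Plasmids are widespread in the living world: they are found in bacteria, actinomycetes, filamentous fungi, macrofungi, yeasts, plants, and even the human body. In terms of molecular composition, there are DNA plasmids and RNA plasmids; in terms of molecular conformation, there are linear plasmids and circular plasmids. Their phenotypes are also diverse. Bacterial plasmids are the most commonly used vectors in genetic engineering.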

twang15 / Long-read-RNA

Biology-3 #6
