twang15 / K562-Analysis

Biology-3 #13

Open twang15 opened 3 years ago

twang15 commented 3 years ago

Transcription factors

  1. Transcription factors (proteins) control the expression of genes by regulating the process of gene transcription.

gene-structure.pdf gene-structure-2.pdf

  2. A single transcription factor can regulate multiple genes by binding to different sequences and locations in the genome.
  3. Motif diagram (sequence logo)
    • It shows only the top strand of the double-stranded DNA molecule, although the TF binds to both strands.
    • At each position, the most frequent base is drawn on top.
    • The most important positions (where the base does not vary) are drawn taller. If any of these essential bases varies, the transcription factor will generally not bind.
    • Transcription factors don't usually bind to every occurrence of their binding-site sequence in the genome. Other factors, such as interactions with other proteins or the accessibility of a stretch of DNA, also influence whether a transcription factor binds.
    • Different TFs bind to different sequences and locations in the genome and therefore regulate different sets of genes. The particular combination of transcription factors present in each cell controls which genes are expressed in each of the many cell types that make up an individual.
  4. Consensus sequence: for a particular transcription factor, a single sequence in which each position shows the base most commonly found at that position across all the known binding sites being analyzed.
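
The consensus sequence and the letter heights of a sequence logo can both be derived from a set of aligned binding sites. The following is a minimal sketch, assuming equal-length DNA sites over A/C/G/T; the example sites and the function names are hypothetical, and the information content is the plain `2 - entropy` value in bits without any small-sample correction.

```python
import math
from collections import Counter


def position_counts(sites):
    """Count the bases observed at each position of equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites):
    """Most common base per position (ties resolved in A, C, G, T order)."""
    return "".join(
        max("ACGT", key=lambda base: counts[base])
        for counts in position_counts(sites)
    )


def information_content(sites):
    """Per-position information in bits: 2 minus the Shannon entropy."""
    n = len(sites)
    bits = []
    for counts in position_counts(sites):
        entropy = -sum(
            (k / n) * math.log2(k / n) for k in counts.values() if k
        )
        bits.append(2.0 - entropy)
    return bits


# Hypothetical aligned binding sites, for illustration only.
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
print(consensus(sites))
print([round(b, 2) for b in information_content(sites)])
```

Positions where every site carries the same base reach the maximum of 2 bits, which corresponds to the tallest letters in the logo; variable positions score lower.
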
twang15 commented 3 years ago

CRISPR

  1. There are ~1,500 transcription factors in human cells. They interact to create a complex language of gene expression.
  2. ~200 human cell types
  3. ~20k genes
  4. ~3B base pairs

RNA editing

  1. single-base substitution: C->U or A->I (inosine)
  2. addition of U
twang15 commented 3 years ago

Transcription factors

  1. General transcription factors (GTFs)
  2. Specialized transcription factors
    • bind to regulatory regions (enhancers)
    • recruit activators or repressors

specialized-TF.pdf

twang15 commented 3 years ago

Enhancer / cis-regulatory element

A noncoding DNA sequence in or near a gene required for proper spatiotemporal expression of that gene, often containing binding sites for transcription factors. Often used interchangeably with enhancer.

Promoter (or promoter sequence)

Promoter sequences are DNA sequences that define where transcription of a gene by RNA polymerase begins. Promoter sequences are typically located directly upstream or at the 5' end of the transcription initiation site. RNA polymerase and the necessary transcription factors bind to the promoter sequence and initiate transcription. Promoter sequences define the direction of transcription and indicate which DNA strand will be transcribed. The transcribed strand is the template (antisense) strand; the opposite strand, whose sequence matches the RNA, is the sense (coding) strand.

Many eukaryotic genes have a conserved promoter sequence called the TATA box, located 25 to 35 base pairs upstream of the transcription start site. Transcription factors bind to the TATA box and initiate the formation of the RNA polymerase transcription complex, which promotes transcription.
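
The positional rule for the TATA box can be illustrated with a small scan of the region upstream of a transcription start site (TSS). This is only a sketch under stated assumptions: the exact-match motif `TATAAA` is a simplification, and the function name, the toy promoter, and the TSS index are hypothetical.

```python
def find_tata_box(sequence, tss_index, motif="TATAAA", window=(25, 35)):
    """Return distances upstream of the TSS at which the motif starts.

    `tss_index` is the 0-based index of the transcription start site in
    `sequence`; `window` is the inclusive range of upstream distances
    (in base pairs) to search.
    """
    near, far = window
    hits = []
    for distance in range(near, far + 1):
        start = tss_index - distance
        if start >= 0 and sequence[start:start + len(motif)] == motif:
            hits.append(distance)
    return hits


# Hypothetical promoter: the motif starts 30 bp upstream of the TSS.
promoter = "G" * 20 + "TATAAA" + "C" * 24 + "ATGGCC"
tss = 50
print(find_tata_box(promoter, tss))
```

A real scan would typically score candidate sites against a position weight matrix rather than require an exact match.
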

twang15 commented 3 years ago

Immunoglobulins

  1. Immunoglobulins play a key role in the body's immune system. They are proteins produced by specific immune cells called plasma cells in response to bacteria, viruses, and other microorganisms, as well as to other substances that the body recognizes as "non-self" harmful antigens.
  2. Immunoglobulin M (IgM) – IgM antibodies are produced as the body's first response to a new infection or to a new "non-self" antigen, providing short-term protection. They increase for several weeks and then decline as IgG production begins.
  3. Immunoglobulin G (IgG) – About 70-80% of the immunoglobulins in the blood are IgG.
    • IgG antibodies form the basis of long-term protection against microorganisms. In those with a normal immune system, sufficient IgG is produced to prevent re-infection.
    • Vaccinations use this process to prevent initial infections and add to the catalog of IgG antibodies, by exposing a person to a weakened, live microorganism or to an antigen that stimulates recognition of the microorganism.
  4. Immunoglobulin A (IgA) – IgA comprises about 15% of the total immunoglobulins in the blood but is also found in saliva, tears, respiratory and gastric secretions, and breast milk.
    • Significant amounts of IgA are not produced by a baby until after 6 months of age, so any IgA present in a baby's blood before then is from the mother's milk.
  5. Immunoglobulin E (IgE) – IgE is associated with allergies, allergic diseases, and with parasitic infections.
    • IgE is typically not included in a quantitative immunoglobulins test.
  6. Immunoglobulin D (IgD) – the role of IgD is not completely understood and IgD is not routinely measured.
twang15 commented 3 years ago

UCSC: bigWig file format

  1. The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph.
    • BigWig files are created from wiggle (wig) type files using the program wigToBigWig.
  2. The bigWig files are in an indexed binary format.
    • The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server.
  3. Wiggle data must be continuous and consist of equally sized elements.
    • If your data is sparse or contains elements of varying sizes, use the bedGraph format instead of the wiggle format.
    • If you have a very large bedGraph data set, you can convert it to the bigWig format using the bedGraphToBigWig program.
twang15 commented 3 years ago

bedGraph

  1. The bedGraph format allows display of continuous-valued data in track format.
    • The bedGraph format is line-oriented.
    • This track type is similar to the wiggle (WIG) format, but unlike the wiggle format, data exported in the bedGraph format are preserved in their original state.
  2. If you have a very large data set and you would like to keep it on your own server, you should use the bigWig data format.
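
As a rough illustration of the line-oriented layout, the sketch below writes a few intervals as bedGraph text. It assumes the standard four columns (chromosome, 0-based start, end, value) after a `track type=bedGraph` line; the function name, track name, and interval values are made up for the example.

```python
def to_bedgraph(intervals, name="example"):
    """Format (chrom, start, end, value) tuples as bedGraph text.

    Starts are 0-based and ends are exclusive. Intervals are sorted by
    chromosome name (as a string) and then by start coordinate.
    """
    lines = [f'track type=bedGraph name="{name}"']
    for chrom, start, end, value in sorted(intervals):
        if end <= start:
            raise ValueError(f"empty or reversed interval: {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical signal values, for illustration only.
intervals = [("chr1", 100, 150, 0.5), ("chr1", 10, 20, 2.0)]
print(to_bedgraph(intervals), end="")
```

Because each line carries its own start and end, the intervals can be sparse and of different sizes, which is what distinguishes bedGraph from fixed-size wiggle data.
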

Wig format

  1. Unlike bigWig binary files, wiggle ASCII text files can be uploaded as custom tracks onto the UCSC server.
  2. Wiggle format is line-oriented.
  3. The bedGraph format is a very similar format for sparse data or data that contains elements of varying size. bedGraph can also be converted to compressed/indexed binary bigWig files.
  4. For custom tracks, use the bedGraph format if it is important to retain exact data when exporting. However, the size of all custom tracks is limited. For these reasons, UCSC recommends always converting wiggle files to the bigWig storage format.
  5. Wiggle format is composed of declaration lines and data lines, and requires a separate wiggle track definition line.
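
The split between declaration lines and data lines can be seen in a small parser. This sketch handles only `fixedStep` blocks and converts them to bedGraph-style intervals; it assumes wiggle positions are 1-based (converted here to 0-based starts), and the function name and the sample text are hypothetical.

```python
def fixed_step_to_intervals(wig_text):
    """Convert fixedStep wiggle text to (chrom, start0, end, value) tuples."""
    intervals = []
    chrom = position = step = span = None
    for raw in wig_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and track/browser definition lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line: key=value fields describe the data that follow.
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            position = int(fields["start"])
            step = int(fields["step"])
            span = int(fields.get("span", 1))
            continue
        if line.startswith("variableStep"):
            raise ValueError("only fixedStep blocks are handled in this sketch")
        if chrom is None:
            raise ValueError("data line found before any declaration line")
        # Data line: one value, placed at the current 1-based position.
        start0 = position - 1
        intervals.append((chrom, start0, start0 + span, float(line)))
        position += step
    return intervals


# Hypothetical wiggle text, for illustration only.
wig = """track type=wiggle_0 name="demo"
fixedStep chrom=chr1 start=101 step=10 span=5
1.5
2.5
"""
print(fixed_step_to_intervals(wig))
```
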
twang15 commented 3 years ago

5' positions (sense tags) and 3' positions (antisense tags)

Figure (peaks): The 5′ to 3′ sequencing requirement and short read length produce stranded bias in tag distribution. The shaded blue oval represents the protein of interest bound to DNA (solid black lines). Wavy lines represent either sense (blue) or antisense (red) DNA fragments from ChIP enrichment. The thicker portion of the line indicates regions sequenced by short read sequencing technologies. Sequenced tags are aligned to a reference genome and projected onto a chromosomal coordinate (red and blue arrows).

(A) Sequence-specific binding events (e.g. transcription factors) are characterized by “punctuate enrichment” [11] and defined strand-dependent bimodality, where the separation between peaks (d) corresponds to the average sequenced fragment length. Panel A was inspired by Jothi et al. [32].

(B) Distributed binding events (e.g. histones or RNA polymerase) produce a broader pattern of tag enrichment that results in a less defined bimodal pattern.
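
The separation d between the sense and antisense peaks can be estimated from the tag positions themselves. The sketch below is one simple way to do it, not the method of any particular peak caller: it tries every shift up to a limit and keeps the one at which the sense tags best overlap the antisense tags. The function name, `max_shift`, and the toy positions are assumptions for illustration.

```python
from collections import Counter


def estimate_shift(sense_positions, antisense_positions, max_shift=300):
    """Return the shift (in bp) that best aligns sense tags with antisense tags.

    The score for a shift is the sum, over sense positions, of
    (sense tag count) * (antisense tag count at position + shift).
    Ties are resolved in favor of the smallest shift.
    """
    sense = Counter(sense_positions)
    antisense = Counter(antisense_positions)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        score = sum(
            count * antisense.get(pos + shift, 0)
            for pos, count in sense.items()
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: antisense tags sit 150 bp downstream of the sense tags.
sense_tags = [100, 101, 102, 101]
antisense_tags = [p + 150 for p in sense_tags]
d = estimate_shift(sense_tags, antisense_tags)
print(d)
```

With d in hand, the binding site can be approximated as lying about d/2 downstream of the sense peak, midway between the two strand-specific peaks.
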