greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs #1023

Open gwaybio opened 4 years ago

gwaybio commented 4 years ago

A lack of tools to precisely control gene expression has limited our ability to evaluate relationships between expression levels and phenotypes. Here, we describe an approach to titrate expression of human genes using CRISPR interference and series of single-guide RNAs (sgRNAs) with systematically modulated activities. We used large-scale measurements across multiple cell models to characterize activities of sgRNAs containing mismatches to their target sites and derived rules governing mismatched sgRNA activity using deep learning. These rules enabled us to synthesize a compact sgRNA library to titrate expression of ~2,400 genes essential for robust cell growth and to construct an in silico sgRNA library spanning the human genome. Staging cells along a continuum of gene expression levels combined with single-cell RNA-seq readout revealed sharp transitions in cellular behaviors at gene-specific expression thresholds. Our work provides a general tool to control gene expression, with applications ranging from tuning biochemical pathways to identifying suppressors for diseases of dysregulated gene expression.

https://doi.org/10.1038/s41587-019-0387-5

gwaybio commented 4 years ago

One section of a larger paper involves training a CNN on an "allelic series" of CRISPRi expression "titrations". That sentence is painful to read... in other words, in the assay, the authors systematically tinker with sgRNA sequences to tune how strongly CRISPRi knockdown affects gene expression. This enables the authors to directly read out the ground-truth impact of modulating gene expression levels along a continuum between basal expression and knockout.

The CNN is trained on sgRNA sequences and their corresponding "relative activity". The relative activity is a single number representing a growth phenotype (essentially cell count). The authors train an ensemble of CNNs and evaluate it on a held-out test set. They also validate the model by showing that it can predict GFP expression, used as the "relative activity", in a CRISPRi allelic series targeting GFP.
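To make the training setup concrete, here is a minimal sketch of how sgRNA sequences could be turned into numeric inputs paired with activity targets. The one-hot encoding, the `one_hot` helper, and the example sequences and activity values are all assumptions for illustration; the summary above does not say how the authors encoded their inputs.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA sequence as a list of 4-element rows, one row per base."""
    rows = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1.0 if base == b else 0.0 for b in BASES])
    return rows


# Hypothetical training examples: (sgRNA sequence, relative activity).
# Sequences and values are made up for illustration only.
examples = [
    ("GACCTTGAACGTTAGCATCG", 1.00),  # stand-in for a perfectly matched guide
    ("GACCTTGAACGTTAGCATAG", 0.35),  # stand-in for a single-mismatch variant
]

inputs = [one_hot(seq) for seq, _ in examples]
targets = [activity for _, activity in examples]
```

Each input is a 20 x 4 matrix (positions by bases), which is the kind of shape a 1D convolution over sequence positions expects.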

Model Details

Two convolutional layers, followed by a max pooling layer, then a fully connected layer to predict activity. The authors train 20 different models, and inference on new data takes the mean prediction of the 20 models.
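The architecture and ensembling described above can be sketched in plain Python. This is not the authors' implementation: the filter count, kernel width, pool size, ReLU activations, one-hot input, and random (untrained) weights are all assumptions chosen to keep the example small. Only the layer order and the 20-model mean come from the summary.

```python
import random

BASES = "ACGT"


def one_hot(sequence):
    """One row per position, one column per base (assumed input encoding)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]


def conv1d_relu(x, kernels, biases):
    """Valid 1D convolution along positions, followed by ReLU.

    x: [position][channel]; kernels: [filter][offset][channel].
    """
    width = len(kernels[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for offset in range(width):
                for channel, value in enumerate(x[start + offset]):
                    total += kernel[offset][channel] * value
            row.append(max(0.0, total))
        out.append(row)
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along the position axis."""
    return [
        [max(column) for column in zip(*x[start:start + size])]
        for start in range(0, len(x) - size + 1, size)
    ]


def make_model(seed, length=20, filters=4, width=3, pool=2):
    """Build one untrained conv -> conv -> max pool -> dense model."""
    rng = random.Random(seed)

    def weight():
        return rng.uniform(-0.5, 0.5)

    conv1 = [[[weight() for _ in range(4)] for _ in range(width)] for _ in range(filters)]
    conv2 = [[[weight() for _ in range(filters)] for _ in range(width)] for _ in range(filters)]
    pooled_len = (length - 2 * (width - 1)) // pool
    dense = [weight() for _ in range(pooled_len * filters)]
    zeros = [0.0] * filters

    def predict(x):
        hidden = conv1d_relu(x, conv1, zeros)
        hidden = conv1d_relu(hidden, conv2, zeros)
        hidden = max_pool(hidden, pool)
        flat = [value for row in hidden for value in row]
        return sum(w * v for w, v in zip(dense, flat))

    return predict


def ensemble_predict(models, x):
    """Mean prediction across all models in the ensemble."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


# 20 models that differ only in their random seed (training is omitted here).
models = [make_model(seed) for seed in range(20)]
x = one_hot("GACCTTGAACGTTAGCATCG")  # made-up 20-nt sequence
print(ensemble_predict(models, x))
```

In a real setting each of the 20 models would be trained separately before averaging; the averaging step itself is the same.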

Performance

The CNN ensemble outperforms a logistic regression model (r^2 = 0.65 vs. r^2 = 0.52)
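For reference, a comparison like this could be computed as follows. The sketch assumes r^2 means the squared Pearson correlation between measured and predicted activities; the summary does not say which definition the authors used, and `pearson_r_squared` is a hypothetical helper name.

```python
def pearson_r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance ** 2 / (var_o * var_p)


# Toy values for illustration only, not data from the paper.
measured = [1.0, 2.0, 3.0]
model_output = [1.0, 3.0, 2.0]
print(pearson_r_squared(measured, model_output))
```

Under this definition a constant offset or rescaling of the predictions does not change the score, which differs from the coefficient of determination.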

Interpretation

The authors show that mismatch position (along the sgRNA construct) and mismatch type (e.g. A -> T) were the most informative features. GC content was also important, and an intermediate location between the end and the PAM also seemed to be informative.
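The features named above are easy to compute directly from a pair of sequences. The sketch below assumes each mismatched sgRNA can be compared against its perfectly matched parent of the same length, and reports positions as 0-based indices from the start of the string; the function names and example sequences are hypothetical.

```python
def gc_content(sequence):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def mismatch_features(matched, variant):
    """Describe how a variant sgRNA differs from its perfectly matched parent."""
    if len(matched) != len(variant):
        raise ValueError("sequences must have the same length")
    mismatches = [
        {"position": index, "type": f"{original}->{changed}"}
        for index, (original, changed) in enumerate(zip(matched, variant))
        if original != changed
    ]
    return {"mismatches": mismatches, "gc_content": gc_content(variant)}


# Made-up sequences for illustration only.
features = mismatch_features("GACCTTGAACGTTAGCATCG", "GACCTTGAACGTTAGCATAG")
print(features)
```

Features like these are also the natural inputs for a simpler baseline model, which makes them useful for checking what the CNN has learned.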

Interesting Highlight

The authors also used their trained model to impute the sgRNA constructs most likely to produce activity within a given range. This helped with designing a more compact sgRNA library 🤯
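The selection step can be illustrated with a simple filter over model predictions. This is a sketch under assumptions: the guide names, predicted values, and activity window are invented, and the authors' actual selection procedure may differ.

```python
def select_by_activity(predicted, low, high):
    """Return guide names whose predicted activity lies in [low, high],
    ordered from lowest to highest predicted activity."""
    return sorted(
        (name for name, activity in predicted.items() if low <= activity <= high),
        key=lambda name: predicted[name],
    )


# Hypothetical model predictions for candidate mismatched sgRNAs.
hypothetical_predictions = {
    "guide_a": 0.05,
    "guide_b": 0.62,
    "guide_c": 0.35,
    "guide_d": 0.91,
}

intermediate = select_by_activity(hypothetical_predictions, 0.3, 0.7)
print(intermediate)
```

Running the filter over several windows would give a small set of guides per gene that spans the activity range, which is the idea behind a compact titration library.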

gwaybio commented 4 years ago

This paper is a good example of a trend where deep learning is becoming more integrated into efforts that are primarily assay development/molecular biology.

gwaybio commented 4 years ago

The model is here: https://static-content.springer.com/esm/art%3A10.1038%2Fs41587-019-0387-5/MediaObjects/41587_2019_387_MOESM4_ESM.html