lzamparo / crisprML

Library design for CRISPr screen
BSD 3-Clause "New" or "Revised" License

Assemble library candidates from results #6

Open lzamparo opened 6 years ago

lzamparo commented 6 years ago

We have 324,561 exon entries split across 94,838 transcripts, of which 242,027 are unique exons to cover. We have 1,534,276 guides that are specific enough for our purposes and that cover those exons.

Now we have to choose a guide per exon to make up a feasibly sized library. There are far more guides than we need, yet even if we choose only one guide per exon, there are still too many exons to cover.

Will discuss with Turgut tomorrow, but for now I'll see how many exons we need to cover if we consider just the first two exons of each Tx.
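A minimal sketch of that count, assuming the exon annotation can be flattened into one hypothetical record per (transcript, exon) pair that carries the exon's 1-based rank within its transcript. Because exons are shared between transcripts, the number of exon entries is larger than the number of unique exons, and restricting to the first two exons of each Tx shrinks the set further. The `ExonRecord` type and `unique_exons` helper are illustrative names, not part of crisprML.

```python
from collections import namedtuple

# Hypothetical record: one row per (transcript, exon) pair, with the exon's
# 1-based rank within its transcript. Field names are illustrative only.
ExonRecord = namedtuple("ExonRecord", ["transcript_id", "exon_rank", "exon_id"])


def unique_exons(records, max_rank=None):
    """Return the set of distinct exon IDs, optionally keeping only exons whose
    rank within their transcript is <= max_rank (e.g. 2 for 'first two exons')."""
    return {
        r.exon_id
        for r in records
        if max_rank is None or r.exon_rank <= max_rank
    }


records = [
    ExonRecord("tx1", 1, "exA"),
    ExonRecord("tx1", 2, "exB"),
    ExonRecord("tx1", 3, "exC"),
    ExonRecord("tx2", 1, "exA"),  # exA is shared with tx1
    ExonRecord("tx2", 2, "exD"),
]

print(len(records))                   # exon entries across all transcripts
print(len(unique_exons(records)))     # unique exons to cover
print(len(unique_exons(records, 2)))  # unique exons among the first two per Tx
```

Using a set comprehension means a shared exon such as `exA` is counted once no matter how many transcripts list it.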

lzamparo commented 6 years ago

We have decided not to cover the whole transcriptome, since it does not seem feasible for one person to do so (there are too many exons to cover). Instead, we are going to cover the Brie exome from the Doench et al. library, but with GuideScan-chosen gRNAs.

This means that of 68,318 exons, we can target 54,641 using guides with no perfect alignments in either hg19 or mm10, allowing up to two mismatches. When allowing up to three mismatches, there are not enough guides to hit every exon, but we can choose, for each exon, the guide with the minimal sum of hits in mm10 and hg19 at up to three mismatches. Below is the distribution of this summed number of genomic hits when allowing up to three mismatches:

[Figure: guide_specificity]
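The per-exon selection rule described above (keep the guide with the smallest summed hg19 + mm10 hit count at up to three mismatches) can be sketched as follows. This assumes each candidate guide has already been annotated with its hit counts in both genomes; the `Guide` fields and the `pick_min_hit_guide` helper are hypothetical, and the GuideScan output format is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    # Hypothetical record: the exon a guide targets and its genomic hit counts
    # in hg19 and mm10 when allowing up to three mismatches.
    guide_id: str
    exon_id: str
    hits_hg19: int
    hits_mm10: int

    @property
    def total_hits(self) -> int:
        return self.hits_hg19 + self.hits_mm10


def pick_min_hit_guide(guides):
    """Map each exon to its guide with the smallest hg19 + mm10 hit sum.

    Ties are broken by guide_id so the selection is deterministic.
    """
    best = {}
    for g in guides:
        cur = best.get(g.exon_id)
        if cur is None or (g.total_hits, g.guide_id) < (cur.total_hits, cur.guide_id):
            best[g.exon_id] = g
    return best


guides = [
    Guide("g1", "ex1", 3, 1),
    Guide("g2", "ex1", 0, 2),
    Guide("g3", "ex2", 5, 5),
]
best = pick_min_hit_guide(guides)
print({exon: g.guide_id for exon, g in best.items()})
# Distribution of summed hits over the selected guides (the data a histogram
# like guide_specificity would plot).
print(Counter(g.total_hits for g in best.values()))
```

Here `ex1` gets `g2` (2 total hits rather than 4), while `ex2` keeps its only candidate `g3` even though it has 10 hits, which is the "not enough guides" situation the comment describes.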

So we need to decide whether to use up our full budget of guides or to choose only these guides, and also how many controls to include.

lzamparo commented 6 years ago

We would like to have every gene covered by four guides, by convention. To arrive at this total, we agreed to allow the following relaxations:

  1. Allow multiple high-specificity guides per exon
  2. Find guides for all exons (not just the first four) in genes not yet targeted by any guides
  3. Relax the constraints on specificity in human
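One way to apply these relaxations is as successive passes that top each gene up to four guides, moving to the next relaxation only while a gene is still short. This is a minimal sketch under stated assumptions, not the procedure used in crisprML: the `Candidate` record, the `assign_guides` function, and the thresholds `strict_human=0` and `relaxed_human=2` (maximum hg19 off-target hits) are all hypothetical, and the order of the passes is assumed to follow the numbered list.

```python
from collections import defaultdict
from dataclasses import dataclass

GUIDES_PER_GENE = 4  # target coverage per gene, by convention


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate-guide record; field names are illustrative.
    guide_id: str
    gene: str
    exon_rank: int   # 1-based position of the targeted exon in the gene
    human_hits: int  # off-target hits in hg19 (lower = more specific)


def assign_guides(candidates, first_exons=4, strict_human=0, relaxed_human=2):
    """Fill each gene up to GUIDES_PER_GENE, applying relaxations in order."""
    by_gene = defaultdict(list)
    for c in candidates:
        by_gene[c.gene].append(c)
    chosen = defaultdict(list)

    def fill(gene, pool, one_per_exon):
        picked = chosen[gene]
        used_exons = {c.exon_rank for c in picked}
        # Most specific first, then earliest exon, then id for determinism.
        for c in sorted(pool, key=lambda c: (c.human_hits, c.exon_rank, c.guide_id)):
            if len(picked) >= GUIDES_PER_GENE:
                break
            if c in picked or (one_per_exon and c.exon_rank in used_exons):
                continue
            picked.append(c)
            used_exons.add(c.exon_rank)

    for gene, pool in by_gene.items():
        early = [c for c in pool if c.exon_rank <= first_exons]
        # Baseline: one high-specificity guide per exon in the first few exons.
        fill(gene, [c for c in early if c.human_hits <= strict_human], one_per_exon=True)
        # Relaxation 1: allow several high-specificity guides per exon.
        fill(gene, [c for c in early if c.human_hits <= strict_human], one_per_exon=False)
        # Relaxation 2: genes still untargeted may use any of their exons.
        window = pool if not chosen[gene] else early
        fill(gene, [c for c in window if c.human_hits <= strict_human], one_per_exon=False)
        # Relaxation 3: accept guides with more human off-target hits.
        fill(gene, [c for c in window if c.human_hits <= relaxed_human], one_per_exon=False)
    return dict(chosen)


cands = [
    Candidate("a1", "GeneA", 1, 0),
    Candidate("a2", "GeneA", 1, 0),
    Candidate("a3", "GeneA", 2, 0),
    Candidate("a4", "GeneA", 3, 1),
    Candidate("a5", "GeneA", 5, 0),
    Candidate("b1", "GeneB", 6, 0),
    Candidate("b2", "GeneB", 7, 3),
]
result = assign_guides(cands)
for gene, picked in result.items():
    print(gene, [c.guide_id for c in picked])
```

In this toy input, `GeneA` reaches four guides only after relaxations 1 and 3, while `GeneB` has no guides in its first four exons and gets a single guide through relaxation 2; its other candidate exceeds even the relaxed human threshold, so it remains under-covered.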