jbloomlab / SARS-CoV-2-RBD_DMS

Deep mutational scanning of the receptor-binding domain of SARS-CoV-2 Spike
BSD 3-Clause "New" or "Revised" License

build the variants #11

jbloom closed this 4 years ago

jbloom commented 4 years ago

@tylernstarr, the code now builds the variants. This means the build_variants notebook now runs most of the way through (see here). It doesn't quite finish because I first need to refactor dms_variants to hold the different targets, which I'll work on tomorrow.

But it gets far enough that you can see things seem to have worked well: in each library there are about 100 barcodes for each non-SARS-CoV-2 variant, plus about 100,000 barcodes for the SARS-CoV-2 variants.

There are some variants with many barcode reads that get filtered out because they have too many mutational differences (perhaps what you were referring to on Slack), but in each library this is only about 1,000 out of 100,000. It is probably possible to re-jigger the parameters to simple_mutconsensus to recover some of these, but I'm not sure it's worth it. Right now the settings make the method very robust to DNA heteroduplexes and barcode collisions (the same barcode on two variants), because a barcode is thrown out if there are large or repeated differences among its reads. Even if we recovered every variant that is currently lost, we'd gain only about 1% more (1,000 / 100,000), and that could come at the cost of lower accuracy if either of the above problems is occurring. So at least for now, it doesn't seem worth messing with.
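To make the robustness-versus-recovery trade-off concrete, here is a minimal Python sketch of a per-barcode consensus filter. It is a hypothetical illustration only, not the actual logic or signature of dms_variants' simple_mutconsensus: the function names barcode_consensus and build_variants, the parameters max_diffs and max_minor_frac, and the barcodes and mutation strings in the example are all assumptions chosen for the example. Each barcode is given as a list of its reads, with each read reduced to the set of mutations it carries.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple

# One read, reduced to the set of mutations it carries (hypothetical encoding).
Mutations = FrozenSet[str]


def barcode_consensus(
    reads: List[Mutations],
    max_diffs: int = 1,
    max_minor_frac: float = 0.25,
) -> Optional[Mutations]:
    """Return the consensus mutations for one barcode, or None to drop it.

    Hypothetical rules, NOT dms_variants' actual implementation:
      * the consensus is every mutation seen in more than half of the reads;
      * drop the barcode if any single read differs from the consensus by
        more than `max_diffs` mutations (a "large" difference);
      * drop the barcode if any disagreement with the consensus occurs in
        more than `max_minor_frac` of the reads (a "repeated" difference,
        e.g. a heteroduplex or two variants sharing one barcode).
    """
    n = len(reads)
    counts = Counter(m for muts in reads for m in muts)
    consensus = frozenset(m for m, c in counts.items() if c > n / 2)

    # Large differences: symmetric difference between each read and consensus.
    if any(len(muts ^ consensus) > max_diffs for muts in reads):
        return None

    # Repeated differences: reads lacking a consensus mutation, or reads
    # carrying a non-consensus mutation.
    for m, c in counts.items():
        minor = n - c if m in consensus else c
        if minor / n > max_minor_frac:
            return None
    return consensus


def build_variants(
    barcodes: Dict[str, List[Mutations]],
    max_diffs: int = 1,
    max_minor_frac: float = 0.25,
) -> Tuple[Dict[str, Mutations], List[str]]:
    """Split barcodes into kept {barcode: consensus} and a list of dropped ones."""
    kept: Dict[str, Mutations] = {}
    dropped: List[str] = []
    for bc, reads in barcodes.items():
        cons = barcode_consensus(reads, max_diffs, max_minor_frac)
        if cons is None:
            dropped.append(bc)
        else:
            kept[bc] = cons
    return kept, dropped


# Illustrative (made-up) barcodes and reads.
reads = {
    "AAAC": [frozenset({"A22G"})] * 4,  # clean barcode
    "GGTT": [frozenset({"A22G"})] * 4 + [frozenset({"A22G", "C105T"})],  # one stray read
    "CCGA": [frozenset({"A22G"})] * 2 + [frozenset({"G310A"})] * 2,  # looks like a collision
    "TTTT": [frozenset({"A22G"})] * 4 + [frozenset({"A22G", "C105T", "G310A"})],  # large diff
}

kept, dropped = build_variants(reads)
print(kept)     # AAAC and GGTT kept, both with consensus {A22G}
print(dropped)  # ['CCGA', 'TTTT']
```

In this toy example, raising max_diffs to 2 recovers TTTT, but raising max_minor_frac to 0.5 also "recovers" CCGA and calls it a variant with no mutations, even though its reads look like two different variants sharing one barcode. That is the kind of accuracy cost that loosening the filters could bring.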

Do you want to review and merge? I'll keep working on it tomorrow.