jbloomlab / SARS-CoV-2-RBD_DMS

Deep mutational scanning of the receptor-binding domain of SARS-CoV-2 Spike
BSD 3-Clause "New" or "Revised" License

confused about a particular plot in build_variants.ipynb #100

Open adalisan opened 2 years ago

adalisan commented 2 years ago

Regarding the mutations_per_variant computation in build_variants.ipynb: there is a point I can't understand. How can you have more than one variant that has 0 codon substitutions? If a sequence has 0 codon substitutions, isn't it exactly wildtype?

tylernstarr commented 2 years ago

A "variant" in this context is a barcode-associated gene in the library. So the multiplicity of the 0-mutant class indicates the number of replicate barcode sequences that are linked to wild-type sequences. Similarly, the 1-mutant count is not the number of unique single mutants but rather the number of barcodes for which the associated gene has one mutation in it.
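To make the distinction concrete, here is a minimal sketch that counts the same toy library two ways: per barcode (what a mutations-per-variant plot would tally) and per unique coding sequence. The barcodes and mutation labels are hypothetical and the data structure is an assumption for illustration, not the notebook's actual code.

```python
from collections import Counter

# Hypothetical toy barcode-to-variant table: (barcode, substitutions in the linked gene).
# An empty tuple means the linked RBD sequence is wild type.
variants = [
    ("AAAC", ()),
    ("AGTT", ()),
    ("CCGA", ()),
    ("GATC", ("mutA",)),
    ("TTAG", ("mutA",)),
    ("TCCA", ("mutB",)),
    ("GGGT", ("mutA", "mutB")),
]

# Count barcodes by number of substitutions: one entry per barcoded variant.
barcodes_per_n_subs = Counter(len(subs) for _, subs in variants)

# Count unique coding sequences by number of substitutions: duplicates collapse.
unique_sequences = {subs for _, subs in variants}
unique_per_n_subs = Counter(len(subs) for subs in unique_sequences)

for n in sorted(barcodes_per_n_subs):
    print(n, barcodes_per_n_subs[n], unique_per_n_subs[n])
```

In this toy library three barcodes are linked to wild type, so the 0-substitution count per barcode is 3 even though there is only one wild-type sequence.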

adalisan commented 2 years ago

But I thought there is a 1-1 relationship between variants and barcodes? In the paper https://www.sciencedirect.com/science/article/pii/S0092867420310035#mmc3, it is mentioned that "...linked each RBD variant to its barcode via long-read PacBio SMRT sequencing". If you see a wildtype sequence read that also contains a barcode, would it not have to contain the particular barcode that is linked to the wildtype "variant"? Or is it that, in the sequencing data, if you see a particular sequence that is nearly identical to the original barcode associated with a variant, it is also considered a barcode?

jbloom commented 2 years ago

There can be multiple barcodes associated with the same RBD coding sequence. The variants do not necessarily have to be unique in their coding sequences. In other words, the same barcode will never be associated with multiple different coding sequences. But the same coding sequence may be associated with multiple barcodes.
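The many-to-one relationship described here can be sketched as a small check: each barcode must map to exactly one coding sequence, while a coding sequence may collect several barcodes. The function name `barcodes_by_sequence` and the example records are hypothetical, chosen only to illustrate the rule.

```python
from collections import defaultdict


def barcodes_by_sequence(pairs):
    """Group barcodes by coding sequence.

    Raises ValueError if one barcode is linked to two different sequences,
    which the many-to-one rule forbids.
    """
    sequence_of = {}
    groups = defaultdict(list)
    for barcode, sequence in pairs:
        if barcode in sequence_of:
            if sequence_of[barcode] != sequence:
                raise ValueError(f"barcode {barcode} maps to multiple sequences")
            continue  # repeated record for a known barcode; nothing new to add
        sequence_of[barcode] = sequence
        groups[sequence].append(barcode)
    return dict(groups)


# Hypothetical records: two barcodes share the wild-type sequence.
pairs = [("AAAC", "wildtype"), ("AGTT", "wildtype"), ("GATC", "mutant_1")]
groups = barcodes_by_sequence(pairs)
print(groups)
```

Adding a record such as `("AAAC", "mutant_1")` to `pairs` would raise `ValueError`, because the barcode `AAAC` is already linked to the wild-type sequence.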