jbloomlab / SARS-CoV-2-RBD_DMS

Deep mutational scanning of the receptor-binding domain of SARS-CoV-2 Spike
BSD 3-Clause "New" or "Revised" License

Update single mut effects homologs #35

tylernstarr closed 4 years ago

tylernstarr commented 4 years ago

As the violin plots at the end of compute_expression_meanF show, some of the homologs, like SARS-CoV-2 WT, have barcodes that trail down to low expression. This tail of outliers drags down the mean expression computed across barcodes, e.g. making Rf1 appear to have decreased expression relative to SARS-CoV-2. For WT SARS-CoV-2, I manually filtered out these outlier barcodes by identifying a cutoff at which the median and mean expression across the remaining barcodes came closer to converging. I did this to prevent the SARS-CoV-2 mean expression from being dragged down by these outliers during the global epistasis fit, which could cause perfectly neutral single mutations to appear to have beneficial effects on expression.

However, for computing the expression score of each homolog, rather than excluding the barcodes in this tail, I now compute the median instead of the mean expression across the barcodes encoding a target, since the median is more robust to outliers.
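
To illustrate why the median is the safer per-target summary here, the following is a minimal sketch rather than the code in this PR. The expression values are made-up toy numbers, and the function name `summarize_expression` and the `(target, expression)` record layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Toy per-barcode expression values (arbitrary units). These are NOT real
# measurements; the single low Rf1 value stands in for the tail of outliers.
toy_barcodes = [
    ("SARS-CoV-2", 10.0), ("SARS-CoV-2", 10.1), ("SARS-CoV-2", 9.9),
    ("SARS-CoV-2", 10.0), ("SARS-CoV-2", 10.0),
    ("Rf1", 10.1), ("Rf1", 10.0), ("Rf1", 9.9),
    ("Rf1", 10.2), ("Rf1", 6.0),  # low-expression outlier barcode
]


def summarize_expression(records):
    """Group (target, expression) records by target and report mean and median."""
    by_target = defaultdict(list)
    for target, expression in records:
        by_target[target].append(expression)
    return {
        target: {
            "n_barcodes": len(values),
            "mean": mean(values),
            "median": median(values),
        }
        for target, values in by_target.items()
    }


summary = summarize_expression(toy_barcodes)
for target, stats in summary.items():
    print(f"{target}: n={stats['n_barcodes']} "
          f"mean={stats['mean']:.2f} median={stats['median']:.2f}")
```

In this toy data the one outlier pulls the Rf1 mean well below the SARS-CoV-2 mean, while the two medians agree, which is the behavior the change relies on.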