jbloomlab / seqneut-pipeline

Pipeline for analyzing sequencing-based neutralization assays
MIT License

Max fraction infectivity filter and barcode fraction consistency filter should be moved from plate QC to curve fitting QC #17

Closed by anloes 9 months ago

anloes commented 9 months ago

One of the most common experimental errors with this method is a single well in which a single barcode is detected at higher than expected levels. I suspect this results from variation in vRNA production by infected cells. We can mitigate the issue by using multiple barcodes per strain and by ensuring adequate coverage of each barcoded strain, so that we are averaging over a large number of infected cells. This is why I suggest an upper limit on the number of barcoded strains used in this method. In this study, we used 113 barcodes and infected 50,000 cells with a TCID50 of 25,000, which should give approximately 150-220 infectious virions for each barcode in each well. To maintain a minimum of 100 virions per barcode, our libraries should contain approximately 50-80 strains if each is represented by 3 unique barcodes. It may be possible to increase this number by increasing the cell number in each well and/or decreasing the number of barcodes associated with each variant to 2.
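The coverage arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope check: it assumes one TCID50 unit corresponds to roughly one infectious virion and that all barcodes are equally represented, neither of which is stated exactly in the comment. Under those assumptions it reproduces the upper ends of the quoted ranges; the lower ends presumably allow for uneven barcode representation.

```python
# Figures taken from the comment above.
tcid50_per_well = 25_000        # infectious units added to each well
n_barcodes = 113                # barcodes in the library used in this study
min_virions_per_barcode = 100   # suggested minimum coverage per barcode
barcodes_per_strain = 3

# Average infectious virions per barcode, ASSUMING equal representation.
virions_per_barcode = tcid50_per_well / n_barcodes

# Largest library that keeps every barcode at or above the minimum.
max_barcodes = tcid50_per_well // min_virions_per_barcode
max_strains = max_barcodes // barcodes_per_strain

print(f"{virions_per_barcode:.0f} virions per barcode")
print(f"at most {max_barcodes} barcodes, i.e. {max_strains} strains")
```

Dropping to 2 barcodes per strain in the same calculation raises the strain ceiling, at the cost of less redundancy per strain.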

When high vRNA corresponding to a barcode is detected, it typically occurs within a single well, and in that well a small number of barcodes (typically 1, sometimes up to 3) are detected at a higher than expected fraction infectivity. When this occurs in a selection condition, it triggers a max fraction infectivity error; when it occurs in a no-serum well, it triggers a barcode fraction consistency error. I suggest handling this type of error by removing specific barcodes from specific wells. However, because these flags are currently raised in the initial plate processing QC, the only ways to clear them are to remove either the entire well from analysis or the barcode from analysis for that entire plate.

For barcode_frac_consistency: we use a large number of no-serum controls, calculate the ratio of barcode to spike-in for each of these wells, and then use the median of those ratios as our 100% fraction infectivity. Because the median is robust to an outlier in a single well, I do not think we need to remove barcodes that trigger this error, so long as it is not occurring across many wells for the same barcode. I propose that this error should only be triggered when the same barcode fails in multiple wells.
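The proposal for no-serum wells could look roughly like the sketch below. It is not the pipeline's implementation: the function name `no_serum_summary`, the input layout, and the thresholds `max_fold` and `min_failing_wells` are all hypothetical and chosen only for illustration.

```python
from statistics import median


def no_serum_summary(ratios, max_fold=3.0, min_failing_wells=2):
    """Summarize barcode/spike-in ratios from no-serum wells.

    ratios maps barcode -> {well: ratio}. Returns (reference, flagged):
    reference holds the per-barcode median ratio used as 100% infectivity,
    flagged lists barcodes that deviate from their own median by more than
    max_fold in at least min_failing_wells wells. Thresholds are hypothetical.
    """
    reference, flagged = {}, []
    for barcode, by_well in ratios.items():
        med = median(by_well.values())
        reference[barcode] = med
        failing = [
            well
            for well, r in by_well.items()
            if r > max_fold * med or r < med / max_fold
        ]
        if len(failing) >= min_failing_wells:
            flagged.append(barcode)
    return reference, flagged


# Made-up example values, not real data.
ratios = {
    "bc1": {"A1": 1.0, "B1": 1.1, "C1": 0.9, "D1": 6.0},  # one outlier well
    "bc2": {"A1": 0.5, "B1": 4.0, "C1": 0.1, "D1": 0.6},  # deviates in two wells
}
reference, flagged = no_serum_summary(ratios)
print(reference["bc1"], flagged)
```

In this toy input the single high well for `bc1` barely moves its median, so it is not flagged, while `bc2` deviates in two wells and is.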

For max_fraction_infectivity: again, this typically occurs in a single well for each affected barcode, so I think that before fitting curves to the fraction infectivity data, we can remove individual fraction infectivity measurements that are higher than expected (e.g. >5 or >6).
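A minimal sketch of that per-barcode, per-well filter is shown here, assuming measurements arrive as simple records. The function name `drop_high_infectivity`, the record layout, and the example values are hypothetical; the cutoff of 5 is one of the values suggested above.

```python
def drop_high_infectivity(measurements, max_frac_infectivity=5.0):
    """Split records into (kept, dropped) by fraction infectivity.

    Each record is a dict with "well", "barcode" and "frac_infectivity" keys
    (a hypothetical layout). Only the offending barcode-well measurement is
    dropped; the rest of the well and the rest of the barcode are kept.
    """
    kept, dropped = [], []
    for rec in measurements:
        if rec["frac_infectivity"] > max_frac_infectivity:
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped


# Made-up example values, not real data.
measurements = [
    {"well": "A1", "barcode": "bc1", "frac_infectivity": 0.95},
    {"well": "A1", "barcode": "bc2", "frac_infectivity": 7.2},  # single spike
    {"well": "B1", "barcode": "bc1", "frac_infectivity": 0.40},
    {"well": "B1", "barcode": "bc2", "frac_infectivity": 0.45},
]
kept, dropped = drop_high_infectivity(measurements)
print(len(kept), [(r["well"], r["barcode"]) for r in dropped])
```

Curves would then be fit to `kept` only, so one spiking barcode in one well no longer forces removal of the whole well or of the barcode across the plate.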

jbloom commented 9 months ago

In the new version 2.0.0, there is per-barcode per-well filtering that does this.