jbloomlab / barcoded_flu_pdmH1N1

Barcoded pdmH1N1 virus hashing experiment

define "true" viral barcodes #42

Closed jbloom closed 3 years ago

jbloom commented 3 years ago

@dbacsik says:

The main question right now is how we define and filter "true" viral barcodes from the transcriptome data and from the supernatant/second infection sequencing data.

The data show a lot of viral barcodes that are unique to each sequencing sample. The number of unique viral barcodes in the sequencing far exceeds the plausible number of virions used to infect the cells (~1.25e4 virions). To distinguish real barcodes from errors, I have used the following filtering criteria:

  • The barcode must be present in the “cell” (first infection) sample. It stands to reason that any real barcode present in the supernatant or second infection should be derived from the first infection.
  • The barcode must be present above some threshold frequency.
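
As a rough illustration of the two criteria above, here is a minimal Python sketch. The function names, the `min_freq` default, and the choice to apply the threshold to the frequency within the sample being filtered are all assumptions made for illustration; the thread does not say which sample the threshold applies to or what value was used.

```python
def barcode_frequencies(counts):
    """Convert raw barcode read counts into within-sample frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bc: n / total for bc, n in counts.items()}


def filter_true_barcodes(cell_counts, sample_counts, min_freq=0.01):
    """Keep barcodes from `sample_counts` that pass both criteria.

    cell_counts:   barcode -> count in the "cell" (first infection) sample
    sample_counts: barcode -> count in the supernatant or second infection
    min_freq:      hypothetical threshold on within-sample frequency
    """
    cell_barcodes = set(cell_counts)  # criterion 1: seen in first infection
    freqs = barcode_frequencies(sample_counts)
    return {
        bc: f
        for bc, f in freqs.items()
        if bc in cell_barcodes and f >= min_freq  # criterion 2: above threshold
    }


# Toy example with made-up barcodes and counts.
cell = {"AAAA": 50, "CCCC": 30}
supernatant = {"AAAA": 90, "GGGG": 9, "CCCC": 1}
kept = filter_true_barcodes(cell, supernatant, min_freq=0.05)
```

In the toy data, `GGGG` is dropped because it is absent from the first infection, and `CCCC` is dropped because it falls below the threshold.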

This works OK, and there are reasonable correlations between the barcodes in the first infection, supernatant, and second infection. However, there is still substantial noise, and the raw data still shows barcodes that are present at high frequency in the supernatant or second infection but not found in the first infection.

jbloom commented 3 years ago

I think UMI_tools has methods for figuring out "true" barcodes from mismatches that could be applied to the viral barcodes within single cells.

Within the whole mix, there is some complicated set of viral barcodes. That is true both if we look at all barcodes in the transcriptomics (across cells) and if we look at all barcodes in the supernatant.

We really only care about the viral barcodes that are in the transcriptomics.

Furthermore, within each cell, we expect there to be at most a few distinct viral barcodes.

My suggestion is that first we identify truly infected cells in the transcriptomics.

Then we take the transcriptomic viral barcode calls and use something like UMI_tools to figure out which viral barcodes are true in each cell. That gives us our set of true viral barcodes in the transcriptomic data.
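
To make the per-cell idea concrete, here is a minimal pure-Python sketch of a mismatch-based collapse. It is a simplified version of the "directional" idea used for UMI deduplication, not the UMI_tools API or its exact algorithm: a low-count barcode is merged into a higher-count barcode when they differ by at most `max_dist` mismatches and the higher count is at least `2 * low - 1`. The function names, the thresholds, and the assumption that all barcodes in a cell have the same length are illustrative.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def collapse_barcodes(counts, max_dist=1):
    """Collapse likely sequencing errors among the barcodes of one cell.

    counts: barcode -> read (or UMI) count within a single cell
    Returns parent barcode -> summed count after merging error barcodes.
    """
    collapsed = {}
    # Visit barcodes from most to least abundant so parents are seen first.
    for bc in sorted(counts, key=lambda b: (-counts[b], b)):
        parent = None
        for candidate in collapsed:
            close = hamming(bc, candidate) <= max_dist
            # Directional-style rule: the parent must be much more abundant.
            abundant = counts[candidate] >= 2 * counts[bc] - 1
            if close and abundant:
                parent = candidate
                break
        if parent is None:
            collapsed[bc] = counts[bc]  # treated as a distinct true barcode
        else:
            collapsed[parent] += counts[bc]  # treated as an error of parent
    return collapsed


# Toy example: one cell with two abundant barcodes and one likely error.
cell_counts = {"ACGT": 100, "ACGA": 3, "TTTT": 40, "TTTA": 30}
true_barcodes = collapse_barcodes(cell_counts)
```

Here `ACGA` is merged into `ACGT`, while `TTTA` stays separate from `TTTT` because its count is too high to be explained as an error. Running this over only the cells called as infected, then taking the union of the surviving barcodes, would give one possible set of "true" barcodes for the transcriptomic data.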

jbloom commented 3 years ago

If we take this approach, we need to #58

dbacsik commented 3 years ago

A few ideas came out of our discussion about this issue today:

dbacsik commented 3 years ago

This nebulous issue has been superseded by more specific tasks. I am closing it.