pachterlab / scRNA-Seq-TCC-prep

Preprocessing of single-cell RNA-Seq (deprecated)
http://kallistobus.tools
GNU General Public License v3.0

Very low number of cells detected #2

Open maximilianh opened 7 years ago

maximilianh commented 7 years ago

Hi, I'm confused by this output of the first script:

get_cell_barcodes.py

...
NUMBER_OF_SEQUENCED_BARCODES = 35910128
Detecting Cells...
NUM_OF_DISTINCT_BARCODES = 5309260
CELL_WINDOW: [100, 5000]
Cell_barcodes_detected: 100
NUM_OF_READS_in_CELL_BARCODES = 3200555
Calculating d_min...
number of cell barcodes to error-correct: 86 ( dmin >= 5 )
Writing output...
....

It seems fishy to me that the script always detects exactly the lowest number of cells I specify. If I set the window to [500, 5000], it finds 500 cells. If I set it to [100, 5000], it finds 100 cells. Is this expected? If not, any ideas what I'm doing wrong?

thanks! Max

alex-botzki commented 7 years ago

This effect is due to the way the threshold is determined: the script smooths the UMI count curve with a Savitzky-Golay filter and then looks for the maximum negative difference (the steepest drop) between consecutive values inside the window. If no position inside the window has a steeper drop than the first position of the window, the lower bound is returned, so a window of 500 to 5000 gives 500 cells. We also saw this behavior in several of our 10x Genomics runs. As far as we can tell, it could be related to the species your samples come from. For human and mouse samples, the described strategy usually works (unless you have a bad sample). The only fix we see is to change the script and use another thresholding strategy.
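To illustrate the behavior described above, here is a minimal sketch. It is not the code from get_cell_barcodes.py: the function name `detect_num_cells`, the filter length of 151, the polynomial order of 1, and the synthetic count curves are all assumptions made for the example. It only shows that "steepest drop of the smoothed curve inside the window" falls back to the lower bound when the curve has no knee inside the window.

```python
import numpy as np
from scipy.signal import savgol_filter


def detect_num_cells(sorted_counts, window, filter_len=151, polyorder=1):
    """Hypothetical re-creation of the thresholding idea, not the original script.

    sorted_counts: per-barcode counts sorted in decreasing order.
    window: (lo, hi) range of ranks in which to look for the threshold.
    """
    lo, hi = window
    smoothed = savgol_filter(np.asarray(sorted_counts, dtype=float),
                             filter_len, polyorder)
    # drop[i] is positive where the smoothed curve falls between rank i and i+1.
    drop = -np.diff(smoothed)
    # Rank of the steepest drop, restricted to the window.
    return lo + int(np.argmax(drop[lo:hi]))


# Synthetic curve WITH a knee: ~1000 high-count barcodes, then background.
with_knee = np.concatenate([np.linspace(20000, 10000, 1000),
                            np.full(19000, 100.0)])

# Synthetic curve WITHOUT a knee in the window: a smooth power-law decay,
# whose drop is steepest at the lowest ranks and shrinks steadily afterwards.
no_knee = 1e5 / np.arange(1, 20001)

print(detect_num_cells(with_knee, (500, 5000)))  # lands near the knee
print(detect_num_cells(no_knee, (500, 5000)))    # lower bound of the window
print(detect_num_cells(no_knee, (100, 5000)))    # lower bound again
```

In the second and third calls the steepest drop inside the window is at its first position, so the result simply tracks whatever lower bound is passed in, which matches what Max reported.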

vibbits commented 7 years ago

We have written a Perl wrapper and adapted the Python scripts from pachterlab to work with 10x Chromium chemistry v1 and v2. Everything is available on our GitHub: https://github.com/vibbits/sc_read_kallisto_wrapper. The thresholding strategy we ended up with is taken from Cell Ranger. Future work: we could look into the potentially faster code from the other fork and adapt it to work with chemistry v1 and v2, too.
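For readers unfamiliar with it, the cell-calling rule documented for early Cell Ranger versions can be sketched roughly as follows: take the top N barcodes for an expected cell count N, compute the 99th percentile m of their UMI counts, and call every barcode with more than m/10 UMIs a cell. The code below is a sketch of that rule only, not the code in the wrapper linked above; the function name `cellranger_style_cutoff`, the default of 3000 expected cells, and the toy data are assumptions.

```python
import numpy as np


def cellranger_style_cutoff(umi_counts, expected_cells=3000):
    """Sketch of a Cell Ranger style threshold (assumed, not the wrapper's code).

    Returns (number of barcodes called as cells, UMI threshold).
    """
    counts = np.sort(np.asarray(umi_counts, dtype=float))[::-1]
    top = counts[:expected_cells]
    # Robust maximum: 99th percentile of the top `expected_cells` barcodes.
    robust_max = np.percentile(top, 99)
    threshold = robust_max / 10.0
    return int(np.sum(counts > threshold)), threshold


# Toy data: 1000 real cells with 10000 UMIs each, 50000 background barcodes.
toy_counts = np.concatenate([np.full(1000, 10000), np.full(50000, 50)])
n_cells, threshold = cellranger_style_cutoff(toy_counts)
print(n_cells, threshold)
```

Unlike the windowed steepest-drop search, this rule does not depend on a lower bound, so it cannot collapse onto the edge of a user-specified window, although it does depend on a sensible expected cell count.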