Open PeteHaitch opened 3 years ago
Hi Pete,
Thanks for sharing you experience. the results indeed look not convincing. I noticed that the cell numbers are quite high, this might make the model easier to stuck at local optima. One quick thing you can try is increase the number of initializations by --nInit
, e.g., to 200. You can use -p
to increase the number of processes for parallel computing, while it will use more memory.
Let me know how it goes. Yuanhua
Thanks, Yuanhua.
I'm re-running with --nInit 200
and I'll report back the results.
Increasing to --nInit 200
gave me:
[vireo] Loading cell folder ...
[vireo] Demultiplex 62746 cells to 8 donors with 182385 variants.
[vireo] lower bound ranges [-13267522.5, -11771899.8, -11602801.9]
[vireo] allelic rate mean and concentrations:
[[0.031 0.452 0.934]]
[[14038312.6 12274692.6 5596478.8]]
[vireo] donor size before removing doublets:
donor0 donor1 donor2 donor3 donor4 donor5 donor6 donor7
2777 3722 11969 1737 11350 1861 16069 13261
[vireo] final donor size:
donor0 donor1 donor2 donor3 donor4 donor5 donor6 donor7 doublet unassigned
2284 3205 10634 1478 9410 36 14445 11851 8455 948
[vireo] All done: 498 min 4.3 sec
Increasing to --nInit 1000
gave me:
[vireo] Loading cell folder ...
[vireo] Demultiplex 62746 cells to 8 donors with 182385 variants.
[vireo] lower bound ranges [-13272912.3, -11791179.9, -11605879.4]
[vireo] allelic rate mean and concentrations:
[[0.027 0.45 0.937]]
[[13773588.1 12622406.3 5513489.7]]
[vireo] donor size before removing doublets:
donor0 donor1 donor2 donor3 donor4 donor5 donor6 donor7
15976 2273 4152 1516 10538 11619 13133 3538
[vireo] final donor size:
donor0 donor1 donor2 donor3 donor4 donor5 donor6 donor7 doublet unassigned
14430 25 3352 126 9379 10638 11839 3185 8578 1194
I'm a little concerned that varying --nInit
changes results for 'rare' donors so much, e.g., although I might like to believe the --nInit 200
results because only 1 donor has < 1000 cells, I would naively expect that the --nInit 1000
results should be more accurate/precise/reliable.
Aside: the version of vireo
I have (v0.5.5
installed from https://pypi.org/project/vireoSNP/) does not have the -p
/--nproc
option (it's not documented when vireo --help
is run and setting it leads to an error vireo: error: no such option: --nproc
)
This discordance between --nInit 200
and --nInit 1000
is indeed concerned. You meant these 62746 are from 3 captures, so each has ~21K cells? Eventhough the number of doublets could still be high and some donors may become minor, both of which can increase the challenge in donor identification.
One possibly way is to perform the demultiplexing on each of the three sets separately, and link them. So in each set, the doublets may not be so high.
For the version of vireo
, I guess you are still using an older version (you can type vireo
in terminal). The PyPI sometimes fails to replace the older version, especially installed from github initially. In the v0.5.5, it should support --nproc
.
Yuanhua
You meant these 62746 are from 3 captures, so each has ~21K cells?
Yes, that's correct.
One possibly way is to perform the demultiplexing on each of the three sets separately, and link them. So in each set, the doublets may not be so high.
This is how I initially performed the demultiplexing. However, I was unsure how to best link the donors between each capture. Could you please suggest how to do this using the vireo output (e.g., which files, fields,etc. to parse and link).
For the version of vireo, I guess you are still using an older version (you can type vireo in terminal). The PyPI sometimes fails to replace the older version, especially installed from github initially. In the v0.5.5, it should support --nproc.
Oh, that's surprising to me but you're correct, I am running vireo 0.5.0
(base) vc7-shared 501 % vireo
Welcome to vireoSNP v0.5.0!
Hi,
I'm running
cellsnp-lite
(v1.2.0
) andvireo
(~v0.5.5
~v0.5.0
from pip) to demultiplex a 3' 10X scRNA-seq dataset with cells from 8 human donors. These cells were pooled and run over 3 separate 10x captures, so I have appended a unique capture-specific suffix to theCB
cell barcode in the BAM file usingappendCB
(https://github.com/ruqianl/appendCB) to distinguish potential duplicate cell barcodes across captures. The donors are not evenly pooled; I expect 4 donors to have a similar, larger number of cells and the other 4 donors to have a similar, smaller number of cells.I run
cellsnp-lite
+vireo
roughly as follows:You can see in the final few lines that lots of cells are 'lost' following the removal of doubblets, which particularly noticeable for
donor1
donor2
anddonor5
as a percentage of the initial assignments.I'm looking for some suggestions on how I can debug this and for possible causes of this behaviour? This is not the only dataset for which I've seen this issue and I can share some other examples if that is helpful.
For the most part I've had a lot of success with
vireo
so thank you for developing it!Cheers, Pete