giacuong171 opened 1 week ago
For example, this is part of the `segmentation.csv` from the first run.
```
x,y,gene,empty,cell,molecule_id,prior_segmentation,confidence,compartment,nuclei_probs,assignment_confidence,is_noise
9717.0,11350.0,Snca,,CR67317a852-11920,2951698,0,0.99832,Cyto,0.13242840648552556,0.42,false
9726.0,11299.0,Snca,,CR67317a852-9821,2951699,0,0.99462,Nuclei,0.5933353653354774,0.78,false
9739.0,11301.0,Snca,,CR67317a852-9821,2951700,0,0.99835,Cyto,0.03185690107778871,0.8,false
10438.0,12060.0,Snca,,CR67317a852-11851,2951701,7513,0.99971,Cyto,0.045458917465199034,0.74,false
```
and this is from the second run:
```
x,y,gene,empty,cell,molecule_id,prior_segmentation,confidence,compartment,nuclei_probs,assignment_confidence,is_noise
9717.0,11350.0,Snca,,,2951698,0,0.99832,Cyto,0.13257818690254874,0.12,true
9726.0,11299.0,Snca,,,2951699,0,0.99462,Cyto,0.5026373748293727,0.54,true
9739.0,11301.0,Snca,,CR9da53fb2f-9844,2951700,0,0.99835,Cyto,0.031866318479271016,0.46,false
10438.0,12060.0,Snca,,CR9da53fb2f-10751,2951701,7513,0.99971,Cyto,0.04547240272891517,0.74,false
```
I have tried the new version of Baysor, but the issue still occurs. The following images show the difference between the first run and the second run.
Hi @giacuong171, could you please provide more statistics on the output? For example, the Rand index between the cell assignments of the two runs?
Baysor output is stochastic, so some differences are expected. Usually, it's differences in small cells, which should be filtered (i.e., forcefully assigned to background) anyway. Larger cells should have stable transcript assignment.
Also, looking at `assignment_confidence` helps. In cases where the assignment changed, the confidence was rather low.
Hi @VPetukhov, thanks for your response. The differences between the two runs are quite minimal, around 0.007% of the total molecules. Could you explain how to calculate the Rand index?
Additionally, the two images above show that the `assignment_confidence` values make it difficult to determine which data points should be filtered. For example, molecules 2917842, 2924302, and 2924429 are classified as non-noise with high `assignment_confidence` in the first run, while in the second run they are classified as noise, despite also having high `assignment_confidence`.
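One way to quantify such flips is to join the two outputs on `molecule_id` and count molecules whose `is_noise` flag differs. A minimal sketch with toy data (the real thing would read the two `segmentation.csv` files instead):

```python
import pandas as pd

# Toy stand-ins for the two segmentation.csv outputs, keyed by molecule_id.
run1 = pd.DataFrame({"molecule_id": [2951698, 2951699, 2951700, 2951701],
                     "is_noise": [False, False, False, False]})
run2 = pd.DataFrame({"molecule_id": [2951698, 2951699, 2951700, 2951701],
                     "is_noise": [True, True, False, False]})

# Align molecules across runs and keep those whose noise flag flipped.
merged = run1.merge(run2, on="molecule_id", suffixes=("_run1", "_run2"))
flipped = merged[merged["is_noise_run1"] != merged["is_noise_run2"]]
print(len(flipped) / len(merged))  # fraction of molecules whose noise flag changed
```

The same merge also lets you compare `cell` or `assignment_confidence` side by side for the flipped molecules.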
If you're using Python, then here is the sklearn function, and here is their description. To run it, you'd need to transform all labels to integers, replacing `NaN`s with 0.
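As a minimal sketch of that recipe (the sklearn function assumed here is `sklearn.metrics.rand_score`; the toy series mimic the `cell` column of two runs, with `NaN` for unassigned molecules):

```python
import pandas as pd
from sklearn.metrics import rand_score

# Toy `cell` columns from two runs; None/NaN means the molecule is unassigned.
run1 = pd.Series(["CR67317a852-11920", "CR67317a852-9821", "CR67317a852-9821", None])
run2 = pd.Series([None, None, "CR9da53fb2f-9844", "CR9da53fb2f-10751"])

def to_int_labels(cells: pd.Series) -> pd.Series:
    """Map each cell id string to an integer label; NaN (unassigned) becomes 0."""
    return cells.astype("category").cat.codes + 1  # NaN codes as -1, so +1 -> 0

print(rand_score(to_int_labels(run1), to_int_labels(run2)))
```

In practice you would build the two series from the `cell` columns of the two `segmentation.csv` files, aligned on `molecule_id`.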
As for `assignment_confidence`, it's not perfect indeed. But thresholding it at something like 0.8 gives more stable results and reduces contamination in cells.
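A minimal sketch of that thresholding, assuming molecules below the cutoff are sent to background by clearing their `cell` id (the threshold value and toy data are illustrative):

```python
import pandas as pd

THRESHOLD = 0.8  # assumed cutoff; tune for your data

# Toy fragment of a segmentation table.
df = pd.DataFrame({
    "cell": ["CR-1", "CR-1", "CR-2", None],
    "assignment_confidence": [0.42, 0.84, 0.90, 0.12],
})

# Forcefully assign low-confidence molecules to background.
df.loc[df["assignment_confidence"] < THRESHOLD, "cell"] = None
```

After this step, downstream counting only sees the high-confidence assignments.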
I obtained a Rand index of 0.894. Additionally, I have a question regarding the cell ID names: after each run, I notice a different prefix code in the cell IDs. Do the numbers in the suffix remain the same, and in the same order, across different runs?
0.89 is pretty good. The prefix is the run id, and the suffixes do not match between runs.
Hi Baysor team,

I'm currently experiencing an issue where Baysor generates different `baysor_count` tables even though I'm using the same parameters. Has anyone else encountered this problem? I'm using Baysor v0.6.2.