kharchenkolab / Baysor

Bayesian Segmentation of Spatial Transcriptomics Data
MIT License

Inconsistent number of cells in the output files #93

Closed zsfrbkv closed 10 months ago

zsfrbkv commented 10 months ago

Hi!

I am planning to use Baysor for segmenting Xenium data. I ran the following command (as suggested here) to obtain the output:

baysor run -m 5 -x x_location -y y_location -z z_location -g feature_name -p --prior-segmentation-confidence 0.5 -o ROI_2/baysor_segmentation.csv ROI_2/filtered_transcripts.csv :cell_id

I was hoping to get cell coordinates from the results. However, when I read in the output files baysor_segmentation.csv and baysor_segmentation_cell_stats.csv, they seem to contain different numbers of cells:

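import numpy as np
import pandas as pd

# Per-cell statistics: one row per cell
df1 = pd.read_csv("ROI_2/baysor_segmentation_cell_stats.csv")
df1.shape
> (311778, 13)

# Per-transcript assignments: count the distinct values in the cell_id column
df2 = pd.read_csv("ROI_2/baysor_segmentation.csv")
len(np.unique(df2.cell_id))
> 191770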
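
To see where the two counts diverge, the IDs themselves can be compared rather than just their totals. Below is a minimal sketch using toy data frames in place of the real files. The helper name `compare_cell_ids` is hypothetical, and the stats column name `cell` is an assumption that should be replaced with whatever ID column the cell stats file actually contains. `cell_id` is the column used in the snippet above.

```python
import pandas as pd


def compare_cell_ids(stats, transcripts, stats_col="cell", transcripts_col="cell_id"):
    """Return (IDs only in the stats table, IDs only in the transcript table).

    Both column names are assumptions; pass the names found in your own files.
    """
    stats_ids = set(stats[stats_col].dropna())
    transcript_ids = set(transcripts[transcripts_col].dropna())
    return stats_ids - transcript_ids, transcript_ids - stats_ids


# Toy stand-ins for the two CSV files (not real Baysor output).
stats = pd.DataFrame({"cell": ["a", "b", "c", "d"]})
transcripts = pd.DataFrame({"cell_id": ["a", "a", "b", "e"]})

only_in_stats, only_in_transcripts = compare_cell_ids(stats, transcripts)
print(sorted(only_in_stats))
print(sorted(only_in_transcripts))
```

With the real files, `stats` and `transcripts` would be `df1` and `df2` from above. If the two sets barely overlap, the two columns may not refer to the same set of cell identifiers.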

Is there some additional filtering happening that is not mentioned on the GitHub page?

Thanks!