Open pakiessling opened 1 month ago
Hi @pakiessling,
Have you tried this _map_transcript_to_cell function? It should do exactly what you want.
If that works and if it's useful, then I could make it more visible in the docs!
Hi, yes I ended up using this very fast & smooth.
It is very useful for things like subcellular analysis.
Would be cool to have this more visible!
Nice! Okay, I'll make it more visible!
@quentinblampey I actually have a question about _map_transcript_to_cell I used it lie this:
_map_transcript_to_cell(
sdata,
"cellpose_boundaries_row_nr",
sdata.points["my_points],
"cellpose_boundaries",
)
I assumed that the values it added were the row numbers of the geopandas dataframe. But:
sdata.points["my_points].compute().cellpose_boundaries_row_nr.value_counts()
cellpose_boundaries_row_nr
0 1331235
70150 3643
87365 3521
47877 3453
71058 3285
...
92583 10
29033 10
65535 10
57416 10
44450 10
This doesnt match the values in the table:
sc.pp.calculate_qc_metrics(sdata.tables["table"],inplace=True)
sdata.tables["table"].obs.iloc[0]
total_counts 141
sdata.tables["table"].obs.iloc[70150]
total_counts 1048
Do you know what could be happening there? This is fresh after Sopa Snakemake pipeline has been run. So everything should be correctly aligned?
The IDs are shifted by 1 because the ID 0 is already used for transcripts that are not assigned to any cell.
Can you look at the counts of the cell at iloc 70151
instead?
I would prefer to use NA for the transcripts that are not assigned, this way we would not need to shift the indices. But this is the expected Baysor input. I can add an argument to choose how we handle the "non-assigned transcripts", what do you think?
Ok I understand now.
In my data I actually have to substract one from the added number.
70149
corresponds to the correct row in the table and shapes.
I think NA would be less confusing. My favorite would actually be NA + the index (cell_id) from shapes and tables if that would be possible
70149 corresponds to the correct row in the table and shapes.
Yes, sorry, it's 70149 indeed.
Okay, I'll add the possibility to have NA + cell index!
Thanks a lot!
Hi Quentin,
I saw in #87 that you were making the choice to not update the points dataframe with the newly assigned cell id. I can see the value in not modifying the original data when it comes to something like the image, however I do not really see what it adds in the case of the cell assignment. So far I never wanted to go back to the original assignment (or If I would like to I still have the sopa input) but I really need the point-to-polygon assignment for analysis I want to do. Maybe just adding an additional column with the new cell_id would be a compromise?