Open smorabit opened 2 years ago
Hi CellTrek team,
I'd like to piggyback on smorabit's question, since I have made a similar observation and would really like to understand it for the downstream analysis I'm developing.
To be more specific, I actually found that
and ended up with a higher cell count than in the original scRNA object.
To give a little more detail, here is a quick summary of the cell counts I have after the run:
For the last part, E, I investigated further and realized that the suffixes .1, .2, .3, .4 are added to the IDs. For instance, this ID got duplicated 4 times (5 including the original ID), with the suffixes .1, .2, .3, .4.
Lastly, I did a quick count of how many spots were duplicated and how often. The AAACCGAAGCTATGAC.1 cell from the table above falls into the last category (5 times) in this plot.
From what I can read in the paper and on GitHub, I cannot find a good explanation for this effect. I was wondering if
Thank you; I would really appreciate any comment from the CellTrek team on this matter.
Thank you!! Simon
Hi Simon and Sam. Thank you for sharing your questions, and thanks for such a detailed investigation. We provide several parameters in the CellTrek function: top_spot=5 means one cell can be mapped to at most 5 spots, spot_n=5 means one spot can contain at most 5 cells, and dist_thresh=0.55 sets a distance threshold for the mapping.

We do this for two reasons. 1) In many cases we observed biologically and molecularly similar regions in different areas of the tissue (less global spatial structure, more local structure); for example, two spatially distinct ductal structures in human breast tissue can show almost the same expression and histological patterns. We therefore allow some redundancy in the cell mapping. 2) We made the default parameters fairly strict, which yields fewer mapped cells, to avoid over-estimating cells spatially. As a result, only part of the cells are mapped.

You can decrease top_spot to reduce cell redundancy, increase intp_pnt to get more augmented spots, and increase dist_thresh to allow looser mapping. Do keep in mind that as more cells are mapped, the number of false positives may also increase. Hope this helps. We are working on making these parameters more intuitive in our next tutorial.
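To make the effect of these three limits concrete, here is a toy sketch in Python. It is not CellTrek's implementation: the distance matrix is made up, `map_cells` is a hypothetical helper, and the order in which the filters are applied is an assumption. It only illustrates the semantics described above, namely why a strict dist_thresh leaves some cells unmapped and why top_spot > 1 lets one cell appear several times.

```python
# Toy illustration only: NOT CellTrek's implementation. The distances are
# made up and map_cells is a hypothetical helper.

def map_cells(dist, dist_thresh=0.55, top_spot=5, spot_n=5):
    """Return (cell, spot, distance) pairs that survive three filters.

    dist[c][s] is a dissimilarity between cell c and spot s (smaller = closer).
    """
    # 1) Drop every cell-spot pair farther apart than dist_thresh.
    pairs = [
        (c, s, d)
        for c, row in enumerate(dist)
        for s, d in enumerate(row)
        if d <= dist_thresh
    ]
    pairs.sort(key=lambda p: p[2])  # closest pairs first

    # 2) Each cell keeps at most its top_spot closest spots.
    per_cell, kept = {}, []
    for c, s, d in pairs:
        if per_cell.get(c, 0) < top_spot:
            per_cell[c] = per_cell.get(c, 0) + 1
            kept.append((c, s, d))

    # 3) Each spot keeps at most its spot_n closest cells.
    per_spot, result = {}, []
    for c, s, d in kept:
        if per_spot.get(s, 0) < spot_n:
            per_spot[s] = per_spot.get(s, 0) + 1
            result.append((c, s, d))
    return result


dist = [
    [0.10, 0.20, 0.90],  # cell 0: two spots within 0.55, so it can map twice
    [0.30, 0.80, 0.70],  # cell 1: only spot 0 is within 0.55
    [0.95, 0.85, 0.60],  # cell 2: no spot within 0.55, so it stays unmapped
]

default_map = map_cells(dist)                 # cell 0 twice, cell 2 missing
no_redundancy = map_cells(dist, top_spot=1)   # each cell at most once
loose_map = map_cells(dist, dist_thresh=1.0)  # every cell gets mapped
```

In this toy setup, lowering top_spot removes the duplicate of cell 0, while raising dist_thresh is what brings cell 2 back, at the cost of accepting much more distant (and possibly spurious) cell-spot pairs.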
Hi WandeRum,
Thank you for the prompt reply and explanation! Yes, that makes sense to me, and it sounds like it will take some effort to find the right parameters for a given sample.
I also wonder how mapping the same scRNA cell multiple times could affect downstream analysis. My first thought is that one might need to be cautious when interpreting downstream results, especially those that are sensitive to the number of cells.
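As a rough way to quantify the redundancy before running count-sensitive analyses, the mapped IDs can be collapsed back to their original cell IDs. The sketch below is in Python and rests on one assumption: duplicated cells carry a trailing ".<number>" suffix (as observed earlier in this thread) and original IDs never end that way themselves. `copies_per_cell` is a hypothetical helper, and the second barcode is made up.

```python
import re
from collections import Counter

# Assumption: duplicates end in ".<number>" and original IDs do not.
SUFFIX = re.compile(r"\.\d+$")


def copies_per_cell(mapped_ids):
    """Count how many mapped copies each original cell ID has."""
    return Counter(SUFFIX.sub("", cell_id) for cell_id in mapped_ids)


# Example IDs: the first barcode appears in this thread, the second is made up.
mapped_ids = [
    "AAACCGAAGCTATGAC",
    "AAACCGAAGCTATGAC.1",
    "AAACCGAAGCTATGAC.2",
    "AAACCGAAGCTATGAC.3",
    "AAACCGAAGCTATGAC.4",
    "TTTGGCCAAGTACGTA",
]

counts = copies_per_cell(mapped_ids)
# Histogram of redundancy: how many cells were mapped once, twice, and so on.
histogram = Counter(counts.values())
```

One possible use, offered as a suggestion rather than a recommendation from the CellTrek authors: weight each mapped copy by 1 / counts[original_id] so that every original cell contributes once in total to count-based summaries.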
I also wonder how much one can read into potential interactions between cells that are mapped close to each other.
Also, is there a metric that tells how confident the mapping of each cell is in the CellTrek result?
This could be helpful for identifying optimal parameter settings for the downstream analysis.
Thanks!
Simon
Hi @WandeRum
To follow up on this question, is there a way to force assign all single cells to a spatial location? I tried to change dist_thresh=0.55 to dist_thresh=2, but still didn't get all my single cells with at least one position assigned. What would be a setting so that all cells could be assigned? dist_thresh=10? dist_thresh=20? dist_thresh=100?
I am trying to use CellTrek to map my single-cell RNA-seq data onto Visium ST coordinates. I have a dataset of 500k+ cells for my single-cell RNA-seq.
I ran the traint function with default arguments, and the co-embedding looks pretty reasonable. However, when I tried to run celltrek with the default arguments, I noticed that only a fraction of the cells in my single-cell dataset were mapped. I also tried downsampling my scRNA data to map only a portion with celltrek, but there are still a lot of cells missing from the output of celltrek. I am not sure why this is happening, but I am wondering what settings I should use in the celltrek function if I want to return coordinates for every input cell. Alternatively, if there are cells that map poorly to the ST dataset, and that is why they aren't showing up in the output, are there any metrics that celltrek returns to check the prediction confidence? Let me know if you need any more info to answer my question.
Thanks, Sam