identical pseudotime of cells

ccshao commented 4 years ago

Hello STREAM team,

In the pseudotime trajectory generated by STREAM (v0.3.9), several cells have identical pseudotime. This happens more often in aligned data (in adata2 via st.map_new_data(adata1, adata2)). Would it be better to generate a unique pseudotime for each cell? And how can this be done post hoc?

Thanks!

huidongchen commented 4 years ago

The identical pseudotime may happen in two cases:

1. Cells around the terminal nodes. For the sake of robustness, the curves don't always reach the faraway points, so cells around the same terminal node might all be mapped to the same end node and therefore share the same pseudotime. (This can be solved by tuning parameters in st.extend_elastic_principal_graph to reach further points.)
2. No clear compact trajectory pattern exists in your data, or the learned structure is not appropriate (this sounds more like your case). You can first check the st.plot_visualization_2D output to see if there is a clear trajectory pattern. If yes, then you can check the subway map plot to see if cells are assigned close to the branch. If cells are rather distant from the branches, the assignment of these cells is not confident. You might need to adjust the parameters and re-run the structure learning part.
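To tell the two cases apart, it can help to count how many cells share a pseudotime value and whether the ties sit at the minimum or maximum (terminal nodes) or in the interior. The following is a minimal sketch in plain Python, not part of STREAM: `summarize_ties` is a hypothetical helper, and the toy list stands in for whichever pseudotime column your `adata.obs` holds (the column name is not given in this thread).

```python
from collections import Counter

def summarize_ties(pseudotime, ndigits=9):
    """Count cells that share a pseudotime value (hypothetical helper)."""
    # Round so that values differing only by floating-point noise count as ties.
    rounded = [round(t, ndigits) for t in pseudotime]
    counts = Counter(rounded)
    tied = {t: n for t, n in counts.items() if n > 1}
    # The smallest and largest values are treated as the trajectory ends.
    ends = {min(rounded), max(rounded)}
    return {
        "n_cells": len(rounded),
        "n_unique": len(counts),
        "tied_values": tied,
        "n_tied_at_ends": sum(tied.get(t, 0) for t in ends),
        "n_tied_interior": sum(n for t, n in tied.items() if t not in ends),
    }

# Toy values standing in for one pseudotime column (assumption, not real data).
example = [0.0, 0.0, 0.4, 0.7, 0.7, 0.7, 1.3, 1.3]
summary = summarize_ties(example)
print(summary)
```

If most ties fall at the ends, the first case is the likelier explanation; many interior ties point toward the second.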

ccshao commented 4 years ago

@huidongchen Thanks very much for the quick reply! Indeed, for the original trajectory inference, the non-unique pseudotime values are not enriched at the ends.

In our project we have two datasets with around 400 cells each. UMAP is used for the dimension reduction. Here is the code for the trajectory inference.

st.normalize_per_cell(adata)
st.log_transform(adata)
st.select_variable_genes(adata, loess_frac=0.01, save_fig=True)
st.dimension_reduction(adata, n_components=2, method ='umap')
st.plot_visualization_2D(adata, save_fig=True)
st.seed_elastic_principal_graph(adata)
st.elastic_principal_graph(adata, epg_alpha=0.03)
st.extend_elastic_principal_graph(adata)

The plot_visualization_2D output: [image: Screenshot from 2020-01-09 16-56-59]

And the subway plot; the linear trajectory is what we expected: [image: Screenshot from 2020-01-09 16-57-54]

At which steps should we pay attention to the parameters for the "umap" dimension reduction method?
There are more non-unique pseudotime points in the aligned data (adata2) from st.map_new_data(adata1, adata2). Which parameters in map_new_data might be useful?
Is it OK to add "jitter" to all pseudotime values, or only to the non-unique ones, to make the values unique?
We would like to have unique pseudotime, but is it a big issue to have non-unique pseudotime points?

huidongchen commented 4 years ago

Hi,

At which steps should we pay attention to the parameters for the "umap" dimension reduction method?

I see that you are keeping two components for st.dimension_reduction(), so st.plot_visualization_2D() does not really matter here. It is only helpful when you keep >=3 components. Anyhow, the current structure looks reasonable to me.

There are more non-unique pseudotime points in the aligned data (adata2) from st.map_new_data(adata1, adata2). Which parameters in map_new_data might be useful?

During the mapping procedure, the cells are only projected onto the learned structure, so the mapping procedure doesn't really change the structure itself. That being said, I would not suggest adjusting map_new_data to distinguish cells with the same pseudotime. Instead, I would probably try to fine-tune the parameter nb_pct (percentage of neighbors) in st.dimension_reduction (maybe lower it) to see if you can get a finer structure of cells.

Is it OK to add "jitter" to all pseudotime values, or only to the non-unique ones, to make the values unique?

I'm not sure what that is supposed to mean.

We would like to have unique pseudotime, but is it a big issue to have non-unique pseudotime points?

I think it's fine for some cells to share the same pseudotime and it is actually common. For the cells around terminal nodes sharing the same pseudotime, you can try to adjust parameters in st.extend_elastic_principal_graph. Regarding the identical pseudotime for adata2, as I said, it makes more sense to adjust the parameters in this step 'st.dimension_reduction()'.

ccshao commented 4 years ago

@huidongchen very sorry for the late updates.

We tried the suggestions with 0.3.9. Both increasing and decreasing nb_pct (to 0.05 or 0.2 from 0.1) in st.dimension_reduction lead to fewer unique pseudotime points in adata1 and adata2. Increasing the value gives the fewest unique pseudotime points in our datasets.

For the projection, mlle does not help either, compared to umap.

When I talked about jitter, I meant adding small noise to the pseudotime of adata1 and adata2 after STREAM processing.
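For concreteness, the post hoc jitter described here could look roughly like the sketch below. It is plain Python and not a STREAM function: `jitter_ties` is a hypothetical helper, the toy list stands in for a pseudotime column, and the noise width (a quarter of the smallest gap between distinct values) is an arbitrary choice made so that cells with different pseudotime keep their relative order. The resulting order among previously tied cells is random and carries no biological information.

```python
import random
from collections import defaultdict

def jitter_ties(pseudotime, seed=0):
    """Add small uniform noise to tied pseudotime values only (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    distinct = sorted(set(pseudotime))
    if len(distinct) > 1:
        min_gap = min(b - a for a, b in zip(distinct, distinct[1:]))
    else:
        min_gap = 1.0  # arbitrary scale when every cell has the same value
    # Noise stays within a quarter of the smallest gap, so groups cannot overlap.
    half_width = 0.25 * min_gap

    # Collect the indices of cells that share each pseudotime value.
    groups = defaultdict(list)
    for i, t in enumerate(pseudotime):
        groups[t].append(i)

    out = list(pseudotime)
    for t, idx in groups.items():
        if len(idx) > 1:  # leave already-unique values untouched
            for i in idx:
                out[i] = t + rng.uniform(-half_width, half_width)
    return out

# Toy values standing in for one pseudotime column (assumption, not real data).
example = [0.0, 0.0, 0.4, 0.7, 0.7, 0.7, 1.3, 1.3]
jittered = jitter_ties(example)
print(jittered)
```

Note that values tied at the start of the trajectory can end up slightly below the original minimum; shift or clip afterwards if downstream code assumes non-negative pseudotime.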

pinellolab / STREAM

identical pseudotime of cells #55