pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0

identical pseudotime of cells #55

Closed ccshao closed 4 years ago

ccshao commented 4 years ago

Hello STREAM team,

In the pseudotime trajectory generated by STREAM (v0.3.9), several cells have identical pseudotime. This happens more often in aligned data (in adata2 after st.map_new_data(adata1, adata2)). Would it be better to generate a unique pseudotime for each cell, and how can that be done post hoc?

Thanks!

huidongchen commented 4 years ago

The identical pseudotime may happen in two cases:

ccshao commented 4 years ago

@huidongchen Thanks very much for the quick reply! Indeed, for the original trajectory inference, the non-unique pseudotime values are not enriched at the ends.

In our project we have two datasets with around 400 cells each. UMAP is used for dimension reduction. Here is the code for the trajectory inference.

st.normalize_per_cell(adata)
st.log_transform(adata)
st.select_variable_genes(adata, loess_frac=0.01, save_fig=True)
st.dimension_reduction(adata, n_components=2, method='umap')
st.plot_visualization_2D(adata, save_fig=True)
st.seed_elastic_principal_graph(adata)
st.elastic_principal_graph(adata, epg_alpha=0.03)
st.extend_elastic_principal_graph(adata)

The plot_visualization_2D output is: [image: Screenshot from 2020-01-09 16-56-59]

And here is the subway plot; the linear trajectory is what we expected: [image: Screenshot from 2020-01-09 16-57-54]

huidongchen commented 4 years ago

Hi,

> In which steps should we pay attention to the parameters for the "umap" dimension reduction method?

I see that you are keeping two components for st.dimension_reduction(), so st.plot_visualization_2D() does not really matter here. It is only helpful when you keep >=3 components. Anyhow, the current structure looks reasonable to me.

> There are more non-unique pseudotime points in the aligned data (adata2) produced by st.map_new_data(adata1, adata2). Which parameters in map_new_data might be useful?

During the mapping procedure, the cells are only projected onto the learned structure, so mapping doesn't really change the structure itself. That being said, I would not suggest adjusting map_new_data to distinguish cells with the same pseudotime. Instead, I would probably try to fine-tune the parameter nb_pct (percentage of neighbors) in st.dimension_reduction (maybe lower it) to see if you can get a finer structure of cells.

> Is it OK to add "jitter" to the whole pseudotime, or only to the non-unique pseudotime values, to make the values unique?

I'm not sure what that is supposed to mean.

> We would like to have unique pseudotime, but is it a big issue to have non-unique pseudotime points?

I think it's fine for some cells to share the same pseudotime and it is actually common. For the cells around terminal nodes sharing the same pseudotime, you can try to adjust parameters in st.extend_elastic_principal_graph. Regarding the identical pseudotime for adata2, as I said, it makes more sense to adjust the parameters in this step 'st.dimension_reduction()'.
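When comparing settings such as different nb_pct values, it can help to quantify how many cells actually share a pseudotime value. The following is only a minimal sketch, assuming the pseudotime has already been extracted into a 1-D array (for example from a column of adata.obs; the exact column name depends on the chosen root node and is an assumption here). The helper name tie_summary is hypothetical and not part of STREAM.

```python
import numpy as np

def tie_summary(pseudotime, decimals=None):
    """Summarize how many cells share a pseudotime value.

    `pseudotime` is a 1-D sequence of floats (hypothetical input, e.g.
    values copied out of adata.obs). `decimals` optionally rounds the
    values first, so near-identical values are counted as ties.
    """
    values = np.asarray(pseudotime, dtype=float)
    if decimals is not None:
        values = np.round(values, decimals)
    unique_values, counts = np.unique(values, return_counts=True)
    return {
        "n_cells": int(values.size),
        "n_unique": int(unique_values.size),
        # Cells whose value is shared with at least one other cell.
        "n_tied_cells": int(counts[counts > 1].sum()),
        "largest_group": int(counts.max()) if counts.size else 0,
    }

# Toy values for illustration only, not real STREAM output.
example_pseudotime = [0.0, 0.0, 0.1, 0.25, 0.25, 0.25, 0.4]
summary = tie_summary(example_pseudotime)
print(summary)
```

Running the same summary on adata1 and adata2 after each parameter change gives a like-for-like number to compare, rather than judging ties by eye from the plots.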

ccshao commented 4 years ago

@huidongchen very sorry for the late updates.

We tried the suggestions with 0.3.9. Both increasing and decreasing nb_pct (to 0.05 or 0.2, from 0.1) in st.dimension_reduction lead to fewer unique pseudotime points in adata1 and adata2. Increasing the value gives the fewest unique pseudotime points in our datasets.

For the projection, mlle did not help either, compared with umap.

By jitter, I mean adding small noise to the pseudotime of adata1 and adata2 after STREAM processing.
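A minimal sketch of what such post hoc jitter could look like, assuming the pseudotime values have been copied into a 1-D NumPy array. This is not a STREAM function; jitter_ties, scale, and seed are hypothetical names. The noise is non-negative and bounded by a fraction of the smallest gap between distinct values, so cells that already differ keep their relative order and only tied cells are perturbed. The resulting order within a tied group is random and carries no biological meaning.

```python
import numpy as np

def jitter_ties(pseudotime, scale=0.01, seed=0):
    """Break ties in pseudotime by adding small non-negative noise.

    Hypothetical helper, not part of STREAM. With 0 < scale < 1 the
    noise stays below the smallest gap between distinct values, so the
    order of cells with different pseudotime is preserved.
    """
    values = np.asarray(pseudotime, dtype=float)
    rng = np.random.default_rng(seed)

    distinct, inverse, counts = np.unique(
        values, return_inverse=True, return_counts=True
    )
    # Smallest gap between distinct values; fall back to 1.0 if all are equal.
    min_gap = np.diff(distinct).min() if distinct.size > 1 else 1.0

    noise = rng.uniform(0.0, scale * min_gap, size=values.shape)
    tied = counts[inverse] > 1  # True for cells sharing their value

    jittered = values.copy()
    jittered[tied] += noise[tied]
    return jittered

# Toy values for illustration only, not real STREAM output.
original = np.array([0.0, 0.0, 0.1, 0.25, 0.25, 0.25, 0.4])
jittered = jitter_ties(original)
print(jittered)
```

Uniqueness here relies on the random draws being distinct, which is overwhelmingly likely but not strictly guaranteed, so it is worth checking the number of unique values afterwards. If only a strict ordering is needed, ranking the cells with an arbitrary tie-break would be an alternative that leaves the pseudotime values themselves untouched.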