velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

Weird results: few arrows or chaotic arrows. What are the key parameters for deep-sequencing RNA-seq data? #125

Open Suger0917 opened 6 years ago

Suger0917 commented 6 years ago

Dear velocyto team,

The spliced and unspliced model is really excellent! Our project is about the development of an organ. The data consist of 2000+ cells with 5000 genes and 100,000 UMIs on average. 'ClusterName' was set as in the Seurat dataset. The fractions of spliced and unspliced molecules are 80% and 17%, respectively. I followed the analysis pipeline but got only a few arrows, which cannot show the flow. If I set 'quiver_scale' and 'min_mass' to really small numbers, I get chaotic arrows.

vlm.plot_grid_arrows(scatter_kwargs_dict={"alpha":0.5, "lw":1, "edgecolor":"0.4", "s":80, "rasterized":True}, min_mass=0.1, angles='xy', scale_units='xy', headaxislength=2.75, headlength=5, headwidth=4.8, quiver_scale=0.005, scale_type="absolute")

[image]

If I use the unspliced prediction, I get no arrows. Could you please give me some advice?

Does the parameter 'k' or 'n_neighbors' influence the prediction?

vlm.knn_imputation(n_pca_dims=19, k=70, balanced=True, b_sight=400, b_maxl=200, n_jobs=6)

vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", psc=1, n_neighbors=30, knn_random=True, sampled_fraction=0.95, calculate_randomized=True)

When I use Monocle2, I have to set the start state, but I am not sure which state is the start state in my project, so I think RNA velocity is the right method for it. I really need your professional guidance! Can anyone help me? Thank you!

gioelelm commented 6 years ago

I cannot be sure because this information is only partial, but I think the problem is in the preliminary gene filtering. It might be that you are skipping the filtering altogether (note that this is a standard step in velocyto). I am deducing this from the fact that you get a division by zero: gamma should never be zero, which hints that you might be including some genes that have 0 unspliced UMIs. Try checking which genes you are including in the model at that step.
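A minimal sketch of that check, on toy data rather than a real loom file. It assumes the spliced and unspliced counts are available as genes-by-cells arrays (comparable to what the analysis object exposes as `vlm.S` and `vlm.U`, which is an assumption here). The helper names and the plain least-squares slope are hypothetical stand-ins for illustration, not velocyto's own gamma fit; they only show why a gene with no unspliced UMIs ends up with gamma equal to zero.

```python
import numpy as np

def zero_unspliced_genes(U, gene_names):
    """Return the names of genes whose unspliced counts are zero in every cell.

    U is a genes x cells array (hypothetical layout, see the lead-in).
    """
    totals = np.asarray(U).sum(axis=1)
    return [name for name, total in zip(gene_names, totals) if total == 0]

def slope_through_origin(s, u):
    """Least-squares slope of u on s with no intercept (simplified gamma)."""
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(s @ u / (s @ s))

# Toy data: 3 genes x 4 cells; geneB has no unspliced molecules at all.
gene_names = ["geneA", "geneB", "geneC"]
S = np.array([[5, 8, 2, 9],
              [3, 0, 4, 1],
              [7, 6, 5, 8]])
U = np.array([[1, 2, 0, 3],
              [0, 0, 0, 0],
              [2, 1, 1, 2]])

flagged = zero_unspliced_genes(U, gene_names)
gammas = {name: slope_through_origin(S[i], U[i])
          for i, name in enumerate(gene_names)}

print("genes with 0 unspliced UMIs:", flagged)
print("simplified gamma per gene:", gammas)
```

Any gene that shows up in `flagged` would get a zero slope and should be removed by the filtering step before the model is fitted.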

Please feel free to send me the Jupyter notebook by email if the problem persists after you have performed this troubleshooting.

Suger0917 commented 6 years ago

Thanks for your kind suggestion! I emailed the Jupyter notebook to you yesterday. I noticed that several functions mention knn_smoothing. I am not familiar with this algorithm and don't know how it influences the results. Could you please point me to some websites or papers that explain the usage of knn_smoothing?
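For orientation, the general idea behind k-nearest-neighbour smoothing can be sketched as follows: each cell's counts are replaced by the average over its k closest cells in some low-dimensional space. This is a generic illustration on toy data, assuming a brute-force Euclidean neighbour search and a plain mean; `knn_pool` is a hypothetical helper and is not velocyto's `knn_imputation` implementation.

```python
import numpy as np

def knn_pool(X, coords, k):
    """Average each cell's counts over its k nearest cells (itself included).

    X      : cells x genes count matrix
    coords : cells x dims coordinates used to find neighbours (e.g. PCA space)
    """
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise squared Euclidean distances between all cells.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest cells for every cell; the cell itself is at distance 0.
    nn = np.argsort(d2, axis=1)[:, :k]
    # X[nn] has shape cells x k x genes; average over the neighbour axis.
    return X[nn].mean(axis=1)

# Toy data: two well-separated groups of three cells, one gene.
coords = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
X = np.array([[0.0], [3.0], [0.0], [10.0], [10.0], [13.0]])

smooth_k1 = knn_pool(X, coords, k=1)  # no smoothing: every cell keeps its own value
smooth_k3 = knn_pool(X, coords, k=3)  # averages within each group
smooth_k6 = knn_pool(X, coords, k=6)  # averages over all cells, groups are blurred

print(smooth_k3.ravel())
```

In this toy case a small k leaves the dropout-like zeros untouched, a moderate k fills them in from similar cells, and a k as large as the dataset erases the difference between the two groups, which is why the choice of k can change the downstream result.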