welch-lab / MultiVelo

Multi-omic velocity inference
BSD 3-Clause "New" or "Revised" License

Subsetting before Multivelo analysis #45

Open Dalhte opened 3 weeks ago

Dalhte commented 3 weeks ago

Hello! I'm performing a MultiVelo analysis on a complex tissue with more than 10 well-represented cell types. When I perform the analysis on the whole sample, it works just fine. However, I'm mainly interested in one particular cluster of cells. Subsetting the adata_result from the "global analysis" to my cluster of interest works, but 1) I lose some important genes, and 2) I think analyzing all clusters together somewhat biases the results (because of gene expression leakage and random chromatin accessibility in big cell clusters).

So I tried to run MultiVelo only on this cluster by subsetting my object before the analysis:

cur_celltypes = ['Cluster']
adatam_object_cluster = adatam_object[adatam_object.obs['cell_Type'].isin(cur_celltypes)]

Everything goes smoothly up to the mv.recover_dynamics_chrom step:

adata_result = mv.recover_dynamics_chrom(adatam_object_cluster, adata_atac_cluster, max_iter=5, init_mode="invert", verbose=False, parallel=True, save_plot=False, rna_only=False, fit=True, n_anchors=500, extra_color_key='orig.ident')

and at around 18% of the job I get:

....
File "/home/.local/lib/python3.8/site-packages/multivelo/dynamical_chrom_func.py", line 1386, in initialize_steady_state_params
    for t_sw_1 in np.arange(1, rna_interval-1, 2,
ValueError: arange: cannot compute length
"""
....
...
The above exception was the direct cause of the following exception:

...
File "/home/.local/lib/python3.8/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
ValueError: arange: cannot compute length

I'm wondering why. Do you have any idea? The only obvious difference I can see is the number of cells, which might be insufficient. Could that be the cause?

Best,
David

danielee0707 commented 3 weeks ago

This could be related to low-quality genes that remain after subsetting; see #5 and #28. We recommend running the highly variable gene selection step of preprocessing directly on the object that is passed to MultiVelo. Without a good phase portrait, the trajectory inference is likely to fail. You can also try removing the single gene that causes this error and rerunning.
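
As a rough illustration of where the message can come from: NumPy raises `arange: cannot compute length` when one of the bounds passed to `np.arange` is NaN. Whether `rna_interval` is actually NaN in this run is an assumption; the traceback does not confirm it. The sketch below reproduces the message with plain NumPy and adds a hypothetical helper, `well_detected_genes` (not part of MultiVelo), that keeps only genes detected in a minimum number of cells of the subset. The `min_cells` threshold is illustrative, and the helper is a simple stand-in for a proper highly variable gene selection step.

```python
import numpy as np


def arange_error(stop):
    """Return the ValueError message raised by np.arange(1, stop, 2), or None."""
    try:
        np.arange(1, stop, 2)
    except ValueError as exc:
        return str(exc)
    return None


def well_detected_genes(counts, gene_names, min_cells=10):
    """Hypothetical pre-filter (not a MultiVelo function).

    counts: cells x genes array of expression counts for the subset only.
    Keeps genes with a nonzero count in at least `min_cells` cells.
    """
    counts = np.asarray(counts)
    cells_per_gene = (counts > 0).sum(axis=0)
    return [g for g, n in zip(gene_names, cells_per_gene) if n >= min_cells]


if __name__ == "__main__":
    # A NaN bound triggers the same message as in the traceback.
    print(arange_error(float("nan")))

    # Toy subset: 3 cells x 2 genes; the second gene is detected in one cell only.
    toy_counts = np.array([[1, 0], [2, 0], [3, 1]])
    print(well_detected_genes(toy_counts, ["geneA", "geneB"], min_cells=2))
```

On a real subset, the list returned by the helper could be used to restrict the RNA object before calling `mv.recover_dynamics_chrom`; if the error persists, dropping the single gene named in the failing task is the more targeted option.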