pinellolab / dictys

Context specific and dynamic gene regulatory network reconstruction and analysis
GNU Affero General Public License v3.0
110 stars 14 forks source link

Dynamic inference with existing pseudotime values #54

Closed nikithkurella closed 7 months ago

nikithkurella commented 7 months ago

Hello. I was wondering how I might approach running dynamic inference if I already have the pseudotime values for my dataset. I have only two cell types and one branch.

lingfeiwang commented 7 months ago

Hi nikithkurella,

That is a great question. You have a custom step for trajectory inference, whose output needs to be reformatted for Dictys. I suggest to first start from the dynamic GRN inference tutorial at https://github.com/pinellolab/dictys/blob/master/doc/tutorials/full-skin/notebooks/main1.ipynb. Upon finishing this, you can check the trajectory data files (tmp/branch.tsv.gz, tmp/edge.tsv.gz, tmp/dist.tsv.gz) for their formats.

For your own data you can replace this notebook with the following. First you can identify two cells, namely the initial and final cells, with the smallest and largest pseudotime. You can place one trajectory node each at their respective locations. There's one edge between the two nodes, and all cells are on this edge. Finally, the distance from each cell to all nodes can be computed easily as their pseudotime difference with the initial or final cells.

Following this process allows you to generate the three trajectory data files. Then you can run https://github.com/pinellolab/dictys/blob/master/doc/tutorials/full-skin/notebooks/main2.ipynb to infer dynamic GRNs as in the tutorial.

Please follow up know if anything is unclear.

Lingfei

nikithkurella commented 7 months ago

Hi @lingfeiwang thank you so much for your response. I followed your directions to prepare the prerequisite files for the main2 notebook, and I've run into the following error:

ValueError                                Traceback (most recent call last)
Cell In[14], line 19
     17 traj=dictys.traj.trajectory.fromdist(edge.values,dist.values)
     18 traj.to_file('[/scratch1/kurella/Dictys/yAL/data/traj_node.h5](https://ondemand.carc.usc.edu/node/a02-09.hpc.usc.edu/56123/lab/tree/Dictys/yAL/Dictys/yAL/data/traj_node.h5)')
---> 19 point=dictys.traj.point.fromdist(traj,branch.values,dist.values)
     20 point.to_file('/scratch1/kurella/Dictys/yAL/data/traj_cell_rna.h5',traj=False)

File [~/.conda/envs/dictys/lib/python3.9/site-packages/dictys/traj.py:525](https://ondemand.carc.usc.edu/node/a02-09.hpc.usc.edu/56123/lab/tree/Dictys/yAL/~/.conda/envs/dictys/lib/python3.9/site-packages/dictys/traj.py#line=524), in point.fromdist(cls, traj, edges, dist)
    523     edges=np.array([traj.edgedict[tuple(x)][0] for x in edges])
    524 locs=dist[np.arange(len(edges)),traj.edges[edges,0]]
--> 525 return cls(traj,edges,locs,dist=dist)

File [~/.conda/envs/dictys/lib/python3.9/site-packages/dictys/traj.py:490](https://ondemand.carc.usc.edu/node/a02-09.hpc.usc.edu/56123/lab/tree/Dictys/yAL/~/.conda/envs/dictys/lib/python3.9/site-packages/dictys/traj.py#line=489), in point.__init__(self, traj, edges, locs, dist)
    488 self.edges=edges
    489 assert locs.shape==(self.npt,)
--> 490 locs=traj.conform_locs(locs,edges)
    491 self.locs=locs
    492 if dist is None:

File [~/.conda/envs/dictys/lib/python3.9/site-packages/dictys/traj.py:208](https://ondemand.carc.usc.edu/node/a02-09.hpc.usc.edu/56123/lab/tree/Dictys/yAL/~/.conda/envs/dictys/lib/python3.9/site-packages/dictys/traj.py#line=207), in trajectory.conform_locs(self, locs, edges, abs_err, rel_err)
    206 t1=np.nonzero((eabs>abs_err)&(erel>rel_err))[0]
    207 if len(t1)>0:
--> 208     raise ValueError(f'Absolute error {eabs[t1[0]]}>{abs_err} and relative error {erel[t1[0]]}>{rel_err} both exceed the given bounds.')
    209 locs=np.clip(locs,0,self.lens[edges])
    210 return locs

ValueError: Absolute error 3.7595970003856394e-07>1e-07 and relative error 2.8641798961666455e-07>1e-07 both exceed the given bounds.

I assume it has to do with an issue in my preparing of the files. I was wondering if you could point me in the right direction. Here's the code that I'm running for reference:

# Removes CPU usage limit by some jupyter versions
import os
os.environ['KMP_AFFINITY'] = ''
# Configure matplotlib to enable large animations
import matplotlib
matplotlib.rcParams['animation.embed_limit'] = 2**128
import matplotlib.pyplot as plt
# Prepare trajectory files
import pandas as pd
import dictys

# Load data
dist=pd.read_csv('/scratch1/kurella/Dictys/yAL/tmp/dist.tsv.gz',header=0,index_col=0,sep='\t')
edge=pd.read_csv('/scratch1/kurella/Dictys/yAL/tmp/edge.tsv.gz',header=None,index_col=None,sep='\t')
branch=pd.read_csv('/scratch1/kurella/Dictys/yAL/tmp/branch.tsv.gz',header=None,index_col=None,sep='\t')
# Save data
traj=dictys.traj.trajectory.fromdist(edge.values,dist.values)
traj.to_file('/scratch1/kurella/Dictys/yAL/data/traj_node.h5')
point=dictys.traj.point.fromdist(traj,branch.values,dist.values)
point.to_file('/scratch1/kurella/Dictys/yAL/data/traj_cell_rna.h5',traj=False)
lingfeiwang commented 7 months ago

Hi nikithkurella. That appears due to a strong validation check which I have just relaxed. Could you install the latest version and try again?

nikithkurella commented 7 months ago

Thank you so much @lingfeiwang! That seems to have resolved the issue.