soedinglab / prosstt

PRObabilistic Simulations of ScRNA-seq Tree-like Topologies
http://dx.doi.org/10.1093/bioinformatics/btz078
GNU General Public License v3.0

Simulating tree from newick: the parameters are not enough for [n] branches #6

Closed scottgigante closed 6 years ago

scottgigante commented 6 years ago

Hi,

Thanks for providing a great tool! I'm trying to generate a tree following your sample_pseudotime_series.ipynb and came across the following error.

>>> import prosstt.simulation as sim
>>> import prosstt.sim_utils as sut
>>> from prosstt import tree
>>> import numpy as np
>>> import newick
>>> np.random.seed(42)
>>> newick1 = "((A:170,B:170),(C:170,D:170))E:170;"
>>> tree1 = newick.loads(newick1)
>>> lineage = tree.Tree.from_newick(newick1, genes=1000)
>>> uMs, Ws, Hs = sim.simulate_lineage(lineage, a=0.05)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/old/home/scottgigante/sync/1_soft/prosstt/prosstt/simulation.py", line 252, in simulate_lineage
    tree.num_branches)
ValueError: the parameters are not enough for 7 branches

I'm pretty sure my Newick representation is correct as otherwise the newick package throws an error. Any ideas as to what I'm doing wrong?

Thanks again!

galicae commented 6 years ago

Hey Scott!

Thanks for taking the time to work with PROSSTT! I hope it can be useful for you!

On to the issue: PROSSTT expects the topology input to be in terms of branches. "(A,B)C;" describes a single bifurcation, with branch C splitting into branches A and B. Similarly, in "(B,(D,E,F)C)A;" branch A splits into branches B and C, and D, E, F sprout from C.
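To make the parent/child reading of a Newick string concrete, here is a minimal sketch of a stand-alone parser. It is not PROSSTT's own parsing code, and `parse_newick` is a hypothetical helper written only for illustration. It relies on the standard Newick convention that the label written after a closing parenthesis names the parent of everything inside the parentheses.

```python
def parse_newick(text):
    """Return (root_label, {parent: [children]}) for a Newick string.

    Illustrative helper, not part of PROSSTT. Branch lengths are skipped,
    and unlabelled nodes get placeholder names such as '<unnamed1>'.
    """
    # Drop whitespace and the trailing semicolon.
    text = "".join(text.split()).rstrip(";")
    children = {}
    counter = 0
    pos = 0

    def read_node():
        nonlocal pos, counter
        kids = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                kids.append(read_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # skip ")"
                break
        # The label (if any) comes after the children.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        # Skip an optional ":length".
        if pos < len(text) and text[pos] == ":":
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
        if not label:
            counter += 1
            label = f"<unnamed{counter}>"
        if kids:
            children[label] = kids
        return label

    root = read_node()
    return root, children


print(parse_newick("(A,B)C;"))
print(parse_newick("(B,(D,E,F)C)A;"))
```

In both examples the root is the label written last, and each key of the returned dictionary is a branch that splits into the branches listed under it.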

In your example tree, nothing connects A,B and C,D to E. If you want a topology where all of them come out of E then your Newick tree would be "(A,B,C,D)E;". However, it seems like you are interested in something where the pairs A,B and C,D are more similar to each other. You can get this by inserting (small) branches between E and the pairs:

"((A:170,B:170)F:10, (C:170,D:170)G:10)E:170;"
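A quick way to see the difference between the two strings is to count their nodes and check how many internal nodes carry no label. The sketch below is plain string processing, independent of PROSSTT, and `newick_node_summary` is a hypothetical helper. It assumes that every Newick node counts as one branch, which is consistent with the "7 branches" in the error message but is not confirmed from the PROSSTT source here.

```python
import re


def newick_node_summary(newick_string):
    """Return (total_nodes, unnamed_internal_nodes) for a Newick string.

    Illustrative helper, not part of PROSSTT.
    """
    s = "".join(newick_string.split())  # drop whitespace
    # Every node is followed by exactly one of: "," (a sibling follows),
    # ")" (it was the last child), or the end of the string (the root).
    total = s.count(",") + s.count(")") + 1
    # An internal node is unnamed when its ")" is followed directly by a
    # delimiter instead of a label.
    unnamed = len(re.findall(r"\)(?=[,):;]|$)", s))
    return total, unnamed


original = "((A:170,B:170),(C:170,D:170))E:170;"
fixed = "((A:170,B:170)F:10, (C:170,D:170)G:10)E:170;"

print(newick_node_summary(original))  # (7, 2)
print(newick_node_summary(fixed))     # (7, 0)
```

Both strings describe seven nodes, but the original leaves the two intermediate ones without a name or length, whereas the fixed string labels them F and G and gives each a short length.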

Lengths of 5, 10 and 20 for the vestigial intermediate branches have produced satisfactory results on my end (e.g. see the first two components of a diffusion map with length 5):

[Image: first two diffusion map components, intermediate branch length 5]

Clearly for this error to arise there is lack of clarity somewhere; I will amend the documentation to better explain how to utilize the Newick tree input. Thanks again!

scottgigante commented 6 years ago

Thanks @galicae for the explanation. Sorry to trouble you with my own misunderstanding of newick!

galicae commented 6 years ago

Nah it probably means I did not explain this well enough. I will close the issue when I expand the documentation.

scottgigante commented 6 years ago

Thanks again!

galicae commented 6 years ago

Addressed in #8 :)