Closed ShyamieG closed 1 year ago
Hi, Shyamalika! Let's see - I haven't got into the details here, but adding the filter_populations=False flag to both calls to .simplify( ) makes this error go away. (I haven't found if your script finishes running, as it started to make my computer run slowly).
For further debugging, if necessary I suggest putting in some calls like for p in ts.populations(): print(p)
to see what's going on with the populations.
-- peter
On Fri, Feb 10, 2023 at 1:27 PM Shyamalika Gopalan @.***> wrote:
Hi there! I'm trying to implement recapitation in my project, which involves merging tree sequences coming from parallel SLiM simulations. My overall aim is conceptually very similar to what is outlined in this tskit vignette.
I have been able to figure out how to generate a merged tree sequence object (thanks to a lot of help from @jeromekelleher and @petrelharp!), but now I am running into problems with recapitation.
I have created a minimal example derived from my simulation results that reproduces the error I am running into - please see attached for tree sequences and code. Specifically, I get a "ValueError: Duplicate population name: 'p2'" whenever I attempt to recapitate a merged tree.
I have tried re-assigning all my population IDs to be '0' (population IDs are meaningless for my application), but this does not fix the problem. I'm kind of stumped as to where to go from here. Any help would be appreciated!
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
Thanks for looking into this, Peter! Adding the filter_populations=False flag to .simplify() doesn't fix the issue for me. The error still occurs at the 'pyslim.recapitate()' line later on in the script.
Sorry that my code is lagging up your computer. Is it one of the lines in particular that is causing a problem?
Darn, okay. Well, I can have a look, but first: could you see if you can make a smaller example (e.g., make the sequence length much shorter, and see if the error still occurs?)? And, did you try doing print( )
of the populatoins, to see if the problem becomes obvious?
Okay, I produced a smaller example and attached it here.
I did try printing the the populations. Nothing obviously wrong there.
Thanks. But, hm, that script doesn't run for me, not even after I change it to say input_dir = "."
. Could you fix it up so it does run, quickly? A good way to make sure is to put all the files in their own directory - just like I would have them, starting from nothing, and test it yourself. Thanks.
To be clear, the error I get is
$ python3 test.py
Traceback (most recent call last):
File "/home/peter/debug/pyslim/shyamieg/test.py", line 13, in <module>
tree1_slim_ids = np.array([i.metadata['slim_id'] for i in tree1_sim.nodes()])
^^^^^^^^^
which I'm sure is an easy thing to fix, but it's really much better if you can send me an already-working script.
Sorry about that! See here - this should work once you change the input_dir
Ah, thanks, much better. Let's see:
First, the two tree sequences have the same populations:
>>> for p in tree1.populations():
... print(p)
...
Population(id=0, metadata={'name': 'p0', 'slim_id': 0})
Population(id=1, metadata={'name': 'p1', 'slim_id': 1})
Population(id=2, metadata={'name': 'p2', 'slim_id': 2})
Population(id=3, metadata={'name': 'p3', 'slim_id': 3})
Population(id=4, metadata={'name': 'p4', 'slim_id': 4})
Population(id=5, metadata={'name': 'p5', 'slim_id': 5})
Population(id=6, metadata={'name': 'p6', 'slim_id': 6})
Population(id=7, metadata={'name': 'p7', 'slim_id': 7})
Population(id=8, metadata={'name': 'p8', 'slim_id': 8})
Population(id=9, metadata={'name': 'p9', 'slim_id': 9})
>>> for p in tree2.populations():
... print(p)
...
Population(id=0, metadata={'name': 'p0', 'slim_id': 0})
Population(id=1, metadata={'name': 'p1', 'slim_id': 1})
Population(id=2, metadata={'name': 'p2', 'slim_id': 2})
Population(id=3, metadata={'name': 'p3', 'slim_id': 3})
Population(id=4, metadata={'name': 'p4', 'slim_id': 4})
Population(id=5, metadata={'name': 'p5', 'slim_id': 5})
Population(id=6, metadata={'name': 'p6', 'slim_id': 6})
Population(id=7, metadata={'name': 'p7', 'slim_id': 7})
Population(id=8, metadata={'name': 'p8', 'slim_id': 8})
Population(id=9, metadata={'name': 'p9', 'slim_id': 9})
Next, after you union
and then simplify
we get
>>> for p in merged_tree.populations():
... print(p)
...
Population(id=0, metadata={'name': 'p0', 'slim_id': 0})
Population(id=1, metadata={'name': 'p2', 'slim_id': 2})
Population(id=2, metadata={'name': 'p2', 'slim_id': 2})
Why'd it do this? Well, it's easier to see if we remove the .simplify()
:
>>> for p in merged_tree.populations():
... print(p)
...
Population(id=0, metadata={'name': 'p0', 'slim_id': 0})
Population(id=1, metadata={'name': 'p1', 'slim_id': 1})
Population(id=2, metadata={'name': 'p2', 'slim_id': 2})
Population(id=3, metadata={'name': 'p3', 'slim_id': 3})
Population(id=4, metadata={'name': 'p4', 'slim_id': 4})
Population(id=5, metadata={'name': 'p5', 'slim_id': 5})
Population(id=6, metadata={'name': 'p6', 'slim_id': 6})
Population(id=7, metadata={'name': 'p7', 'slim_id': 7})
Population(id=8, metadata={'name': 'p8', 'slim_id': 8})
Population(id=9, metadata={'name': 'p9', 'slim_id': 9})
Population(id=10, metadata={'name': 'p2', 'slim_id': 2})
Population(id=11, metadata={'name': 'p0', 'slim_id': 0})
Population(id=12, metadata={'name': 'p7', 'slim_id': 7})
What happened there is that union
by default adds new populations for any nodes in the second tree sequence, and copies over the metadata. I guess that only p2
and p0
and p7
had individuals in them in tree1
.
Now, you could disable this by passing add_populations=False
to union
:
merged_tree = tree2.union(tree1, node_map, check_shared_equality=False, add_populations=False)
... which gives a tree sequence that recapitates just fine. However, maybe that's not what you want? Is p2
in tree1
and p2
in tree2
the same population, or not?
If it is not what you want, then there's at least two ways to deal with it. One is to arrange your SLiM scripts so that they don't use the same subpop numbers. (So, after you simplify() you won't end up with duplicate IDs.) Another would be to edit the population table to change the names to not coincide.
Great, this fixes my issue! Thank you so much for walking me through it.
Hi there! I'm trying to implement recapitation in my project, which involves merging tree sequences coming from parallel SLiM simulations. My overall aim is conceptually very similar to what is outlined in this tskit vignette.
I have been able to figure out how to generate a merged tree sequence object (thanks to a lot of help from @jeromekelleher and @petrelharp!), but now I am running into problems with recapitation.
I have created a minimal example derived from my simulation results that reproduces the error I am running into - please see attached for tree sequences and code. Specifically, I get a "ValueError: Duplicate population name: 'p2'" whenever I attempt to recapitate a merged tree.
I have tried re-assigning all my population IDs to be '0' (population IDs are meaningless for my application), but this does not fix the problem. I'm kind of stumped as to where to go from here. Any help would be appreciated!