mufernando / graft

0 stars 0 forks source link

test of grafting not working #7

Closed mufernando closed 4 years ago

mufernando commented 4 years ago

Test of the graft function is not working properly. Fails at this line.

I know the issue is that individual ids and population ids are different between the pre-graft and post-graft treeseqs. The issue is that the mapping of individual IDs seems off by one. I build the mapping starting on this line. Can you help me figure this out, @petrelharp ?

petrelharp commented 4 years ago

The first thing I did is to make the example much smaller, so it's easy to look at, and insert some print statements in the test - see the branch I just pushed. What I'm seeing is that for some reason the first four individual metadata entries differ: they have got a single character added at the start. Do you know where that came from???

mufernando commented 4 years ago

nice!

no idea where that is added. If those are the individuals that are added onto ts1, then the only time I mess with the metadata is here: https://github.com/mufernando/graft/blob/master/graft/__init__.py#L139

But I never change anything about it. (not intentionally)

mufernando commented 4 years ago

btw just noticed that should be greater than or equal to

mufernando commented 4 years ago

Just pushed sth new to debug_misc. Those first four individuals are the ones that are added.

petrelharp commented 4 years ago

Hm, so they are. I added another print statement; the problem is not in constructing the individual table in graft, but elsewhere.

mufernando commented 4 years ago

well, so they are added by slim?

petrelharp commented 4 years ago

There is something bad happening. Let me track it down.

petrelharp commented 4 years ago

Ok, this was one wierd thing that was going on: https://github.com/tskit-dev/tskit/pull/577

petrelharp commented 4 years ago

Ok, I've refactored the test somewhat - now we are testing if both ts1 and ts2 are "unchanged" by grafting, where 'unchanged' means that if you simplify down to the samples then you get the same thing. Is this test getting everything that you had previously?

And, parsing the metadata reveals that the mismatch in metadata is due to the pedigree_ids being different, which is happening because the individuals are in different orders, as verified by these asserts. I thought that simplify would put them in the same order, but I guess not; and maybe you already knew this. One solution would be to use the individual map to compare individuals, properly remapped, almost as I do now, then clear the individual tables also and compare the remaining tables. Can you think of a cleaner solution?

mufernando commented 4 years ago

yes, I think your refactoring caught everything I had before.

I ended up 'manually' comparing the nodes, individuals and population tables. Not the cleanest, but it should work. Check it out.

petrelharp commented 4 years ago

Well, one way to make this simpler is that you can check equality of table rows (or, will be able to once this PR goes in).

mufernando commented 4 years ago

I also can't compare NodeRows, btw.

mufernando commented 4 years ago

or any other table rows, I think.

petrelharp commented 4 years ago

Show me the code, and the resulting error?

petrelharp commented 4 years ago

I think that means you're on an old version of tskit.

mufernando commented 4 years ago

ok, just got the github version of tskit and I can compare table rows!

mufernando commented 4 years ago

Done f490a7dddf4865c30658b71d60a50904da9cba20