tskit-dev / pyslim

Tools for dealing with tree sequences coming to and from SLiM.
MIT License
27 stars 23 forks source link

inconsistent time scaling after calling pyslim.recapitate #237

Closed bguo068 closed 2 years ago

bguo068 commented 2 years ago

I ran a single population simulation in SLiM for 50 generations, loaded the SLiM recorded tree sequence file into python and then call pyslim recapitate:

pyslim.recapitate(
            ts,
            ancestral_Ne=10000,
            recombination_rate=1e-8,
        )

I checked the node time:

print(rts.tables.nodes.asdict()["time"])

Got this:

[ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          4.          4.          7.         12.
 15.         16.         17.         18.         20.         21.
 22.         24.         27.         28.         30.         32.
 35.         36.         37.         40.         42.         42.
 43.         44.         46.         47.         47.         48.
 49.         50.         50.00001637 50.00002066 50.00006233 50.00012152
 50.00041158 50.00077951 50.00109969 50.00130631 50.00133778 50.00134906
 50.00140127 50.00143857 50.00167856 50.0017617  50.0018777  50.00195675
 50.0020626  50.00225843 50.0022893  50.00243993 50.0025325  50.00266928
 50.00268947 50.00268947 50.00288165 50.00295011 50.00299932 50.00301058
 50.00301645 50.00356345 50.00406507 50.00407395 50.00422114 50.00426909
 50.00437148 50.00461151 50.00478524 50.00487694 50.00494062 50.00506156
 50.00519119 50.00521884 50.00594187 50.00614015 50.00615692 50.00623169
 50.00628775 50.00630028 50.00637146 50.00639213 50.00645668 50.00647551
 50.00657743 50.00689228 50.00694585 50.0069992  50.00712222 50.0072448
 50.00739086 50.00739414 50.00740303 50.00745753 50.00758076 50.00761082
 50.00770078 50.00798917 50.00805746 50.00818361 50.00827513 50.00843496
 50.00903855 50.00905414 50.00937931 50.00952421 50.00969539 50.00975179
 50.00982734 50.00992221 50.00994229 50.00994659 50.01046637 50.01047231
 50.0106238  50.01068188 50.01069077 50.01069285 50.01085229 50.01099046
 50.01144449 50.01159246 50.01177294 50.01184561 50.0119888  50.01212718
 50.01278436 50.01292529 50.0130349  50.01307193 50.0131373  50.01335909
 50.01337778 50.01383442 50.01402908 50.01423222 50.01434217 50.01481985
 50.01491088 50.0150284  50.01516347 50.01553881 50.01568166 50.01575005
 50.01586601 50.01601611 50.01619131 50.01663963 50.0168219  50.01712658
 50.01763987 50.01793284 50.01842915 50.01849881 50.01909109 50.01912823
 50.01946128 50.01959938 50.01993157 50.02011184 50.02015592 50.02039582
 50.02114702 50.0216769  50.02182913 50.02279639 50.02353742 50.02357903
 50.02419849 50.02526019 50.02539834 50.02668418 50.02686462 50.02729487
 50.02744194 50.02807669 50.02854292 50.02908757 50.02920885 50.0298645
 50.03023676 50.03061088 50.03074028 50.0319307  50.03212929 50.03296114
 50.03353799 50.03562888 50.03616445 50.036669   50.03691441 50.03787982
 50.03808936 50.03865444 50.03915789 50.03947828 50.04006532 50.0402044
 50.04024169 50.04262987 50.04278239 50.04314322 50.04321034 50.04361096
 50.04368119 50.04584554 50.04605821 50.04614131 50.046619   50.0472433
 50.04771092 50.04852303 50.04857878 50.0486373  50.04958645 50.04984485
 50.050336   50.05183727 50.0523186  50.05473921 50.05550375 50.0564899
 50.05709385 50.05795522 50.05917377 50.0592618  50.06241752 50.06372589
 50.0650676  50.0660267  50.06626997 50.06735354 50.0718794  50.07421775
 50.08113379 50.08234341 50.08365522 50.0863844  50.09043272 50.09435184
 50.09838505 50.10305422 50.1059017  50.1063765  50.10778732 50.11007977
 50.11372959 50.1183893  50.12060663 50.12398019 50.12659339 50.12928407
 50.13024475 50.13683435 50.13957578 50.14120736 50.1432195  50.14736926
 50.17287065 50.17424898 50.20321501 50.21450967 50.22258092 50.2594858
 50.29862733 50.31807737 50.32186997 50.3317265  50.33906212 50.44906933
 50.53869387 50.57350737 50.58090689 50.69313102 51.01249488 51.03030265
 51.04102833 51.07621684 51.24938244]

It seems the time scaling is not right for the ancestral part. All the time > 50 is clumped together. After a look at the source code of recapitate, I think the following line might be problematic.

https://github.com/tskit-dev/pyslim/blob/55d25fb9021d3d475842b11a97d8fe238e34974d/pyslim/methods.py#L51-L52

I made a few edits:

        demography = msprime.Demography.from_tree_sequence(ts)
        # must set pop sizes to >0 even though we merge immediately
        demography.populations[0].initial_size = 0.0                                   # <---------------------------------------------
        demography.populations[1].initial_size = self.N                               # <---------------------------------------------
        ancestral_name = "ancestral"
        derived_names = [pop.name for pop in demography.populations]  # <---------------------------------------------
        ancestral_metadata = {}
        ancestral_metadata["slim_id"] = ts.num_populations
        demography.add_population(
            name=ancestral_name,
            description="ancestral population simulated by msprime",
            initial_size=self.N,
            extra_metadata=ancestral_metadata,
        )
        # the split has to come slightly longer ago than slim_generation,
        # since that's when all the linages are at, and otherwise the event
        # won't apply to them
        demography.add_population_split(
            np.nextafter(
                ts.metadata["SLiM"]["generation"],
                2 * ts.metadata["SLiM"]["generation"],
            ),
            derived=derived_names,
            ancestral=ancestral_name,
        )
        kwargs = {}
        kwargs["demography"] = demography

        recap = msprime.sim_ancestry(initial_state=ts, **kwargs)

        rts = pyslim.SlimTreeSequence(recap)

Now it seems to work, but I am not that sure, could you take a look at it? Thanks

['0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '4.00', '8.00', '10.00', '12.00', '16.00', '17.00', '17.00', '18.00', '19.00', '21.00', '25.00', '25.00', '28.00', '28.00', '32.00', '33.00', '33.00', '34.00', '34.00', '34.00', '34.00', '36.00', '38.00', '39.00', '40.00', '40.00', '40.00', '41.00', '42.00', '44.00', '45.00', '45.00', '46.00', '47.00', '47.00', '48.00', '48.00', '48.00', '50.00', '50.87', '52.48', '52.49', '52.94', '53.14', '54.42', '54.63', '56.23', '57.95', '58.04', '58.82', '59.78', '60.12', '61.00', '63.20', '65.11', '67.66', '68.32', '68.92', '70.12', '71.47', '71.68', '71.74', '71.84', '72.27', '72.48', '73.66', '73.98', '74.14', '74.19', '74.33', '75.34', '76.28', '79.06', '79.40', '85.81', '86.44', '89.92', '91.37', '92.83', '93.25', '93.42', '94.72', '95.39', '95.82', '100.21', '103.50', '104.65', '104.72', '111.90', '115.20', '120.73', '125.52', '125.60', '125.60', '128.17', '128.43', '128.95', '130.13', '136.05', '137.96', '140.05', '142.28', '145.48', '146.49', '147.92', '148.97', '159.20', '159.72', '159.96', '161.00', '161.11', '162.93', '164.54', '166.26', '166.54', '168.41', '172.51', '172.85', '173.48', '183.11', '186.94', '188.53', '197.69', '198.67', '203.37', '206.66', '209.90', '210.35', '210.97', '215.13', '219.55', '220.77', '220.81', '229.24', '229.52', '229.87', '230.16', '232.20', '233.35', '235.89', '236.20', '244.40', '245.26', '247.13', '248.03', '248.38', '250.17', '256.49', '257.62', '258.88', '259.91', '265.78', '269.52', '272.16', '277.83', '280.28', '281.56', '281.59', '287.15', '288.67', '290.76', '296.95', '299.41', '301.14', '302.57', '308.88', '315.13', '316.88', '326.78', '339.53', '340.19', '352.19', '352.68', '359.31', '362.46', '369.03', '375.02', '380.47', '386.69', '388.39', '391.30', '410.11', '415.45', '417.65', '433.00', '436.81', '445.71', '446.92', '449.10', '449.68', '451.37', '457.28', '463.89', '474.39', '475.59', '497.38', '501.71', '518.58', '540.02', '541.72', '543.85', '549.66', '584.75', '611.59', '619.90', '623.09', '625.53', '629.80', '675.42', '682.23', '713.30', '735.15', '739.51', '741.03', '751.65', '757.87', '790.69', '806.24', '820.53', '872.40', '880.81', '906.25', '981.63', '1031.87', '1032.16', '1039.49', '1077.26', '1148.69', '1158.81', '1159.13', '1200.83', '1226.88', '1261.56', '1288.65', '1352.12', '1374.41', '1542.16', '1552.19', '1552.63', '1568.27', '1721.66', '1776.91', '2102.17', '2155.66', '2179.13', '2218.34', '2239.92', '2266.04', '2524.41', '2607.04', '2931.60', '3267.68', '4027.73', '5384.97', '5960.48', '6810.03', '8427.10', '9113.67', '9147.31', '17514.41', '29074.65']
bguo068 commented 2 years ago

Sorry, it could be a software version issue. I can't replicate the issue in a new conda environment.