tskit-dev / pyslim

Tools for dealing with tree sequences coming to and from SLiM.

Unpickling TreeSequence from pyslim doesn't initialize metadata, and raises exception #180

vsbuffalo closed this issue 3 years ago

vsbuffalo commented 3 years ago

I've run into an unexpected exception with tskit 0.3.6; a description and MRE are below. The zip file tskit_mre.zip contains a SLiM simulation routine based on recipe 17.1 and an example tree sequence with metadata. Loading the tree sequence with pyslim and then pickling and unpickling it, as below,

import pyslim as psl
import tskit as tsk
import pickle

[x.__version__ for x in (psl, tsk)]
# ['0.600', '0.3.6']

tr = psl.load('./test.tree')
with open('test.pkl', 'wb') as f:
    pickle.dump(tr, file=f)

with open('test.pkl', 'rb') as f:
    tr2 = pickle.load(file=f)

leads to the following error,

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-dc9156f6d230> in <module>
      1 with open('../data/slim_sims/test.pkl', 'rb') as f:
----> 2     t = pickle.load(f)

~/miniconda3/envs/bioinfo/lib/python3.7/site-packages/tskit/trees.py in __setstate__(self, tc)
   3435 
   3436     def __setstate__(self, tc):
-> 3437         self.__init__(tc.tree_sequence().ll_tree_sequence)
   3438 
   3439     def __eq__(self, other):

~/miniconda3/envs/bioinfo/lib/python3.7/site-packages/pyslim/slim_tree_sequence.py in __init__(self, ts, reference_sequence, legacy_metadata)
    155     def __init__(self, ts, reference_sequence=None, legacy_metadata=False):
    156         self.legacy_metadata = legacy_metadata
--> 157         if not (isinstance(ts.metadata, dict) and 'SLiM' in ts.metadata
    158                 and ts.metadata['SLiM']['file_version'] in compatible_slim_file_versions):
    159             tables = ts.dump_tables()

AttributeError: '_tskit.TreeSequence' object has no attribute 'metadata'

This is with tskit 0.3.6. On a different machine with 0.3.5 this works fine, so I suspect this is a regression from this change in how pickled tree sequences are initialized: https://github.com/tskit-dev/tskit/pull/1298/files#diff-ecbb802115b51a593926a3015f4ff2ec444fa4dc4915af74f4959abc60c84df8L3318. I would submit a PR, but I'm not sure of the best way to initialize the object in a way that preserves the metadata.

jeromekelleher commented 3 years ago

Thanks for the bug report @vsbuffalo, we'll get this sorted for the next release (which I think we want to do quite soon).

I wonder if this is some tricky interaction with pyslim and how it's subclassing tskit.TreeSequence though?

benjeffery commented 3 years ago

Thanks for the report, I'll look into this now.

benjeffery commented 3 years ago

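This is due to pyslim.SlimTreeSequence having a different __init__ signature from tskit.TreeSequence: __setstate__ passes a low-level _tskit.TreeSequence to self.__init__, but the subclass's __init__ expects a high-level tree sequence with a metadata attribute. I've fixed this by hardcoding the __init__ reference in __setstate__ in tskit-dev/tskit#1556, which seems like the right way to fix it.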
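For readers following along, the mechanism can be reproduced without tskit or pyslim. The sketch below is only an illustration: `LowLevel`, `Base`, and `Sub` are hypothetical stand-ins for `_tskit.TreeSequence`, `tskit.TreeSequence`, and `pyslim.SlimTreeSequence`, and their bodies are assumptions chosen to mirror the traceback above, not the libraries' actual code.

```python
import pickle


class LowLevel:
    """Hypothetical stand-in for the low-level _tskit.TreeSequence."""

    def __init__(self, payload):
        self.payload = payload


class Base:
    """Hypothetical stand-in for tskit.TreeSequence."""

    def __init__(self, ll):
        self.ll = ll
        self.metadata = {"source": "base"}

    def __getstate__(self):
        # Only the raw payload is pickled.
        return self.ll.payload

    def __setstate__(self, payload):
        # self.__init__ resolves to the most derived __init__, which
        # may not accept a low-level object as its first argument.
        self.__init__(LowLevel(payload))


class Sub(Base):
    """Hypothetical stand-in for pyslim.SlimTreeSequence."""

    def __init__(self, ts, legacy_metadata=False):
        self.legacy_metadata = legacy_metadata
        # Expects a high-level Base instance, so it reads ts.metadata.
        if not isinstance(ts.metadata, dict):
            raise ValueError("expected dict metadata")
        super().__init__(ts.ll)


original = Sub(Base(LowLevel("ACGT")))

error_message = None
try:
    pickle.loads(pickle.dumps(original))
except AttributeError as err:
    # Sub.__init__ received a LowLevel object, which has no metadata.
    error_message = str(err)

print(error_message)
```

Pickling succeeds; the failure only appears on load, when `__setstate__` hands the low-level object to the subclass's `__init__`.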

benjeffery commented 3 years ago

On further thought, we realised that this is better fixed in pyslim, so we have transferred the issue.
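One way a subclass-side fix could look is sketched below, again with hypothetical stand-in classes (`LowLevel`, `Base`, `Sub`) rather than the real tskit and pyslim types. This is an assumption about the general approach, not a description of the change that was actually made in pyslim: the subclass overrides `__getstate__` and `__setstate__` so that it first rebuilds a high-level base object and then calls its own `__init__` with the arguments it expects.

```python
import pickle


class LowLevel:
    """Hypothetical stand-in for the low-level _tskit.TreeSequence."""

    def __init__(self, payload):
        self.payload = payload


class Base:
    """Hypothetical stand-in for tskit.TreeSequence."""

    def __init__(self, ll):
        self.ll = ll
        self.metadata = {"source": "base"}

    def __getstate__(self):
        return self.ll.payload

    def __setstate__(self, payload):
        self.__init__(LowLevel(payload))


class Sub(Base):
    """Hypothetical stand-in for pyslim.SlimTreeSequence."""

    def __init__(self, ts, legacy_metadata=False):
        self.legacy_metadata = legacy_metadata
        if not isinstance(ts.metadata, dict):
            raise ValueError("expected dict metadata")
        super().__init__(ts.ll)

    def __getstate__(self):
        # Keep the subclass-only attribute alongside the base state.
        return {
            "payload": super().__getstate__(),
            "legacy_metadata": self.legacy_metadata,
        }

    def __setstate__(self, state):
        # Rebuild a high-level Base first, then run Sub.__init__ with
        # the kind of argument it was written to accept.
        base = Base(LowLevel(state["payload"]))
        self.__init__(base, legacy_metadata=state["legacy_metadata"])


restored = pickle.loads(pickle.dumps(Sub(Base(LowLevel("ACGT")), legacy_metadata=True)))
print(type(restored).__name__, restored.ll.payload, restored.legacy_metadata)
```

Compared with hardcoding the base `__init__` in the parent's `__setstate__`, this keeps subclass-only state such as `legacy_metadata` across the round trip, which may be part of why a fix in the subclass is attractive.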