tskit-dev / tskit

Population-scale genomics
MIT License
155 stars 73 forks source link

assigning to `ts.tables.metadata` does not produce an error message #1906

Open petrelharp opened 3 years ago

petrelharp commented 3 years ago

For instance:

>>> ts = msprime.simulate(10)
>>> ts.tables.metadata = b'abc'
>>> ts.tables.metadata
b''

>>> ts.tables.metadata_schema

>>> basic_schema = tskit.MetadataSchema({'codec': 'json'})
>>> ts.tables.metadata_schema = basic_schema
>>> ts.tables.metadata_schema

These are immutable, so probably ought to throw an error. This caught @vsbuffalo just now.

benjeffery commented 3 years ago

Hmm, they are not immutable.

>>> ts.tables.metadata = b'abc' is creating a table collection from the tree sequence, then mutating the metadata on that.

>>> ts.tables.metadata is then re-creating the table collection again from the tree sequence.

How and where can we throw an error??

jeromekelleher commented 3 years ago

The tables we return from ts.tables should be immutable though - we want this to be a view on the underlying tables eventually. That's the distinction we want to make with ts.dump_tables() (mutable copy) and ts.tables (immutable view). Not sure how practical this is right now, but that's where we'd like to end up.

benjeffery commented 3 years ago

Ah, I thought this was a different suggestion to https://github.com/tskit-dev/tskit/issues/760

jeromekelleher commented 3 years ago

Aha, we've thought about this before then! So, I think the answer then is that "this is unfortunate but there's no much we can do about it" because actually marking stuff as read-only is quite a lot of work.

petrelharp commented 3 years ago

Ah, I see. Any other suggestions here, @vsbuffalo?

vsbuffalo commented 3 years ago

Hmm, I think I understand now (and after some helpful chats with @petrelharp see why we can't just overload the setters). If I understand correctly, the tables can either be attached or detached to the TS object, and are immutable if they are attached, but mutable if they aren't. Is it possible to store some state about whether they're detached or not in the table object that is then changed when they're a part of a tree sequence object (and then immutable)? Then that state can issue warnings through setter functions if they're attached and users are using the setters?

jeromekelleher commented 3 years ago

Something like this would work I think @vsbuffalo, but the reason we haven't taken it on up to now is that it's only marginally less work than adding a READ_ONLY flag to the C tables and raising errors when updates are attempted. Then we could return a true read-only view of the TableCollection from the ts.tables attr.

If there's some quick Python level workarounds using a bit of attribute magic we can put in in the mean time though, I think that would still be very useful and would reduce confusion a lot. So, PRs welcome, basically! (Even if you just want to explore an idea.)

Related: #993, #760, #53