Closed by TymekPieszko 2 months ago
That's interesting - I didn't realise people were using edge metadata. Out of curiosity, can you show us the edge table to see what's in there @TymekPieszko?
tables = ts.dump_tables()
print(tables.edges)
Is there an easy way to drop the metadata from a table @benjeffery?
Yes, it looks like this:
╔═════════╤══════════════════╤══════════════════╤════════╤════════╤══════════════════════╗
║id │left │right │parent │child │metadata ║
╠═════════╪══════════════════╪══════════════════╪════════╪════════╪══════════════════════╣
║0 │184311562.50000000│184311895.00000000│36074489│ 352│b'183302894 185513958'║
║1 │184311562.50000000│184311895.00000000│36074489│ 353│b'183302894 185513958'║
║2 │ 73482179.50000000│ 73482487.50000000│18087521│ 263│ b'71825215 74354148'║
║3 │ 73482179.50000000│ 73482487.50000000│18087521│ 266│ b'71825215 74354148'║
...
I see, that's interesting - Relate is doing something useful with the metadata. Good for them!
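I think dropping the metadata is most easily done like this:
# Work on an editable copy of the tree sequence's tables
tables = ts.dump_tables()
# Snapshot the edge columns before overwriting them
edges = tables.edges.copy()
# Omitting the metadata arguments leaves the metadata column empty
tables.edges.set_columns(left=edges.left, right=edges.right, parent=edges.parent, child=edges.child)
ts_no_edge_md = tables.tree_sequence()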
Does this work for you?
Sorry, just seen this - yes, that is how I would drop the metadata too.
Yes, @jeromekelleher, this does prevent this particular error, thanks!
The edge metadata in question here records the "equivalent" edges in adjacent trees. Relate builds a set of marginal trees and "links" edges across trees based on the similarity of the sample sets they subtend. (More precisely, what is stored are the IDs of the nodes subtended by the equivalent edges.) Unless you're using that information, there's no need to keep it in the tree sequence. If you use the --compress flag in the conversion tool, the edges themselves will persist across multiple trees.
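If you do want to use that information rather than drop it, here is a minimal sketch of how one entry might be decoded. It assumes only what the printed edge table shows: each metadata value is a bytes string holding two whitespace-separated integers. The helper name parse_edge_metadata is hypothetical and not part of tskit or relate_lib, and the sketch does not interpret what the two values mean.

```python
from typing import Tuple


def parse_edge_metadata(raw: bytes) -> Tuple[int, int]:
    """Split one raw edge metadata value into its two integer fields.

    Assumes the layout seen in the printed edge table, e.g.
    b'183302894 185513958'. This helper is hypothetical.
    """
    fields = raw.decode("ascii").split()
    if len(fields) != 2:
        raise ValueError(f"expected two fields, got {len(fields)}: {raw!r}")
    return int(fields[0]), int(fields[1])


# Example using a value copied from the edge table in this thread
first, second = parse_edge_metadata(b"183302894 185513958")
```

On a real tree sequence this could be applied to edge.metadata for each edge in ts.edges(), provided the edge table has no metadata schema and so returns raw bytes.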
Just to say that the most recent tskit release has drop_metadata() functions on all tables (I think!)
Having converted the output of Relate to a tree sequence using relate_lib (https://github.com/leospeidel/relate_lib), I was trying to obtain a subsection of the full ts. I got the following error, suggesting a problem with edge metadata: