Closed ivan-krukov closed 4 years ago
Something else related to this point: The tree sequences currently have haplotypes, so if we have 4 probands in the pedigree, the tree sequence will have 8 leaves. Not sure how to proceed with this...
Not sure if this is helpful, but when working with the TS of Genizon data, I just had a dictionary of node ID pointing to sample IDs (where two node IDs point to the same sample IDs) For slicing a tree or accessing samples from the tree, I flipped the dictionary to have sample IDs pointing to node IDs.
How did you generate the dictionary, @LukeAndersonTrocme?
#set dictionary for each tree
node_keys = {}
#loop through each chrom
for c in chroms:
#set dict for this chrom
node_key = {}
id_key = defaultdict(list)
#loop through each node
for n in chroms[c].nodes():
#make sure it's a sample
if n.is_sample():
#get individual index
ind_index = n.individual
#get individual ID (VCF NAME)
ind_meta = chroms[c].individual(ind_index).metadata
ind_id = json.loads(ind_meta.decode('utf-8'))['individual_id']
#add individual ID to node index
node_key[n.id] = ind_id
node_keys[c] = node_key
#FLIP THE DICT for each ID, add node index
for key, value in node_key.items():
id_key[value].append(key)
id_keys[c] = id_key
It's a bit of a hack but it works. Let me know if you need more context to this, for the metadata I had to add this myself, wasn't too bad, just modified the convert.py file from somewhere in the tskit docs.
Thanks to what Luke pointed out about metadata, I was able to solve this problem in the following way:
ts_nodes_to_ped_map = {}
for n in sim.nodes():
if n.individual != -1:
ind_info = sim.individual(n.individual)
ts_nodes_to_ped_map[n.id] = int(ind_info.metadata.decode())
I updated the Pedigree
class to return this mapping along with the simulations themselves (here). I also updated the method to allow users to optionally convert from TreeSequence
to Traversal
objects. Could be useful for plotting and other functionality.
Not sure this is the most appropriate place for those additions, but it should work for now.
msprime
uses the minimal integers to label the nodes. We use the IDs provided in the genealogy to establish node identity. To start the alignment, we need to match our IDs withmsprime
IDs