Methods to compare Individuals, Populations, and Samples with tskit equivalents

tskit-dev / tsinfer

Infer a tree sequence from genetic variation data.

GNU General Public License v3.0

I reckon I can do something like this:

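import json
import attr

def exclude_id(attribute, value):
    # Filter for attr.asdict: keep every attribute except "id"
    return attribute.name != "id"

# Assumes sd (the sample data file) and ts (the tree sequence) are in scope
def population_equivalent(sd_pop, ts_pop):
    d1 = {k: v for k, v in ts.population(ts_pop).__dict__.items() if k != "id"}
    # JSON-encode the sample data metadata to bytes before comparing
    d2 = {k: json.dumps(v).encode() if k == "metadata" else v
          for k, v in attr.asdict(sd.population(sd_pop), filter=exclude_id).items()}
    return d1 == d2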

The only issue is where there are attributes in the sample data file that are not in the tree sequence, or vice versa. In particular, I'm thinking about the individuals_time value for sample data files, which has no equivalent for an individual in a tree sequence until #322 (and after that it will require special treatment).
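
One possible way to handle attributes that exist on only one side is to compare just the shared keys and report the rest separately. This is a minimal sketch using plain dicts rather than real tsinfer or tskit objects. The function name compare_attributes, the ignore parameter, and the "time" key (standing in for individuals_time) are all hypothetical and not part of either library.

```python
import json


def compare_attributes(sd_attrs, ts_attrs, ignore=("id",)):
    """Compare two attribute dicts on their shared keys.

    Returns (equal, only_in_sd, only_in_ts). Hypothetical helper.
    """
    sd_keys = set(sd_attrs) - set(ignore)
    ts_keys = set(ts_attrs) - set(ignore)
    shared = sd_keys & ts_keys

    def normalise(key, value):
        # Assumption: metadata may be JSON-encoded bytes on one side and a
        # decoded object on the other, so decode bytes before comparing.
        if key == "metadata" and isinstance(value, bytes):
            return json.loads(value.decode())
        return value

    equal = all(
        normalise(k, sd_attrs[k]) == normalise(k, ts_attrs[k]) for k in shared
    )
    return equal, sorted(sd_keys - ts_keys), sorted(ts_keys - sd_keys)


# Plain dicts standing in for a sample data individual and a tree sequence
# individual; "time" has no counterpart on the tree sequence side.
sd_individual = {"id": 0, "metadata": {"name": "A"}, "location": [1.0], "time": 2.0}
ts_individual = {"id": 5, "metadata": b'{"name": "A"}', "location": [1.0]}
equal, only_sd, only_ts = compare_attributes(sd_individual, ts_individual)
print(equal, only_sd, only_ts)
```

The caller can then decide whether a non-empty only_sd or only_ts list should count as a mismatch. If the attributes are NumPy arrays rather than lists, a plain == would not give a single boolean, so an array-aware comparison would be needed there.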

Methods to compare Individuals, Populations, and Samples with tskit equivalents #325