Open jeromekelleher opened 1 day ago
I mainly flagged it because it is actually a sampled genome, albeit from a different dataset, but I don't feel particularly strongly either way.
Maybe a more general solution would be to add strains from different datasets as separate "populations", so that we can do e.g. `ts.samples(population=1)` to get the Viridian samples? That would also allow people to add other non-Viridian samples at a later date, should they want to add to the ARG?
I'm going to remove it: from a practical perspective it's a pain, and the semantic purity of what we mean by a sample isn't that important.
I'm finding myself doing quite a lot of special casing to deal with the reference being marked as a sample, in the same way as all the actual Viridian samples. I really don't see what the advantage of marking it like this is, since getting the reference sequence is trivial.
You initially flagged this in #152 @hyanwong - do you have any objections to switching this back?
The other alternative is to add the reference sequence to the Viridian dataset, but I feel this would require more explanation all round.