Closed hyanwong closed 6 months ago
Default now set to 0 (I guess most species number from 1 anyway, so 0 is fine for indicating it's not been explicitly set).
We should check that specifying chromosome=1 (without defining a chromosome 0) works: no reason it shouldn't.
However, the sample-resolving and related algorithms will currently break if individuals have different chromosomes.
It should be possible to set the chromosome numbers to arbitrary values for different individuals, and the recombination-breakpoint-finding routines should "just work". This would be a good unit-test.
Details for chromosome metadata should, I think, be stored in the node metadata when we implement it. There is no point having a chromosome table, as the identity of chromosomes can differ between individuals.
We will need to change the format of the MRCAdict to allow different intervals to exist on different chromosomes.
Done in #112
A relatively large task is to update the sample-resolving and MRCA-finding algorithms to account for different chromosomes (see #11 for the approach). At the moment we assume that an interval below a node, e.g. 0..100, can be intersected with an interval above the node, e.g. 50..200. This is only true if both intervals are on the same chromosome. We therefore need to keep a collection of chromosome intervals (Portion objects) on the intervals stack, rather than just a single Portion object per node. We should probably store this as a dictionary (keyed by numerical chromosome ID) rather than a list, because we can't guarantee that the chromosomes for a given node will be numbered from 0..N.
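As a rough illustration of the dictionary idea, here is a minimal sketch in which intervals are only intersected when they share a chromosome ID. It is not the project's code: plain half-open `(left, right)` tuples stand in for Portion objects so the example needs no dependencies, and the function names `intersect_intervals` and `intersect_by_chromosome` are made up for this example.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of half-open (left, right) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        left = max(a[i][0], b[j][0])
        right = min(a[i][1], b[j][1])
        if left < right:
            out.append((left, right))
        # Advance whichever interval finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_by_chromosome(below, above):
    """Intersect {chromosome_id: intervals} dicts, chromosome by chromosome.

    Chromosome IDs are arbitrary integers; IDs present on only one side
    contribute nothing, and chromosomes with no overlap are omitted.
    """
    result = {}
    for chrom in below.keys() & above.keys():
        shared = intersect_intervals(below[chrom], above[chrom])
        if shared:
            result[chrom] = shared
    return result


# Hypothetical data: chromosome IDs need not run from 0..N
below = {0: [(0, 100)], 7: [(0, 50)]}
above = {0: [(50, 200)], 3: [(0, 50)]}
shared = intersect_by_chromosome(below, above)
print(shared)
```

Here only chromosome 0 appears on both sides, so the intervals on chromosomes 7 and 3 are never compared with each other, even though their coordinates overlap.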
However, for the time being, we can raise an error if we have any chromosome numbers other than the default.
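A stop-gap check along these lines might look like the sketch below. The names `DEFAULT_CHROMOSOME` and `check_single_chromosome` are hypothetical, and the default value of 0 is an assumption taken from the discussion in this thread rather than from the code.

```python
DEFAULT_CHROMOSOME = 0  # assumed default, per the discussion in this thread


def check_single_chromosome(chromosomes, default=DEFAULT_CHROMOSOME):
    """Raise if any chromosome ID differs from the default.

    `chromosomes` is any iterable of integer chromosome IDs, e.g. one per node.
    """
    unsupported = sorted(set(chromosomes) - {default})
    if unsupported:
        raise NotImplementedError(
            f"Multiple chromosomes are not yet supported; found IDs {unsupported}"
        )


check_single_chromosome([0, 0, 0])  # passes silently
```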
Additionally, I think it would be neater to default to a chromosome of `0` (not `-1`). After all, even if we don't specify a chromosome, we are assuming there is one. I suppose `-1` (or some other negative number) could be reserved for circular chromosomes.