tskit-dev / pyslim

Tools for dealing with tree sequences coming to and from SLiM.
MIT License
27 stars 23 forks source link

Error with recapitation with non-uniform recombination map #248

Closed ChrystelleDelord closed 2 years ago

ChrystelleDelord commented 2 years ago

Dear all,

I want to simulate multiple chromosomes with SLiM (and pyslim recapitation). The reason I am doing this, rather than performing independent runs to mimic independant chromosomes, is that I need to keep the same individuals pedigree from SLiM.

So, I initiated my SLiM run in that way, and everything seemed to work fine and I am able to get my treeseq at the end.

initialize() {

    setSeed(getSeed());    
    initializeSLiMModelType("nonWF");
    initializeSLiMOptions(keepPedigrees = T);
    initializeTreeSeq();
    initializeSex("A");

    initializeMutationType("m1", 0.5, "f", 0.0); // codominant, neutral loci.
    initializeGenomicElementType("g1", m1, 1.0); // all genomic elements are of type m1.
    initializeGenomicElement(g1, 0, 199999999); // a moderately large "chromosome" of 0.2 Gb.
    initializeMutationRate(0.0);
    initializeRecombinationRate(c(1e-8, 0.5, 1e-8, 0.5, 1e-8, 0.5, 1e-8, 0.5, 1e-8), 
c(40000000, 40000001, 80000000, 80000001, 120000000, 120000001, 160000000, 160000001, 199999999)); 

However, when I load that tree within pyslim and try to launch recapitation, while accounting for the advice given in Recapitation with a non-uniform recombination map and also some documentation from msprime:

ts = pyslim.load("SLiM_test1pop.trees")
r_chrom = 1.0e-8
r_break = math.log(2)
chrom_positions = [0, 40000000, 80000000, 120000000, 160000000, 200000000]

map_positions = [chrom_positions[0], chrom_positions[1], chrom_positions[1] + 1, chrom_positions[2], 
    chrom_positions[2] + 1, chrom_positions[3], chrom_positions[3] + 1, chrom_positions[4], 
    chrom_positions[4] + 1, chrom_positions[5]]

rates = [r_chrom, r_break, r_chrom, r_break, r_chrom, r_break, r_chrom, r_break, r_chrom]
rate_map = msprime.RateMap(position=map_positions, rate=rates)

rts = ts.recapitate(recombination_rate=rate_map, Ne=22850, random_seed=1) 

I get the following error message:

Traceback (most recent call last):
  File "temp.py", line 52, in <module>
    rts = ts.recapitate(recombination_rate=rate_map, Ne=22850, random_seed=1)
  File "/home2/datawork/cdelord/conda-env/simupop-slim/lib/python3.7/site-packages/pyslim/slim_tree_sequence.py", line 470, in recapitate
    **kwargs)
  File "/home2/datawork/cdelord/conda-env/simupop-slim/lib/python3.7/site-packages/msprime/ancestry.py", line 615, in simulate
    random_seed=random_seed,
  File "/home2/datawork/cdelord/conda-env/simupop-slim/lib/python3.7/site-packages/msprime/ancestry.py", line 364, in _parse_simulate
    random_generator=random_generator,
  File "/home2/datawork/cdelord/conda-env/simupop-slim/lib/python3.7/site-packages/msprime/ancestry.py", line 1281, in __init__
    ll_recomb_map = self._resolve_missing_intervals(recombination_map)
  File "/home2/datawork/cdelord/conda-env/simupop-slim/lib/python3.7/site-packages/msprime/ancestry.py", line 1369, in _resolve_missing_intervals
    self.missing_intervals = recombination_map.missing_intervals()
  File "/home2/datawork/cdelord/conda-env/simupop-slim/lib/python3.7/site-packages/msprime/intervals.py", line 287, in missing_intervals
    out[:, 0] = self.left[self.missing]

**IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed**

I am most probably doing something wrong but am not sure how to fix this? My rate_map works fine if I use it on msprime with the StandardCoalescent (I know I am not supposed to do this though) and I get the same error message when using msprime with that rate_map and the DTWF model.

Is there a mismatch between the ways I am coding recombination maps between SLiM and pyslim?

Thank you very much;

Best regards, Chrys

petrelharp commented 2 years ago

Well, that's a confusing one. I have good news - (1) I get the same error, and (2) it can be avoided by doing instead

rts = pyslim.recapitate(ts, recombination_rate=rate_map, ancestral_Ne=22850, random_seed=1) 

Are you using the most recent pyslim version? You should have got a message suggesting that you use pyslim.recapitate instead of ts.recapitate( ) when you ran that code, if you were.

Ah-ha - and, I've found the bug. It comes because you've passed in rate_map using recombination_rate=rate_map; if instead you did recombination_map=rate_map then it would work as it should. (Now, what you did should work fine, or at least not have such a confusing error... but, that's where the problem came from - my code tried to treat the rate map as if it were a single number). But, I recommend you use pyslim.recapitate( ) as above.

I'm going to close this and open a new issue documenting the underlying cause, but feel free to re-open if it doesn't solve your problem.

ChrystelleDelord commented 2 years ago

Dear Peter,

Sorry for the delay and thank you so much for your answer! It does work, and I was indeed not using the last version of pyslim so this is probably why!

Have a wonderful week-end and thank you again!