tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

no `record_provenance` argument to sim_mutations #2272

Open petrelharp opened 3 months ago

The following code has per-bp resolution on the RateMap:

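import msprime
import numpy as np

rng = np.random.default_rng()

L = 1e5  # sequence length in bp
ts = msprime.sim_ancestry(1000, sequence_length=L, recombination_rate=1e-8, population_size=1e4)

# One rate per base pair: L+1 breakpoints and L exponentially distributed rates
rm = msprime.RateMap(
    position=np.linspace(0, L, int(L+1)),
    rate=rng.exponential(size=int(L))*1e-8
)
mts = msprime.sim_mutations(ts, rate=rm)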

and produces the warning:

The provenance information for the resulting tree sequence is 3.08MB. This is nothing to worry about as provenance is a good thing to have, but if you want to save this memory/storage space you can disable provenance recording by setting record_provenance=False
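
A rough way to see where a figure of this size could come from is to serialise a rate map of the same shape as JSON text. This is only a sketch using the standard library: it assumes the provenance record stores the `position` and `rate` arrays as JSON, and the dictionary keys below are illustrative rather than msprime's actual provenance schema.

```python
import json
import random

L = 100_000  # same number of base pairs as in the example above
rng = random.Random(42)

# L+1 breakpoints and L per-bp rates, mirroring the RateMap arguments
position = [float(i) for i in range(L + 1)]
rate = [rng.expovariate(1.0) * 1e-8 for _ in range(L)]

# Assumption: the arrays are written out as plain JSON text
encoded = json.dumps({"position": position, "rate": rate})
size_mb = len(encoded.encode("utf-8")) / 1e6
print(f"JSON-encoded rate map: {size_mb:.2f} MB")
```

Under these assumptions the encoded text comes out at a few megabytes for 1e5 positions, and it grows linearly with sequence length.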

However, this isn't possible, because `sim_mutations` doesn't accept a `record_provenance` argument. This could be a problem in practice for someone simulating a whole chromosome at per-bp resolution (which should take only about an hour with human-like parameters, so it is not unreasonable!).