psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Cannot simulate more than 50 sim-events #201

Closed: fdtomasi closed this issue 8 years ago

fdtomasi commented 8 years ago

The command

./bin/partis simulate --simulate-partially-from-scratch --outfname simu.csv --n-sim-events 50 --presto-output

runs without problems. However, using

./bin/partis simulate --simulate-partially-from-scratch --outfname simu.csv --n-sim-events 55 --presto-output

generates the following error:

simulating
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "./bin/partis", line 78, in make_events
    reco.combine(random_ints[ievt])
  File "/home/fede/src/partis/python/recombinator.py", line 135, in combine
    failed = not self.try_to_combine(initial_irandom + itry)
  File "/home/fede/src/partis/python/recombinator.py", line 165, in try_to_combine
    self.add_mutants(reco_event, irandom)  # toss a bunch of clones: add point mutations
  File "/home/fede/src/partis/python/recombinator.py", line 577, in add_mutants
    assert not utils.are_conserved_codons_screwed_up(reco_event)
AssertionError
Traceback (most recent call last):
  File "./bin/partis", line 369, in <module>
    args.func(args)
  File "./bin/partis", line 108, in run_simulation
    raise Exception('only found %d events (expected %d) in output file %s' % (n_events, n_per_proc, fname))
Exception: only found 24 events (expected 55) in output file /tmp/fede/hmms/989958/recombinator-0/simu.csv

I tried larger values of --n-sim-events, but the problem persists; it only goes away when I decrease the number of events to simulate to 50 or lower.
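Read together, the two tracebacks suggest a simple control flow: `combine()` appears to retry `try_to_combine()` only when it returns a falsy value, so the `AssertionError` raised inside `add_mutants()` escapes the retry loop and kills the worker process. The parent then finds a short output file (24 of 55 events) and raises its own exception. The sketch below mimics that flow with hypothetical stand-in functions; the names echo the traceback, but this is not partis's code, and the failure rates are made-up parameters.

```python
import random


class ConservedCodonError(AssertionError):
    """Hypothetical stand-in for the failing assert in add_mutants()."""


def try_to_combine(irandom, fail_rate):
    """Hypothetical stand-in for one attempt at building an event.

    Returns False on a recoverable failure (which gets retried), True on
    success, and raises ConservedCodonError on the unrecoverable failure.
    """
    x = random.Random(irandom).random()
    if x < fail_rate:
        raise ConservedCodonError('conserved codons screwed up')
    return x >= 0.1  # roughly 10% of attempts fail recoverably


def combine(initial_irandom, fail_rate, max_tries=100):
    # The retry loop only reacts to a False return value;
    # an exception propagates straight out of it.
    for itry in range(max_tries):
        if try_to_combine(initial_irandom + itry, fail_rate):
            return
    raise RuntimeError('too many failed attempts')


def make_events(events, n_events, fail_rate, seed):
    rng = random.Random(seed)
    random_ints = [rng.randint(0, 10**6) for _ in range(n_events)]
    for ievt in range(n_events):
        combine(random_ints[ievt], fail_rate)
        events.append(ievt)  # stands in for writing one event to the output file


def run_simulation(n_events, fail_rate, seed):
    events = []
    try:
        make_events(events, n_events, fail_rate, seed)
    except AssertionError:
        pass  # in partis the worker process dies here; its output just stops short
    if len(events) != n_events:
        raise Exception('only found %d events (expected %d)' % (len(events), n_events))
    return events
```

With `fail_rate=0.0` every event is written; with a nonzero rate, a single unrecoverable failure truncates the output, and the parent's count check is what surfaces it.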

psathyrella commented 8 years ago

huh, nice catch! thanks.

It'll take me a bit longer to figure out exactly what's going wrong (the relevant error is the failing assertion about conserved codons), but perhaps I can solve the problem for your purposes sooner.
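For context on that assertion: in heavy-chain rearrangements the CDR3 is flanked by a conserved cysteine near the 3' end of the V gene and a conserved tryptophan in the J gene. Random point mutations can destroy either codon. Since TGG is the only tryptophan codon, any mutation there breaks it, while cysteine (TGT/TGC) tolerates a synonymous change at its third position. Below is a minimal, hypothetical sketch of what such a check might look like, assuming 0-based codon start positions are known for the event. It is not partis's `utils.are_conserved_codons_screwed_up`, and the function names are illustrative only.

```python
# Hypothetical sketch of a conserved-codon check; not partis's utils function.
CYSTEINE_CODONS = {'TGT', 'TGC'}
TRYPTOPHAN_CODONS = {'TGG'}  # TGG is the only codon for tryptophan


def codon_at(seq, pos):
    """Return the codon starting at 0-based position pos, or '' if it runs past the end."""
    codon = seq[pos:pos + 3]
    return codon if len(codon) == 3 else ''


def conserved_codons_broken(seq, cyst_position, tryp_position):
    """True if the V cysteine or J tryptophan codon is out of frame
    or no longer codes for its amino acid."""
    if (tryp_position - cyst_position) % 3 != 0:
        return True  # the two codons are not in the same reading frame
    return (codon_at(seq, cyst_position) not in CYSTEINE_CODONS
            or codon_at(seq, tryp_position) not in TRYPTOPHAN_CODONS)
```

For example, `conserved_codons_broken('AAATGTCCCGGGTGGAAA', 3, 12)` is `False`, but mutating the tryptophan codon from TGG to TGA (`'AAATGTCCCGGGTGAAAA'`) makes it `True`.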

So, --simulate-partially-from-scratch is still somewhat in beta: I added it only recently while testing new-allele finding, and it isn't in the general testing framework yet. Indeed, if I use the parameters in test/ instead of this option, the error doesn't recur. (I don't think there's anything special about the 55th event; rather, it's a somewhat uncommon occurrence that simply doesn't happen until then. In fact, I can only reliably reproduce it if I set --seed N.) So if you're only using --simulate-partially-from-scratch because you don't want to infer parameters from a data set first, I'd recommend using the parameters in test/ instead; they should be fine, e.g.:

./bin/partis simulate --parameter-dir test/reference-results/test/parameters/data/hmm --outfname simu.csv --n-sim-events 500 --n-procs 10
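The "uncommon occurrence" explanation can be made concrete. Suppose, purely as an assumption (the real rate is unknown), that each simulated event independently hits the problem with some small probability p. Then the chance that a run of n events sees at least one failure is 1 - (1 - p)^n, which grows with n. And because a fixed seed fixes every random draw, the same event fails on every run, which is why setting --seed N makes the failure reproducible. A small sketch of both points, with hypothetical helper names:

```python
import random


def p_at_least_one_failure(p, n):
    """Probability that at least one of n independent events fails,
    each failing with probability p."""
    return 1.0 - (1.0 - p) ** n


def first_failure(n_events, p, seed):
    """Index of the first failing event in a seeded run, or None if all succeed."""
    rng = random.Random(seed)  # fixed seed -> identical sequence of draws every run
    for ievt in range(n_events):
        if rng.random() < p:
            return ievt
    return None
```

Calling `first_failure(1000, 0.01, 42)` returns the same index every time, whereas an unseeded generator would move the failure around between runs.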

I'll also switch the example simulation invocation in the manual to the test/ directory parameters to facilitate this use case.

Oh, also: the --presto-output option unfortunately won't do anything for simulation output. I could potentially add that, but I don't think the presto format has fields for most of the simulation information, so it didn't seem especially useful.