undark-lab / swyft

A system for scientific simulation-based inference at scale.

Truncation - Store does not contain enough samples #98

Closed samgagnon closed 2 years ago

samgagnon commented 2 years ago

In the Truncation tutorial notebook (revision 50aa524e), I made the following modification to cell 5, with the rest of the notebook left unchanged:

# def model(v, sigma = 0.01):
#     x = v + np.random.randn(n_parameters)*sigma
#     return {observation_key: x}

def model(v, sigma = 0.01):
    if v.sum() > 0.5:
        x = v*0.0
    else:
        x = np.abs(v)*0.9
    return {observation_key: x}

All cells before cell 10 run without error, but cell 10 throws:

--------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-29-ec187b83f951> in <module>
      1 store.add(n_training_samples, prior, bound=bound)
      2 store.simulate()
----> 3 dataset = swyft.Dataset(n_training_samples, prior, store, bound = bound)

~\Anaconda3\envs\kn\lib\site-packages\swyft\store\dataset.py in __init__(self, N, prior, store, bound, simhook, simkeys)
     77         super().__init__()
     78         self._prior_truncator = swyft.PriorTruncator(prior, bound)
---> 79         self.indices = store.sample(N, prior, bound=bound)
     80         self._store = store
     81         self._simhook = simhook

~\Anaconda3\envs\kn\lib\site-packages\swyft\store\store.py in sample(self, N, prior, bound, check_coverage, add)
    324         if check_coverage:
    325             if self.coverage(N, prior, bound=bound) < 1.0:
--> 326                 raise RuntimeError(
    327                     "Store does not contain enough samples for your requested intensity function `N * prior`."
    328                 )

RuntimeError: Store does not contain enough samples for your requested intensity function `N * prior`.

I'm not sure whether this is due to a failure to add samples to the store, or due to the bound being too constraining. If the latter, is there a way to sample points only within the bound without reducing the number of generated samples?

bkmi commented 2 years ago

Hi there! Sorry for the slow reply.

The reason this happens is that there is stochasticity both in the number of samples generated within the store and in the number of samples drawn when you pass n_training_samples to the dataset. It can happen that the dataset asks for more samples than the store contains. If you add more simulations to the store, it should work. I will update the notebook so it does this automatically.
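
For reference, a minimal sketch of that workaround, using only the store/dataset calls that already appear in the traceback above; the 1.2 oversampling factor is an arbitrary safety margin, not part of the swyft API:

# Hedged sketch: over-request points when filling the store so that the
# later Dataset draw is covered. The 1.2 factor is an arbitrary margin.
store.add(int(1.2 * n_training_samples), prior, bound=bound)  # request extra samples
store.simulate()                                              # run the simulator on the new points
dataset = swyft.Dataset(n_training_samples, prior, store, bound=bound)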

bkmi commented 2 years ago

I just rewrote notebook 6 and got it working locally. There was an issue where a file it referenced raised an error, which caused an extreme slowdown with dask...

I don't think it addresses all of your issues though.

samgagnon commented 2 years ago

I was able to resolve the issue (at least temporarily) by adding more samples as you suggested. I'll take a look at the updated notebook since I've been running into some issues with dask, but that's probably a different issue entirely. Thanks!

bkmi commented 2 years ago

Glad you found the workaround. Please feel free to open another issue about dask. We're back from break, so it should be answered more quickly.