kyleabeauchamp / FitEnsemble

Unable to reproduce Tutorial3 #8

Open rcrehuet opened 10 years ago

rcrehuet commented 10 years ago

I am trying to reproduce Tutorial 3 and I get different results. When I evaluate the Grid Search cell from http://nbviewer.ipython.org/github/kyleabeauchamp/FitEnsemble/blob/master/tutorial/Tutorial3.ipynb I get a different output. Could it be a different pymc version? I am using version 2.3.2. My sampling seems to include the burn-in process (20000 iterations instead of 25000), and it performs 2 runs for each value of regularization_strength:

Building model for 0 round of cross validation.
 [-----------------100%-----------------] 20000 of 20000 complete in 455.7 sec
Building model for 1 round of cross validation.
 [-----------------100%-----------------] 20000 of 20000 complete in 453.8 sec
Building model for 0 round of cross validation.
 [-----------------100%-----------------] 20000 of 20000 complete in 458.1 sec
Building model for 1 round of cross validation.
 [-----------------100%-----------------] 20000 of 20000 complete in 459.4 sec
Building model for 0 round of cross validation.
 [-----------------100%-----------------] 20000 of 20000 complete in 463.2 sec
Building model for 1 round of cross validation.
 [-----------------100%-----------------] 20000 of 20000 complete in 454.4 sec

But the real problem is that the results are also different and lead to a different interpretation: the chi^2 doesn't have a minimum at regularization_strength = 3.0:

regularization_strength    1.0         3.0         5.0
χ²                         1.054075    1.03282     0.981851

I cannot find which version of fitensemble I'm running, but I guess there is only one public release.

kyleabeauchamp commented 10 years ago

I'm testing now with the latest pymc (2.3.4), will look into this.

I will warn that for your own problem, it's not guaranteed that this sort of cross validation will be useful and lead to a well-defined minimum. The issue is that we have both simulation data (coordinates) and experimental data, and our regularization tunes our "relative faith" in each of them. I think a lot of the difficulty comes from the fact that our model doesn't consider uncertainty in the conformations / populations; the only source of uncertainty is the experimental inputs. This can occasionally lead to issues, so it's worth keeping in mind as a potential limitation.
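
To make that trade-off concrete, here is a rough sketch of the kind of objective involved (illustrative only, not FitEnsemble's actual code; lam, alpha, and log_posterior_sketch are made-up names, with alpha playing the role of regularization_strength):

import numpy as np

# Illustrative sketch, not the package's implementation: lam tilts the
# conformational populations, and alpha (the regularization strength) scales
# a MaxEnt entropy prior against the experimental chi^2.
def log_posterior_sketch(lam, predictions, measurements, uncertainties, alpha):
    weights = np.exp(-predictions.dot(lam))   # predictions: (n_frames, n_observables)
    pi = weights / weights.sum()              # tilted conformational populations
    mu = predictions.T.dot(pi)                # reweighted ensemble averages
    chi2 = np.sum(((mu - measurements) / uncertainties) ** 2)
    entropy = -np.sum(pi * np.log(pi))        # MaxEnt prior favors broad ensembles
    return -0.5 * chi2 + alpha * entropy

Turning alpha up expresses more faith in the simulation; turning it down expresses more faith in the measurements.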

kyleabeauchamp commented 10 years ago

OK, I've reproduced your issue. I still have no idea what's going on, as it appears that something in one of the python packages is causing us to get slightly different results...
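
One way to narrow it down (a suggestion, assuming pymc 2.x draws its random numbers from numpy's global RNG) is to pin the seed and record the package versions before the grid search, so we can tell sampling noise apart from a library change:

import numpy as np
import pymc
import scipy

# Record the environment so runs can be compared across machines.
print("pymc %s, numpy %s, scipy %s" % (pymc.__version__, np.__version__, scipy.__version__))

# Assumption: pymc 2.x samplers use numpy's global RNG, so seeding it
# should make repeated runs of the notebook deterministic.
np.random.seed(0)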

kyleabeauchamp commented 10 years ago

Yeah, I will probably have to come up with a better example for this tutorial. It's very strange that the output of this notebook changed, as the FitEnsemble code hasn't changed at all.

But back to the point: I believe the results you are seeing are characteristic of cases where cross validation won't be useful. The issue is that the input ensemble already agrees with the experimental values pretty well, so the best possible thing to do is to force alpha = 0 and simply return your input ensemble.
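
A quick way to see this (an illustrative snippet using the Tutorial 3 variable names, not a cell from the notebook): check the reduced chi^2 of the unweighted input ensemble; if it is already near 1, the reweighting has little left to improve:

import numpy as np

# How well does the *unweighted* input ensemble already match the measurements?
mu0 = predictions.mean(0)  # uniform-population ensemble averages
raw_chi2 = np.mean(((mu0 - measurements) / uncertainties) ** 2)
print("reduced chi^2 of the raw input ensemble: %f" % raw_chi2)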

Another thing to note is that the chi squared drops below 1.0. A reduced chi^2 under 1 means the model matches the data to better than the stated error bars, which sometimes indicates that we're near the regime where we're limited by the uncertainty in our measurements.

rcrehuet commented 10 years ago

Thanks Kyle. Just to make things clear: I'm running the tutorials with the data from the tutorials, not my own. I think the chi^2 is close enough to 1; surely below that we would be fitting the experimental noise. This relates to an issue from Tutorial 1, where the chi^2 obtained is very small:

Reduced chi squared of BELT: 0.020964

(I was referring to that in a previous issue, but now I'd like to raise a different question.) In Tutorial 3, if I also print the chi^2 for the training set:

import numpy as np
from fitensemble import belt

# predictions, measurements, uncertainties, regularization_strength_list,
# bootstrap_index_list, num_samples, and thin are defined earlier in the notebook.
all_train_chi = np.zeros(len(regularization_strength_list))
all_test_chi = np.zeros(len(regularization_strength_list))
for k, regularization_strength in enumerate(regularization_strength_list):
    model_factory = lambda predictions, measurements, uncertainties: belt.MaxEntBELT(predictions, measurements, uncertainties, regularization_strength)
    train_chi, test_chi = belt.cross_validated_mcmc(predictions, measurements, uncertainties, model_factory, bootstrap_index_list, num_samples=num_samples, thin=thin)
    all_train_chi[k] = train_chi.mean()
    all_test_chi[k] = test_chi.mean()
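
and view the two scores side by side (a small illustrative print loop, not from the notebook):

for alpha, tr, te in zip(regularization_strength_list, all_train_chi, all_test_chi):
    print("alpha=%s  train chi^2=%.3f  test chi^2=%.3f" % (alpha, tr, te))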

Then the chis from the training set are slightly lower than those of the test set (as expected), but still very close to 1. Yet the fit to the training set is the same procedure carried out in Tutorial 1, where the chi^2 is two orders of magnitude smaller. Why is that so?