Hi,
While it is perfectly possible that there is a regression in 1.2.3, ATM it is not clear to me from what you show.
It is hard to tell from the distribution of weights which one is right and which is wrong. Do you have some evidence demonstrating that the new one is wrong? It is also not 100% clear to me which weights you are talking about here -- some weight parameters in your model, or dynesty's sampling weights?
There could be an issue with plotting as well, as plotting is much less rigorously tested.
I am able to run your test code, but because it's doing multiple fits, it's unclear what I should look at. If you can isolate one fit for one specific dataset where you believe the results are incorrect wrt 0.9.5, that'd be helpful.
Regarding the lower efficiency, it's ATM hard to tell. There were correctness fixes in the multi-modal case, which made the sampling potentially less efficient. Whether that's the cause of the problems is hard to tell. I would first investigate whether the results are incorrect, and then, depending on the answer, we can look into efficiency.
Basically, from my point of view, in the case of differing results I would trust the 1.2.3 results over 0.9.5, but I'm happy to see a test case showing that 1.2.3 is problematic, so we can fix it.
Thanks for your attention on this!
- I am talking about the sampling weights. I had an error in one of my plots and they look more similar in the simulation---I've deleted those plots.
- I added .npz files with dynesty outputs to the debug repo that more clearly show my issue. These chains were derived from real data. The files are 2022exc_dynesty_py36.npz and 2022exc_dynest_py310.npz.
The main differences that I see are:
- With the v0.9/py3.6 version, there are 42414 samples. With the 1.2.3/py3.10 version, there are 511 samples.
- With v0.9, the max logz is 6643.2. With v1.2.3, the max logz is -17557.7.
- Runs with v1.2.3 fall well below 1% efficiency and take longer to run.
- Plotting the sampling weights shows a different distribution, though I'm not sure how to interpret these.

FWIW, the differences are not as large when using simulated data, so I need to dig into this more (sorry about the previous message).
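For reference, a minimal sketch for loading and comparing the two saved runs; the array names 'samples' and 'logz' inside the .npz files are assumptions and may differ from what the debug repo actually saves.

```python
import numpy as np

# Load the two saved runs mentioned above. The key names 'samples' and 'logz'
# are assumed here; check the save script in the debug repo for the real ones.
old = np.load('2022exc_dynesty_py36.npz')
new = np.load('2022exc_dynest_py310.npz')

for label, run in [('v0.9/py3.6', old), ('v1.2.3/py3.10', new)]:
    print(label,
          'n_samples:', len(run['samples']),
          'max logz:', run['logz'].max())
```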
I will work today to add code that makes the results from real data reproducible. (There are several extra modules that run on real data and it will take some thought about how to push this to the debug repo.)
From what you describe, the concerning thing is the max(logz). Those should be similar if both fits converged. The most likely reason for the discrepancy (I'm guessing here) is that one of the posterior modes is missed. If that's the case, it would also be stochastic, different from run to run (depending on the random seed). If the posterior mode is more frequently missed in 1.2.3 than in 0.9.5, that is indeed a problem.
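As a toy illustration of that point (the numbers below are made up, loosely patterned on the logz values quoted elsewhere in this thread): the total evidence is the log-sum-exp of the per-mode contributions, so a run that never finds the dominant mode reports a much lower logz.

```python
import numpy as np

# Made-up per-mode evidence values, purely illustrative.
logz_dominant_mode = -12400.0
logz_minor_mode = -15280.0

logz_both = np.logaddexp(logz_dominant_mode, logz_minor_mode)  # ~ -12400.0
logz_if_dominant_missed = logz_minor_mode                      # ~ -15280.0
print(logz_both, logz_if_dominant_missed)
```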
Just checking if there is still an issue, @mmfausnaugh, or whether we can close this.
I'll close the issue for now. Feel free to reopen if the issue is still there.
Sorry that it has taken so long for me to come back to this!
My issue is persisting, even with v2.1.1.
I've updated my test example with data that more clearly shows the issue: https://github.com/mmfausnaugh/SN_analysis_debug/tree/main
To run the test, clone the repository and run python analyze_SN_curved_PL_dnest.py
There are two tests, one for light curves of an object called 2020tld and one for an object called 2022exc. Tests for 2020tld work roughly as expected; 2022exc disagrees pretty strongly between the two versions. In fact, I cannot plot the results from v2.1.1 because the assigned weights are all < 1.e-20, except for the last entry, which is 1.0.
In both cases, the acceptance fraction is small when using dynesty v2.1.1, less than 1%. On 0.9.5.3, the acceptance fraction is about 20% for both cases, and the posteriors look about the way I expect.
I don't think this is a matter of stochasticity. I am seeding the random number generator, so I would expect to get the same results from both versions of dynesty. For 2020tld, the posteriors are slightly different but consistent with each other (the low acceptance fraction does, however, make 2020tld run much slower).
For 2022exc, the output weights demonstrate the specific problem I was having.
Please let me know if you have any questions that I can clarify or there are issues with running my example.
Thanks, --Michael
Describe the bug
I recently upgraded my laptop and moved from dynesty 0.9.5.3 to 1.2.3. I'm finding different results using dynesty on the same data after the change---in particular, some runs result in weights equal to or less than zero, which invalidates some of the parameter estimations and causes trace/corner plots to fail.
Setup
I am running versions 1.2.3 and 0.9.5.3, both installed with pip. The setup is quite complicated, but I've made a public repository that I think will reproduce the behavior: https://github.com/mmfausnaugh/SN_analysis_debug If you clone this repository and run
python sim_detection_limit.py
, it will make synthetic data, run dynesty on it, save the chains to .npz files, and visualize models with the medians of the posteriors in a .png file. Please let me know if you have issues running this code.
Bug
There are no errors, so perhaps this is not strictly a bug.
(update: removed plots of sampling weights because of a math error)
The behavior I expected was consistent results between 1.2.3 and 0.9.5.3 (if not identical, since I am using seeds for the random numbers; but I'm not sure what the different processor architectures will do to that expectation).
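For context, a minimal sketch of how seeding appears to work across versions (the toy likelihood below is illustrative, not the model in the debug repo): recent dynesty releases take a numpy Generator through the rstate argument, whereas 0.9.x code typically passed a np.random.RandomState or relied on np.random.seed, so bit-for-bit agreement across versions is not guaranteed even with the same seed.

```python
import numpy as np
import dynesty

# Illustrative toy problem only; not the model from the debug repo.
ndim = 3

def loglike(x):
    return -0.5 * np.sum(x**2)

def prior_transform(u):
    return 10.0 * u - 5.0  # map the unit cube to [-5, 5]

# Recent dynesty versions accept a numpy Generator via rstate; older 0.9.x
# runs used RandomState/global seeding, so the random streams differ.
rng = np.random.default_rng(42)  # arbitrary example seed
sampler = dynesty.NestedSampler(loglike, prior_transform, ndim, rstate=rng)
sampler.run_nested(print_progress=False)
print(sampler.results['logz'][-1])
```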
To reproduce the behavior I see, please see https://github.com/mmfausnaugh/SN_analysis_debug.
When running my plotting code on some chains from real data, the weights go to zero and I get an error from the plotting routines.
Additional context
Emailed to Josh Speagle:
I started my project some time ago and was using 0.9.5.3. Inspection of models from posterior draws shows that this run is quite good.
However, when using 1.2.3, I find very different posteriors, which seem to be "incorrect." When inspecting the weights, they seem very close to zero (plots are attached). In fact, if I try to make trace plots or corner plots with the updated version, it throws an error saying that it cannot find any points--inspection seems to indicate that every point is in a single bin and this causes errors when plotting with contours.
Draws from these posteriors are plausible but obviously worse fits to the data compared to posteriors drawn from 0.9.5.3. I also found large differences in the efficiency (which I take to be analogous to the acceptance fraction?) and in the log likelihood values at similar iterations (about iteration 4100 in what is below). Specific outputs are below, but I am seeing a factor of 10 worse efficiency and smaller log likelihood values (i.e., 10x more negative) on the new version.
Some more details are below. I can confirm that I am using exactly the same code and data. I can say that previous results produced good parameters based on the posterior draws with recalculated weights np.exp(result['logwt'] - result['logz'][-1]).
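In code, that recalculation is just the usual importance-weight normalization; a minimal sketch (not the exact code in my analysis scripts):

```python
import numpy as np

def importance_weights(result):
    """Recompute normalized posterior weights from a dynesty results object
    (or any mapping with 'logwt' and 'logz' arrays), as described above."""
    wts = np.exp(result['logwt'] - result['logz'][-1])
    return wts / wts.sum()
```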
I first noticed this when moving my analysis to a new MacBook Pro with an M1 processor; however, I tried a new run on the Intel machine with python 3.10, numpy 1.22.3, and dynesty 1.2.3, and ended up with identical results.
Any advice you can give would be most appreciated. The main goal from my perspective is to reproduce the old behavior from py3.6/dynesty 0.9.5.3 on py3.10/dynesty 1.2.3.
macOS 12.4, M1 chip; python 3.10; dynesty 1.2.3; numpy 1.22.3
Output from dynesty:
4165it [02:37, 26.50it/s, batch: 0 | bound: 323 | nc: 395 | ncall: 305404 | eff(%): 1.362 | loglstar: -inf < -15266.643 < inf | logz: -15281.873 +/- 0.174 | dlogz: 20115.679 > 0.010]
macOS 12.4, Intel chip; python 3.6.8; dynesty 0.9.5.3; numpy 1.16.2
Output from dynesty:
iter: 4145 | eff(%): 14.947 | logl*: -inf<-12385.6< inf | logz: -12400.8+/-0.2 | dlogz: 15312.2> 0.0
When running 1.2.3 on the Intel chip, the efficiency stays at 15% for longer, but eventually plummets to 2% and even worse.