Open tmolteno opened 4 months ago
The error appears to be in the generation of the InferenceData. If I modify the call to set return_inferencedata=False, i.e.,
trace = pm.sample_smc(draws=2000, kernel=pm.smc.kernels.IMH,
                      chains=6, threshold=0.6,
                      correlation_threshold=0.01,
                      random_seed=rng, return_inferencedata=False, progressbar=False)
lml = trace.report.log_marginal_likelihood
then the problem goes away.
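For anyone using this workaround, here is a minimal sketch of how the per-chain values might be reduced to one estimate per chain. It assumes (this is not confirmed above) that the report holds one sequence of values per chain, possibly of different lengths and possibly containing NaN entries, and that the last finite entry is the final estimate. The helper name final_lml_per_chain and the example numbers are hypothetical.

```python
import numpy as np


def final_lml_per_chain(lml):
    """Return the last finite value of each chain's sequence (NaN if none).

    Assumption: `lml` is a sequence with one entry per chain, each entry
    being a sequence of floats. Works for ragged input as well.
    """
    finals = []
    for chain_values in lml:
        values = np.asarray(chain_values, dtype=float).ravel()
        finite = values[np.isfinite(values)]
        finals.append(finite[-1] if finite.size else np.nan)
    return np.array(finals)


# Hypothetical values: two chains that ran a different number of stages.
example = [
    [np.nan, np.nan, -6.0],
    [np.nan, np.nan, np.nan, -6.4],
]
finals = final_lml_per_chain(example)
print(finals)
```

With a real run, the call would be something like final_lml_per_chain(trace.report.log_marginal_likelihood), provided the structure matches the assumption above.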
Describe the issue:
Repeated calls to pm.sample_smc() will sometimes generate log_marginal_likelihood structures with the wrong dimensions. This applies to both the MH and IMH kernels. Here are two outputs from the code below.
The first log_marginal_likelihood result shows the problem. The xarray output is malformed: it is an array of lists, the first list has too few elements, and the coordinates are wrong (it reports 1 chain and 6 draws, when the sampling used 6 chains). The second result is correct, including the chain/draw coordinates.
Reproducible code example:
Error message:
No response
PyMC version information:
PyMC 5.10.4, running on Debian trixie and bookworm. PyMC was installed using pip into a venv.
Context for the issue:
It appears that the NaN padding adds one too few NaNs, which makes the xarray data ragged (different rows have different sizes). Perhaps related to Issue #5263.
Note that the xarray coordinates are completely incorrect when this bug happens.
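To illustrate the suspected failure mode, here is a small NumPy-only sketch. It is not PyMC's actual padding code; the helper pad_to_length and the numbers are hypothetical. It shows that padding every chain to a common length gives a rectangular (chain, draw) array, while padding one chain one element short leaves a 1-D object array of lists, which matches the "array of lists" symptom described above.

```python
import numpy as np


def pad_to_length(values, length):
    """Append NaNs so that `values` has at least `length` elements."""
    return list(values) + [np.nan] * (length - len(values))


# Hypothetical per-chain values; the first chain finished one stage earlier.
chains = [
    [-1.0, -2.0],
    [-1.5, -2.5, -3.5],
    [-1.2, -2.2, -3.2],
]
target = max(len(c) for c in chains)

# Correct padding: every row has `target` elements, so the array is 2-D.
rect = np.array([pad_to_length(c, target) for c in chains], dtype=float)

# Simulated off-by-one: the first chain is padded to `target - 1` only.
buggy_rows = [pad_to_length(chains[0], target - 1)] + [
    pad_to_length(c, target) for c in chains[1:]
]
# Ragged rows cannot form a 2-D array; dtype=object yields a 1-D array of lists.
ragged = np.array(buggy_rows, dtype=object)

print(rect.shape, ragged.shape)
```

Once the rows are ragged, any dimension names inferred from the array shape no longer correspond to chains and draws, which would be consistent with the incorrect coordinates.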
Changing the random seed changes this behaviour. The sample code above uses a seed chosen to trigger the bug on the first call. In my use case, the failure rate is typically around 10 percent.