Closed jonathanstrong closed 3 years ago
~Did you have a nan in your data?~
Yeah, I think we could add nan checks.
There's nothing intrinsically wrong with NaN in inputs.
The error is arising when the normal log probability density function is being evaluated. The normal distribution throws an exception when it gets out-of-domain arguments, which is where the error message is coming from. The complication is that it doesn't know where in the Stan program the error arose; the error comes from the C++ code in the normal distribution. It would be much clearer if we were able to say "found `y` to be NaN in `y ~ normal(...)`" rather than just reporting about the "variate" (the variate is the outcome `y` in `normal_lpdf(y | mu, sigma)`). At least it's better than it used to be in that it's telling you which entry in the vector is the problem.
It probably wouldn't be terrible to forbid NaN inputs, but that's not how CmdStan is going to operate, so it would be a PyStan-specific decision and may cause models that work in CmdStan, RStan, etc., to fail in PyStan.
I put it as a question because I wasn't sure if there were cases where NaN inputs to Stan are used (intentionally). One option would be for PyStan to warn whenever there are NaN values present. Another would be to catch any exceptions from the C++ code, then check for NaN and include NaN-related warnings in the exception output. It seems unlikely that improving the C++ error message is the easiest route.
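To make the two options concrete, here is a minimal sketch, not PyStan's actual implementation. The helper names `find_nan_entries`, `warn_on_nan`, and `call_with_nan_hint` are hypothetical, and `fit_fn` stands in for whatever callable runs the model (for example a bound `sampling` or `optimizing` method). Catching a generic `Exception` is an assumption, since the exact exception type raised on the Python side isn't stated here.

```python
import warnings

import numpy as np


def find_nan_entries(data):
    """Return {name: NaN count} for entries of a data dict that contain NaN."""
    found = {}
    for name, value in data.items():
        arr = np.asarray(value)
        # Only floating-point arrays can hold NaN; skip ints, strings, etc.
        if np.issubdtype(arr.dtype, np.floating):
            count = int(np.isnan(arr).sum())
            if count:
                found[name] = count
    return found


def warn_on_nan(data):
    """Option 1: warn up front, but do not refuse the data."""
    found = find_nan_entries(data)
    if found:
        warnings.warn(f"data contains NaN values: {found}", stacklevel=2)
    return found


def call_with_nan_hint(fit_fn, data, **kwargs):
    """Option 2: run fit_fn and, only if it fails, mention NaN inputs."""
    try:
        return fit_fn(data=data, **kwargs)
    except Exception as exc:  # assumption: the failure surfaces as a Python exception
        found = find_nan_entries(data)
        if found:
            raise RuntimeError(
                f"{exc} (note: data contains NaN values: {found})"
            ) from exc
        raise
```

The first option never changes behaviour for models that accept NaN inputs on purpose. The second only adds information to an error that was going to be raised anyway.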
Thanks---those both sound like promising approaches.
The input warning would be more robust, because NaN errors can pop up other than from NaN input (e.g., 0 / 0 or inf - inf).
You're right that doing this from the C++ side would be nearly impossible---we'd need to thread calling information through to every function.
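As a small illustration of the point about NaN appearing without any NaN in the input, the snippet below (plain Python and NumPy, not Stan) starts from values that pass a NaN screen and still ends up with NaN after arithmetic. The variable names are only for illustration.

```python
import math

import numpy as np

# Inputs that pass a NaN screen: infinity and zero are not NaN.
inf = float("inf")
zero = np.float64(0.0)
assert not math.isnan(inf) and not np.isnan(zero)

# inf - inf has no defined value, so IEEE 754 arithmetic yields NaN.
diff = inf - inf

# 0 / 0 is also NaN for NumPy floats; silence the "invalid value" warning.
with np.errstate(invalid="ignore"):
    ratio = zero / zero

print(math.isnan(diff), bool(np.isnan(ratio)))
```

So a screen on the `data` dictionary catches the most common cause, but it cannot rule out NaN arising during evaluation.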
Summary:
The error message encountered when passing data inputs with NaN is confusing. Perhaps `optimizing` and `sampling` should screen items in the `data` dictionary for NaN.
Description:
Brand new to PyStan, spent a few minutes struggling with this error, which occurred because I was accidentally passing input data with NaN values in it:
Since I was working in a Jupyter notebook, the C++ stderr output was not visible. Eventually I noticed it included a message about NaN:
Reproducible Steps:
The model I was running is the linear regression example from the Stan manual, nothing exotic. I simply passed data with NaN values in the `data` dictionary to `optimizing` and `sampling`.
Current Output:
Error messages quoted above.
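For anyone trying to reproduce this, a rough sketch of the setup described under Reproducible Steps follows. The Stan program is the simple linear regression from the Stan manual. The data values, the helper names `make_data_with_nan` and `run`, and the choice of which entry is NaN are made up for illustration and are not the original data. The `run` function assumes PyStan 2.x is installed and is not called here.

```python
import numpy as np

# Simple linear regression, as in the Stan manual.
MODEL_CODE = """
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"""


def make_data_with_nan(n=10, seed=0):
    """Build a hypothetical data dict whose outcome vector y holds one NaN."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)
    y[3] = np.nan  # the accidental NaN
    return {"N": n, "x": x, "y": y}


def run(data):
    """Compile the model and call both entry points (requires PyStan 2.x)."""
    import pystan

    model = pystan.StanModel(model_code=MODEL_CODE)
    model.optimizing(data=data)
    model.sampling(data=data)


data = make_data_with_nan()
print(int(np.isnan(data["y"]).sum()), "NaN value(s) in y")
```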
Expected Output:
I would have appreciated a clearer error message. It might be worth considering screening inputs before passing them to C++.
PyStan Version:
2.19.0.0
Python Version:
3.7.4
Operating System:
ubuntu 18.04