stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
2.59k stars 370 forks source link

stan_csv_reader is sensitive (crashes) to super small floats in sampling data #2480

Open michael-prange opened 6 years ago

michael-prange commented 6 years ago

Summary:

stansummary crashes when run on a csv file that contains 1.5316e-322 in the accept_stat__ column of the csv file.

Description:

[Copied from comment below] The problem seems to be that the Boost lexical_cast function can only deal with normal double values; 1.6e-322 is too small to be represented as normal number---here are the double-precision numerical limits.

The Boost doc only suggests using std::stringstream for more control.

Reproducible Steps:

My cmdstan code runs without warnings or errors, but when I run stansummary on the resulting csv file I get the error:

prange% /usr/local/cmdstan-2.17.1/bin/stansummary hyper_samples_1.csv 
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_lexical_cast> >: bad lexical cast: source type value could not be interpreted as target
Abort

In order to trace the source of the error, I proceeded to delete sample-output lines from the csv file until I found the following offending line:

-27.7811,1.5316e-322,0.0547959,1,1,0,37.0581,0.718519,-1.14652,0.133214,2.21245,0.938323,-1.35562,-0.222546,0.970578,-0.274166,-1.11156,0.119766,-0.132146,-0.022578,0.229995,-0.0303289,-0.23872,0.0410686,0.0588429,-4.76369e-05,-0.18922,-18.4997
When I change the number 1.5316e-322 to 1.5316e-3, stansummary works again.

Run stansummary of the following csv file. I edited it down to the offending sample-output line:

# stan_version_major = 2
# stan_version_minor = 17
# stan_version_patch = 1
# model = hyper_model
# method = sample (Default)
#   sample
#     num_samples = 1000 (Default)
#     num_warmup = 1000 (Default)
#     save_warmup = 0 (Default)
#     thin = 1 (Default)
#     adapt
#       engaged = 1 (Default)
#       gamma = 0.050000000000000003 (Default)
#       delta = 0.80000000000000004 (Default)
#       kappa = 0.75 (Default)
#       t0 = 10 (Default)
#       init_buffer = 75 (Default)
#       term_buffer = 50 (Default)
#       window = 25 (Default)
#     algorithm = hmc (Default)
#       hmc
#         engine = nuts (Default)
#           nuts
#             max_depth = 10 (Default)
#         metric = diag_e (Default)
#         metric_file =  (Default)
#         stepsize = 1 (Default)
#         stepsize_jitter = 1
# id = 1
# data
#   file = hyper_1.data.R
# init = hyper_1.init.R
# random
#   seed = 3959268712
# output
#   file = hyper_samples_1.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,z.1
# Adaptation terminated
# Step size = 0.0547959
# Diagonal elements of inverse mass matrix:
# 0.713329, 1.08275, 0.596216, 0.607337, 0.289338, 0.159566, 0.477, 0.425896, 0.603512, 2.59539
-27.7811,1.5316e-322,0.0547959,1,1,0,37.0581,0.718519
-25.3671,0.86028,0.0547959,4,22,1,30.5591,0.690161
# 
#  Elapsed Time: 6.95684 seconds (Warm-up)
#                2.52359 seconds (Sampling)
#                9.48043 seconds (Total)
# 

I added a second sampling-output line below the offending line so stansummary will work when 1.5316e-322 is changed to 1.5316e-3

Current Output:

prange% /usr/local/cmdstan-2.17.1/bin/stansummary hyper_samples_1.csv 
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_lexical_cast> >: bad lexical cast: source type value could not be interpreted as target
Abort

Expected Output:

When the offending number in the sampling output is changed from 1.5316e-322 to 1.5316e-3, the output is

prange% /usr/local/cmdstan-2.17.1/bin/stansummary tmp
Inference for Stan model: hyper_model
1 chains: each with iter=(2); warmup=(0); thin=(1); 2 iterations saved.

Warmup took (7.0) seconds, 7.0 seconds total
Sampling took (2.5) seconds, 2.5 seconds total

                 Mean   MCSE  StdDev        5%    50%    95%  N_Eff  N_Eff/s    R_hat
lp__              -27    1.2     1.7  -2.8e+01    -25    -25    2.0     0.79      nan
accept_stat__    0.43   0.43    0.61   1.5e-03   0.86   0.86    2.0     0.79      nan
stepsize__      0.055   0.00    0.00   5.5e-02  0.055  0.055    2.0     0.79      nan
treedepth__       2.5    1.5     2.1   1.0e+00    4.0    4.0    2.0     0.79      nan
n_leapfrog__       12     10      15   1.0e+00     22     22    2.0     0.79      nan
divergent__      0.50   0.50    0.71   0.0e+00    1.0    1.0    2.0     0.79      nan
energy__           34    3.2     4.6   3.1e+01     37     37    2.0     0.79      nan
z[1]             0.70  0.014   0.020   6.9e-01   0.72   0.72    2.0     0.79      nan

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

Current Version:

v2.17.1

bob-carpenter commented 6 years ago

That's puzzling. What platform are you using? That value is OK for I/O with my 64-bit Clang, and I can't imagine we're reading these in any other way.

int main() {
  std::stringstream ss("1.5316e-322");
  double x;
  ss >> x;
  std::cout << "x = " << x << std::endl;
}

Running it gives

$ clang++ foo.cpp
$ ./a.out
x = 1.5316e-322

P.S. This may wind up being a CmdStan issue if the problem's in the outside layer, or a Stan issue if it's in one of the underlying services.

michael-prange commented 6 years ago

Bob,

I am using an iMac running the latest OS. I will be away from my computer until Monday, but is there a software test you can suggest for me to run at that time? Perhaps I can change some of the stansummary code?

Michael

Sent from my phone

On Mar 2, 2018, at 15:36, Bob Carpenter notifications@github.com wrote:

That's puzzling. What platform are you using? That value is OK for I/O with my 64-bit Clang, and I can't imagine we're reading these in any other way.

int main() { std::stringstream ss("1.5316e-322"); double x; ss >> x; std::cout << "x = " << x << std::endl; } Running it gives

$ clang++ foo.cpp $ ./a.out x = 1.5316e-322 P.S. This may wind up being a CmdStan issue if the problem's in the outside layer, or a Stan issue if it's in one of the underlying services.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

bob-carpenter commented 6 years ago

The thing to do is always to take the case that is failing and create a small, reproducible example as a failing unit test.

Then fix the code so that it passes the test.

Stan's a big project with a lot of dev process at this point. If you want to dive in, here's the place to start:

https://github.com/stan-dev/stan/wiki/Developer-process-overview

michael-prange commented 6 years ago

I think I've localized the problem. In line 316 of cmdstan-2.17.1/stan/src/stan/io/stan_csv_reader.hpp is the line samples(row, col) = boost::lexical_cast(line);

When I made a test program using this line, I get the same error. Test program: // test.cpp

include <boost/lexical_cast.hpp>

include

int main() { double x = boost::lexical_cast("1.5316e-32"); std::cout << "x = " << x << std::endl; x = boost::lexical_cast("1.5316e-322"); std::cout << "x = " << x << std::endl; } Output: prange% ./test x = 1.5316e-32 libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: bad lexical cast: source type value could not be interpreted as target Abort

I ran this on an iMac running OS 10.13.3 using the following g++ version: prange% g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 9.0.0 (clang-900.0.39.2) Target: x86_64-apple-darwin17.4.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

bob-carpenter commented 6 years ago

Thanks for creating that minimial example and tracking down the source.

The problem seems to be that the Boost lexical_cast function can only deal with normal double values; 1e-322 is too small to be represented as normal number---here are the double-precision numerical limits.

The Boost doc only suggests using std::stringstream for more control---as I showed above, that can handle the subnormal value (at least the clang++ on my Mac). I seem to remember using lexical_cast to get more control for some reason!

Here's a useful StackOverflow discussion---looks like a lot of this is going to be implementation dependent in the subnormal range, so I'm not sure there's going to be a general fix. But we might be able to go back to stringstream-based parsing.

michael-prange commented 6 years ago

Bob,

Since this issue is holding up a project I’m working on, I’ll switch my local installation to use stringstream instead of lexical_cast as a quick fix. I’ll report back if this produces any subsequent issues. Thanks.

Michael

On Mar 4, 2018, at 11:33 PM, Bob Carpenter notifications@github.com wrote:

The problem seems to be that the Boost lexical_cast function http://www.boost.org/doc/libs/1_66_0/doc/html/boost_lexical_cast/synopsis.html#boost_lexical_cast.synopsis.lexical_cast can only deal with normal double values https://en.wikipedia.org/wiki/Normal_number_(computing); 1e-322 is too small to be represented as normal number---here are the double-precision sizes https://en.wikipedia.org/wiki/Double-precision_floating-point_format#Double-precision_examples.

The Boost doc only suggests using std::stringstream for more control. I seem to remember using lexical_cast to get more control.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2480#issuecomment-370307233, or mute the thread https://github.com/notifications/unsubscribe-auth/AV4cYvi_yuIpOtBR9MMOlSF8ULcx4T3Nks5tbMAAgaJpZM4SZ3Df.

bob-carpenter commented 6 years ago

Yes, if you can, that should be fine. If it works OK, would you mind submitting a patch for Stan itself? Or sharing your mods so that I can submit the patch?

michael-prange commented 6 years ago

Bob,

I've run 1000's of cases today and the patch has worked flawlessly. I don't know how to submit a patch, so I appreciate you offer to submit it for me. Here it is:

On line 316 of cmdstan-2.17.1/stan/src/stan/io/stan_csv_reader.hpp, replace samples(row, col) = boost::lexical_cast(line); with std::stringstream ss(line); double x; ss >> x; samples(row, col) = x;

I'm sure there are lots of other places where the lexical_cast should be replace to gain more stability, but I've only replaced this one instance in order to fix my problem. I'll leave the larger design issue to the stan-dev team. Thanks for all your help. You really dug me out of a hole.

Michael