stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

vb crash on Windows #1758

Open bgoodri opened 8 years ago

bgoodri commented 8 years ago

I can get a segfault on Windows from cmdstan (or rstan) with the following model / data / syntax:

 ./weibull.exe variational data file=weibull.data.R random seed=1913258051

weibull.txt weibull.data.txt

This seems to not be reproducible on Linux / Mac, although it is difficult to get meanfield to converge.

bgoodri commented 8 years ago

@akucukelbir or @dustinvtran Do you have access to a Windows machine to reproduce this? Change weibull.txt to weibull.stan and weibull.data.txt to weibull.data.R.

dustinvtran commented 8 years ago

Unfortunately no. :( I can see if it's possible to get it to converge, and maybe that would indirectly point us to where the problem is.

akucukelbir commented 8 years ago

negative on my end as well. no more windows in my life.

bgoodri commented 8 years ago

The backtrace is related to the streaming

#2  0x00514a70 in std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, char, double) const ()

akucukelbir commented 8 years ago

that, sadly, says nothing to me. :(

bgoodri commented 8 years ago

Sadly, that's all it says. But I think it means the segfault was triggered while printing output to the screen, rather than the variational approximation process itself terminating unexpectedly. Possibly a string was too long or had an illegal character in it.

bob-carpenter commented 8 years ago

Not a C++ whisperer yet? You just need to be motivated to parse everything out and type what you find into Google.

This:

#2  0x00514a70 in std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, char, double) const ()

is a function signature, which is revealed by reorganizing it a bit (it helps to drop all the "std::", while you're at it):

RETURN TYPE:

ostreambuf_iterator<char, char_traits<char> >

FUNCTION:

num_put<char, ostreambuf_iterator<char, char_traits<char> > >::_M_insert_float<double>

ARGUMENT TYPES:

(ostreambuf_iterator<char, char_traits<char> >, ios_base&, char, char, double)

CONST DECLARATION:

const ()

So we know to look up "std::num_put" (I couldn't live without cplusplus.com):

http://www.cplusplus.com/reference/locale/num_put/

So it's crashing at some point where it's trying to insert a double into an output stream. And it looks like it's inheriting from a locale somewhere, which is something that could easily vary across platforms.
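To make the link between streaming a double and `std::num_put` concrete, here is a minimal C++ sketch (not Stan code). It installs a counting facet into a stream's locale, so every `os << some_double` can be seen passing through `num_put::do_put`. The names `counting_num_put` and `count_num_put_calls` are hypothetical, made up for this illustration. As far as I know, `_M_insert_float` in the backtrace is the libstdc++ internal helper that the default `do_put` for doubles forwards to.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: counts how many doubles pass through
// num_put on their way into a stream, then defers to the default formatting.
class counting_num_put : public std::num_put<char> {
 public:
  explicit counting_num_put(int* counter)
      : std::num_put<char>(0), counter_(counter) {}

 protected:
  iter_type do_put(iter_type out, std::ios_base& str, char_type fill,
                   double v) const override {
    ++*counter_;  // operator<<(double) ends up here
    return std::num_put<char>::do_put(out, str, fill, v);
  }

 private:
  int* counter_;
};

// Streams `value` into a string stream whose locale carries the counting
// facet. Returns the number of do_put(double) calls and stores the text.
int count_num_put_calls(double value, std::string* rendered) {
  assert(rendered != nullptr);
  int calls = 0;
  std::ostringstream os;
  // The locale takes ownership of the facet (constructed with refs == 0).
  os.imbue(std::locale(os.getloc(), new counting_num_put(&calls)));
  os << value;
  *rendered = os.str();
  return calls;
}
```

Calling `count_num_put_calls(1.5, &text)` should report one call per streamed double, which is why a crash inside `num_put` points at "some double was being printed" rather than at any particular piece of ADVI math.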

The memory location's not too helpful (to me, at least). I usually just try to bisect the code manually using print statements at this point, but you need to be able to recreate the error for that. You could also review all the I/O if you have any other hint as to when the error occurs.

We have a Windows box in the office for just this kind of spelunking.

bgoodri commented 8 years ago

What Bob said. I'm continuing to guess that it tried to stream a double that was too big. If that is correct, could we use scientific notation or something for the delta_ELBO_mean? It already looks weird when it uses different numbers of digits to the left of the decimal point.
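A rough sketch of the difference, assuming (this is an assumption, not something confirmed in this thread) that the progress line prints delta_ELBO_mean in fixed notation. The helpers `format_fixed` and `format_scientific` are hypothetical names for illustration only, not Stan functions. In fixed notation the text grows with the magnitude of the value, while scientific notation keeps it short.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Formats `value` in fixed notation: every digit left of the decimal point
// is written out, so huge values produce very long strings.
std::string format_fixed(double value, int precision) {
  assert(precision >= 0);
  std::ostringstream os;
  os << std::fixed << std::setprecision(precision) << value;
  return os.str();
}

// Formats `value` in scientific notation: one digit left of the decimal
// point plus an exponent, so the length barely depends on the magnitude.
std::string format_scientific(double value, int precision) {
  assert(precision >= 0);
  std::ostringstream os;
  os << std::scientific << std::setprecision(precision) << value;
  return os.str();
}
```

With precision 3, `format_fixed(1e300, 3)` comes out more than 300 characters long, while `format_scientific(1e300, 3)` stays at roughly ten characters. The exact exponent width can differ between C runtimes, so any test comparing full strings would need to allow for that.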

akucukelbir commented 8 years ago

i'm happy to look into using scientific notation for delta_ELBO_mean. i don't know how to test for this though... do either of you know how to test for these sorts of windows-only bugs?

bgoodri commented 8 years ago

We do have a Windows machine. You might be able to write a unit test that just streams a million random characters in the same way that ADVI does.
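One possible shape for such a test, as a sketch only: it streams a handful of extreme doubles rather than random characters, and it uses plain functions instead of any particular test framework. The format flags (`setw(16)`, `fixed`) are assumptions about what the ADVI output might look like, and `extreme_doubles` / `streams_cleanly` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <ios>
#include <limits>
#include <sstream>
#include <vector>

// Values that stress the formatter: tiny, huge, infinite, and NaN.
std::vector<double> extreme_doubles() {
  typedef std::numeric_limits<double> lim;
  return {0.0,         -1.0,        lim::min(),      lim::denorm_min(),
          lim::max(),  -lim::max(), lim::infinity(), lim::quiet_NaN()};
}

// Streams each value in fixed notation (assumed format, see above) and
// reports whether the stream stayed in a good state and produced output.
// A segfault like the one in this issue would kill the process instead.
bool streams_cleanly(const std::vector<double>& values, int precision) {
  assert(precision >= 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    std::ostringstream os;
    os << std::setw(16) << std::fixed << std::setprecision(precision)
       << values[i];
    if (!os.good() || os.str().empty()) return false;
  }
  return true;
}
```

Running `streams_cleanly(extreme_doubles(), 3)` on the Windows machine would at least show whether the formatting path alone can trigger the crash, independent of the model.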

syclik commented 8 years ago

@dustinvtran, @akucukelbir. Bump.