stan-dev / rstan

RStan, the R interface to Stan
https://mc-stan.org

vb() fails on 32bit sparc-solaris #252

Open bgoodri opened 8 years ago

bgoodri commented 8 years ago

https://www.r-project.org/nosvn/R.check/r-patched-solaris-sparc/rstanarm-00check.html

bgoodri commented 8 years ago

@bob-carpenter Do you have any idea about this? The link above shows a "bus error" on 32-bit SPARC, which is described here: http://stackoverflow.com/questions/1892566/c-bus-error-in-sparc-arcitecture It doesn't seem to happen with sampling, only with optimizing and ADVI, so it is likely not anything in Eigen or Stan Math. I haven't been able to get any sort of test failure by adding -fno-strict-aliasing to the compiler flags.
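For context, a bus error on SPARC usually means a load or store through an address that is not a multiple of the type's alignment. The following is only a sketch of that failure pattern and a portable way around it; it is not taken from Stan, and `load_double` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of Stan: reads a double from an arbitrary
// byte address without assuming the address is 8-byte aligned.
inline double load_double(const unsigned char* src) {
  assert(src != nullptr);
  double value;
  std::memcpy(&value, src, sizeof(value));  // byte-wise copy, no alignment requirement
  return value;
}

// By contrast, the cast below is undefined behavior when src is misaligned.
// x86 usually tolerates it; SPARC raises SIGBUS.
//   double bad = *reinterpret_cast<const double*>(src);
```

If something like the commented-out cast is reachable only from the optimizing or ADVI code paths, that would be consistent with sampling working and the other two failing, but that is a guess rather than a diagnosis.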

bob-carpenter commented 8 years ago

I'm afraid not.

That stackoverflow comment points to alignment, and we're allocating memory and aligning it ourselves for all the gradient calcs, but that's going to be the same operations in all of those systems.

Is there any way to run tests on this without it being on the critical path for a CRAN submission? The alignment code is all in stan/math/memory.

In particular, here:

      // FIXME: enforce alignment
      // big fun to inline, but only called twice
      inline char* eight_byte_aligned_malloc(size_t size) {
        char* ptr = static_cast<char*>(malloc(size));
        if (!ptr) return ptr;  // malloc failed to alloc
        if (!is_aligned(ptr, 8U)) {
          std::stringstream s;
          s << "invalid alignment to 8 bytes, ptr="
            << reinterpret_cast<uintptr_t>(ptr)
            << std::endl;
          throw std::runtime_error(s.str());
        }
        return ptr;
      }
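One way the FIXME could be addressed is to request the alignment from the allocator instead of checking it after the fact. This is a sketch under the assumption that a POSIX `posix_memalign` is available on the target; the function name `eight_byte_aligned_alloc_sketch` is made up for illustration and is not Stan's code.

```cpp
#include <cassert>
#include <stdint.h>  // uintptr_t
#include <stdlib.h>  // posix_memalign, free (POSIX, not ISO C++)

// Hypothetical alternative, not Stan's code: ask for 8-byte alignment up front.
inline char* eight_byte_aligned_alloc_sketch(size_t size) {
  void* ptr = nullptr;
  // posix_memalign returns 0 on success; the alignment must be a power of
  // two and a multiple of sizeof(void*), which 8 satisfies on 32- and 64-bit.
  if (posix_memalign(&ptr, 8U, size) != 0) return nullptr;
  // Sanity-check the allocator's promise in debug builds.
  assert(reinterpret_cast<uintptr_t>(ptr) % 8U == 0U);
  return static_cast<char*>(ptr);
}
```

Memory obtained this way is released with `free`, so callers of the existing function would not need to change how they deallocate.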

And here's the memory alignment test:

    /**
     * Return <code>true</code> if the specified pointer is aligned
     * on the number of bytes.
     *
     * This doesn't really make sense other than for powers of 2.
     *
     * @param ptr Pointer to test.
     * @param bytes_aligned Number of bytes of alignment required.
     * @return <code>true</code> if pointer is aligned.
     * @tparam T Type of object to which pointer points.
     */
    template <typename T>
    bool is_aligned(T* ptr, unsigned int bytes_aligned) {
      return (reinterpret_cast<uintptr_t>(ptr) % bytes_aligned) == 0U;
    }

And for that, see:

http://stackoverflow.com/questions/1898153/how-to-determine-if-memory-is-aligned-testing-for-alignment-not-aligning

I'm not even sure I did that right, though, because I'm passing in a char* rather than a void*. So maybe the fix is to remove the template parameter and take a void* instead. I also have no idea what "restrict" does in the upvoted answer to that Stack Overflow question.
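A minimal sketch of the non-template variant floated above, assuming the only change is the parameter type. The zero check is an addition for illustration and is not in the original.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the void* variant: any object pointer converts implicitly to
// const void*, so existing char* callers compile unchanged.
inline bool is_aligned(const void* ptr, unsigned int bytes_aligned) {
  assert(bytes_aligned != 0U);  // modulo by zero would be undefined
  return (reinterpret_cast<std::uintptr_t>(ptr) % bytes_aligned) == 0U;
}
```

The template and void* versions compute the same thing, since the pointee type never enters the arithmetic. As for `restrict`: it is a C99 qualifier that promises the compiler a pointer is not aliased. It is not part of standard C++ and has no effect on an alignment test.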

bgoodri commented 8 years ago

I think we would need to emulate a SPARC environment with QEMU or something. But is there any conceivable way that the memory alignment thing for autodiff could cause a bus error with LBFGS and ADVI but not MCMC? The fact that MCMC works on SPARC makes me think it is in the writer or something else that behaves differently depending on the algorithm.

bob-carpenter commented 8 years ago

In which case it'd probably be a library issue. These pointers are the only place we directly mess with memory ourselves.

Of course, Eigen and the standard template library have to, as well.

syclik commented 8 years ago

Could it have to do with an empty stream for printing? I know that's given us problems in the past.

bob-carpenter commented 8 years ago

I asked Daniel and he clarified he meant a null (0) pointer by "empty stream".
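To make the null-stream scenario concrete, here is a sketch of the kind of guard that avoids it. Both `write_if_present` and `demo_capture` are hypothetical names, not Stan's API; the assumption is that some output routine receives a `std::ostream*` that callers may leave as 0.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper, not Stan's API: writes a message only when a stream
// was actually supplied. Dereferencing a null std::ostream* is undefined
// behavior, and how it fails can differ between platforms.
inline bool write_if_present(std::ostream* out, const std::string& msg) {
  if (out == nullptr) return false;  // caller passed 0: skip output
  *out << msg << '\n';
  return true;
}

// Usage sketch: returns the text captured when a real stream is supplied.
inline std::string demo_capture() {
  std::ostringstream ss;
  const bool wrote = write_if_present(&ss, "hello");
  const bool skipped = !write_if_present(nullptr, "ignored");
  assert(wrote && skipped);
  return ss.str();
}
```

If the optimizing and ADVI paths write to a stream pointer that the sampling path never touches, an unguarded dereference there would fit the observed pattern, though nothing in this thread confirms that.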

bgoodri commented 8 years ago

That is a possibility. Not that I can rule out anything as a possibility. Hopefully CRAN will allow us to link to the SegFault library on SPARC so that more information is provided in the backtrace.
