stan-dev / rstan

RStan, the R interface to Stan
https://mc-stan.org

Travis builds have been failing for months… #397

Closed mapio closed 7 years ago

mapio commented 7 years ago

I really appreciate that you have continuous integration in place in your project, as stated in the wiki. I'm hitting segfaults that depend on the architecture I run the code on (as already reported in issues opened here months ago).

As a software engineer, may I modestly ask what the point is of using continuous integration with months of failing builds (https://travis-ci.org/stan-dev/rstan/builds)? This will hardly make your users confident in the stability of your software.

sakrejda commented 7 years ago

As an open-source project spread somewhat thin we would certainly appreciate your help in figuring out an approach to fixing the Travis builds.

jgabry commented 7 years ago

That's a good question. First, one clarification: very few of the builds have actually failed. Unfortunately they error out, but that's different from failing. A failure would indicate that our tests are failing, whereas an error indicates some problem getting Travis, R CMD check/build, and rstan to behave nicely with each other. That is, some problem occurred and the tests never even had a chance to pass or fail. And, as we've learned, there are tons of ways for errors to happen, and many require quite a bit of effort to understand and fix. It started to become a real burden to keep fighting with Travis (and it's supposed to reduce the burden!), and we didn't have the manpower, or enough Travis expertise to not need the manpower.

We would very much like to get it working again! Of course we do run tons of tests locally, but getting Travis to work consistently would be fantastic. Thankfully, at least the Stan language and algorithms have their own continuous integration separate from the rstan interface (currently using both Jenkins and Travis), and we have more people dealing with that than we do for rstan.
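To make the errored-versus-failed distinction above concrete, here is a minimal Python sketch. It is loosely modelled on how Travis reports build status (a non-zero exit in a setup phase gives "errored", a non-zero exit in the script phase gives "failed"). The function name `classify_build` and the dictionary format are made up for illustration and are not part of any Travis or rstan API; treating a timeout as "errored" is likewise an assumption of this sketch.

```python
# Phases that run before the tests; a problem here means the tests never ran.
SETUP_PHASES = ("before_install", "install", "before_script")


def classify_build(phase_exit_codes, timed_out=False):
    """Classify a build from a dict mapping phase name -> exit code.

    Hypothetical helper: phases missing from the dict are treated as exit code 0.
    """
    if timed_out:
        # Assumption: a build that is killed never reaches a test verdict.
        return "errored"
    for phase in SETUP_PHASES:
        if phase_exit_codes.get(phase, 0) != 0:
            return "errored"
    if phase_exit_codes.get("script", 0) != 0:
        # Only here did the tests actually run and report a problem.
        return "failed"
    return "passed"


if __name__ == "__main__":
    print(classify_build({"install": 1}))
    print(classify_build({"install": 0, "script": 1}))
    print(classify_build({"install": 0, "script": 0}))
```

In this model, a build whose install step breaks is reported as errored even though no test ever ran, which is the situation described for the rstan builds.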

mapio commented 7 years ago

Unfortunately I'm not an R (nor Stan) expert. I stumbled across it while helping a colleague who was getting very inconsistent results depending on the OS and the versions of the various libraries he uses. I hoped to be able to help him, but unfortunately it seems to be quite complex to get these tools to run together.

You blame Travis, its complexity, or the burden it imposes on the project for the failing tests. Having used Travis (like thousands of other developers) for many other projects, I doubt that the problem is where you think it is. If you have so many tests running locally without failures, it should not be that hard to make them work on Travis (or any other CI tool).

I really hope you'll be able to get this in shape, because it is very sad for such widespread software to have such a track record on a CI server.

At the very least, explain in the wiki why you think Travis is not adequate for your project, drop it, and give the developers clear instructions on how to run the tests locally.

jgabry commented 7 years ago

Of course you are 100% correct that rstan should do a better job at this. I have no doubt that you are right that it is possible, but we haven't been able to solve it consistently ourselves yet. There are various open issues (and a few closed) at the Travis repo about consistently getting successful builds for complicated R packages that interface with C++ and take a long time to compile, like this one, so we're not the only ones. For example, lme4, one of the more popular R packages, also has a problem with Travis. And it is worse than ours, because their builds are incorrectly labeled as passing! That is, their builds look like they are passing, but if you look at the actual logs the builds don't work at all and should have errored (https://travis-ci.org/lme4/lme4/builds). (To be clear, I'm not criticizing lme4, just trying to provide some context.)

Also, take a look at some of our other R packages like shinystan, bayesplot, and loo. Those packages all regularly pass on Travis and all have nearly 100% line coverage (not that line coverage is a measure of test quality). So we're totally on board with CI. And that's what we would love to have for rstan too, but we're not there yet. There are also some design flaws in rstan that may make this more complicated than it needs to be. We're planning on initiating an overhaul of the rstan package (https://github.com/stan-dev/stan/wiki/User-Interface-Guidelines-for-Developers) this spring, and the plan is to try to confront the CI issue again and get it right.

mapio commented 7 years ago

I'm happy to read that you will work on it! Thank you for your efforts and your time.

jgabry commented 7 years ago

Thanks for pushing us on this topic! (I sincerely mean that.)

bob-carpenter commented 7 years ago

We just hired someone to deal with dev ops issues like these. We're looking into what hardware/software solutions we need. The problem we've been having with Travis is that it times out and just returns failures. We'd appreciate any help from people who know what they're doing with Travis.

Part of the problem is C++: each compiler seems to implement a slightly different language (partly because of the undefined bits of the spec). We've had huge problems with Windows C++ toolchains failing in places where none of the other platforms failed (and not just from varying sizes of built-in types; it's the tools themselves segfaulting or blowing out memory). That's not an excuse, but it's not something we know how to deal with on the budget we have.

mapio commented 7 years ago

I really appreciate your efforts. I've never run such long builds on Travis, so I've never faced timeouts. But I know for sure that you can cache intermediate results and, if the compilation is not part of the test, you could also run the tests against a binary release of the software they depend on, for example by building and storing the binary artefacts outside Travis.

I'm also sure that, given the importance and popularity of Stan and rstan, the Travis people would support your project by allowing more resource usage (such as, but not limited to, longer timeouts).
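As a rough illustration of the caching idea, the Python sketch below stores a compiled artefact under a key derived from the source text and a compiler identifier, and reuses it when the same inputs come back. Everything here is hypothetical: `compile_with_cache`, the `.bin` suffix, and the `compile_fn` callback stand in for whatever the real build does, and nothing in it is specific to Travis or rstan.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(source_text, compiler_id):
    """Hash the inputs that determine the compiled output."""
    digest = hashlib.sha256()
    digest.update(compiler_id.encode("utf-8"))
    digest.update(b"\0")  # separator keeps the two fields from running together
    digest.update(source_text.encode("utf-8"))
    return digest.hexdigest()


def compile_with_cache(source_text, compiler_id, cache_dir, compile_fn):
    """Return (artifact_bytes, was_cached); compile_fn is a hypothetical callback."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    artifact = cache_dir / (cache_key(source_text, compiler_id) + ".bin")
    if artifact.exists():
        # Same source and compiler as before: skip the slow step.
        return artifact.read_bytes(), True
    data = compile_fn(source_text)
    artifact.write_bytes(data)
    return data, False


if __name__ == "__main__":
    def fake_compile(text):
        # Stand-in for a slow compilation step.
        return text.upper().encode("utf-8")

    with tempfile.TemporaryDirectory() as tmp:
        print(compile_with_cache("model {}", "compiler-a", tmp, fake_compile)[1])
        print(compile_with_cache("model {}", "compiler-a", tmp, fake_compile)[1])
```

On a CI service, the cache directory would be one that the service is configured to keep between builds, so unchanged inputs would skip the slow compilation.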

maverickg commented 7 years ago

The issue is not the timeout: at the very beginning I had a script to work around the problem that Travis would fail us if nothing was printed to the output for 10 minutes. There are several reasons that make it difficult to have a simple script for Travis.
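A workaround of the kind described above can be sketched in a few lines of Python: run the long build command in a child process and print a heartbeat line at a fixed interval, so the CI log never stays silent for 10 minutes. This is only an illustration under assumptions, not the script that was actually used for rstan; `run_with_heartbeat`, the message text, and the default interval are all invented here.

```python
import subprocess
import sys
import threading


def run_with_heartbeat(cmd, interval_s=60.0, out=sys.stdout):
    """Run cmd, writing a heartbeat line to `out` every interval_s seconds.

    Returns the exit code of cmd. Hypothetical helper for illustration.
    """
    done = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(interval_s):
            out.write("still running...\n")
            out.flush()

    ticker = threading.Thread(target=beat, daemon=True)
    ticker.start()
    try:
        return subprocess.call(cmd)
    finally:
        # Stop the heartbeat whether the command finished or failed to start.
        done.set()
        ticker.join()


if __name__ == "__main__":
    # Stand-in for a long, silent compilation step.
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    sys.exit(run_with_heartbeat(slow, interval_s=0.5))
```

It only hides silence, though: a build that hangs would keep printing heartbeats until the CI service's overall time limit, so it addresses the no-output rule and nothing else.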

mapio commented 7 years ago

Why don't you focus on a single OS? Today using Docker is so easy on OS X and Windows that if you completely and correctly support any one GNU/Linux distribution, users of other OSs can at least be confident that installing R in a Docker container will do the trick for them. Besides, if you want high-performance computation in the cloud, you'll surely need some form of containerization.

Having said this, imagine how scared a user feels when reading here that it is so difficult to get the dependencies right and to install the library. I've attempted to do that for a colleague of mine (I've been using scientific software for 25 years) and I had so much trouble in so many different circumstances.

I really like your work and I think it's a big improvement wrt previous Bayesian estimators running in R (not to name names); I also realise that supporting many OSs is a good thing to do. But not if your support is so fragile on all of them.

Why not focus just on GNU/Linux, providing a provably reproducible (and automated) installation procedure that makes a reasonable set of tests pass for every user (and thus for Travis, or whatever other CI you choose)?

Once you have a stable build, you can try to add experimental support for other OSs, or add more esoteric tests that rely on external, less stable components.

But at least you will be able to show the user base that you have a stable, solid build, and show your developers that when they commit new code they must respect the tests and not introduce regressions. If all of your builds are broken, the developers will have no clue and will probably add errors that pile up and make the library a mess.

This is just my two cents… don't get me wrong. I really appreciate your efforts and, as an open source developer, I know how hard it is to provide good software for others!

bob-carpenter commented 7 years ago

We appreciate feedback, so don't worry about offending us.

Stan is stable on all the platforms we support. RStan is managed independently and we can probably use a lot more tests there. We just landed a major refactor of the command/services infrastructure for Stan's algorithms, so all the code's about to change. Hopefully we can get more tests into place for RStan. Many of the low-level tests will go away because all the tricky algorithm code in RStan got refactored into the core stan-dev/stan repo.

Our tests through Travis are not stable due to issues with timeouts, etc. We just hired Sean Talts to focus on dev ops and hopefully he'll be able to fix a bunch of these issues. At least with stan-dev/math and stan-dev/stan (our core math library and our Stan language and algorithms library, respectively), we always require the unit tests and integration tests to pass before merging. Maybe Travis can kick us some more cycles; we can always ask. Their paid plans don't look much better, so we may just bail on Travis altogether. Not really my area of expertise.

Our installations are challenging for many users because of the need to install a C++ toolchain. This is especially challenging for Windows users who aren't used to installing development tools. I wish R had an easier way to do this. And I wish packages were more encapsulated in R. Part of our problem is the global nature of R packages, where they all want to get in and change the C++ build tools and all dump stuff into the global environment (not required, but nobody in R seems to use namespace qualifiers in client code).

Docker's been evolving. First time we checked, everyone said it was unstable on Windows. Now, we have many flavors of Docker containers ranging from multi-gigabyte full encapsulations of everything from C++ to RStudio to LaTeX for reproducible remote research to simple wrappers to let us deploy Jupyter on the web. If you're volunteering to build generic Docker containers and provide instructions for our users, we'd welcome the contribution.

I don't think our mainstream users want to run RStan in a standalone Docker container---they want it integrated into their standard R workflows in their single install. Is there an easy way to do that? We've been talking about having Stan run in Docker and then communicate with clients via protocol buffers. We have a GSoC project oriented toward that.

mapio commented 7 years ago

I wish I had the time and knew Stan/RStan well enough to help you (and my colleague) sort this out.

Of course I understand that full containerisation is not a viable solution for every user, but given how fast Docker is today and how easy it is to mount a local piece of the filesystem in a container, one could think of running just the compilation and execution of the model under Docker and leaving all the rest in the host environment (R, or RStudio).

Something in the spirit of https://github.com/bfirsh/whalebrew
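To sketch what that could look like from the host side, the Python snippet below only assembles a `docker run` command line that bind-mounts a local model directory into a container and runs a build step there. The `docker run` flags used (`--rm`, `-v`, `-w`) are standard; the image name `example/stan-toolchain`, the `/work` mount point, and the `make model` command are placeholders assumed for illustration, not real Stan or RStan artefacts.

```python
from pathlib import Path


def docker_compile_command(model_dir, image, inner_cmd, mount_point="/work"):
    """Build the argv that runs inner_cmd in a container with model_dir mounted.

    Hypothetical helper: it only constructs the command, it does not run Docker.
    """
    host_dir = Path(model_dir).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",            # remove the container when it exits
        "-v", f"{host_dir}:{mount_point}",  # share the host directory
        "-w", mount_point,                  # run the command inside the mount
        image,
        *inner_cmd,
    ]


if __name__ == "__main__":
    argv = docker_compile_command(".", "example/stan-toolchain", ["make", "model"])
    print(" ".join(argv))
    # To actually run it (requires Docker and a real image):
    #   import subprocess; subprocess.run(argv, check=True)
```

Because the directory is shared, whatever the container writes there (for example a compiled model) is visible to R or RStudio on the host once the command exits.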

It would be nice to have a chat with Sean Talts if he has time, or interest, in discussing this.

bob-carpenter commented 7 years ago

On Feb 28, 2017, at 3:53 PM, Massimo Santini wrote:

> Of course I understand that full containerisation is not a viable solution for every user, but as fast as Docker is today

The speed's not the issue. Once we got some build issues sorted out, there's no slowdown from Docker.

> and how easy it is to mount a local piece of the filesystem in a container, one could think of running just the compilation and execution of the model under Docker and leave all the rest in the host environment (R, or RStudio).

Yes, that's exactly what Allen (PyStan dev) has been urging us to do to avoid all this in-the-skins compiler dependency.