ryantibs / quantgen

Tools for generalized quantile modeling
https://ryantibs.github.io/quantgen

Investigate intermittent hanging of `quantile_lasso` on certain systems for both gurobi and glpk #13

Closed brookslogan closed 2 years ago

brookslogan commented 2 years ago

While exploring a quantgen-based forecaster that solves many problems like the one in #12, I have found that a couple of desktops/servers running Ubuntu will sometimes stop making progress on `quantile_lasso` problems and stay stuck indefinitely between subproblems (the `verbose=TRUE` output ends with "Problems solved (of 23): 5 ..." or "Problems solved (of 23): 5 ... 10 ..."). My laptop running Fedora has not stalled in the same fashion, as far as I can tell. I don't see excessive memory usage on the desktops when they hang this way, so I suspect the cause is something about the R version, R package versions, OS libraries, or other OS characteristics.

brookslogan commented 2 years ago

I encountered a similar type of issue with quantreg on the Fedora system, but not on the Ubuntu systems. It might only be similar rather than the same, though, as I had more luck re-running on the Fedora system than re-running with quantgen on the Ubuntu systems. Still, I'm guessing that the root of the issue is in the pipeline rather than the quantgen package.

Debugging the pipeline mentioned above may be useful for improving documentation or error detection in quantgen. However, I have not quite nailed down the cause. At least in the quantreg case (fresher in memory), failures would pop up in some sessions but not others. Once a fit stalled in a session, it would repeatedly stall in that session, but might go through in a fresh session.

For the quantreg case, I think all the stalling cropped up while running in parallel, but sequential runs in the same sessions still stalled. For the quantgen case, I thought that I had sequential runs in fresh sessions that also stalled. I need to double-check the latter.

The quantreg stalls may have been resolved by no longer assigning function environments from the parallel fitting procedures to variables in the global environment; more testing is still required.
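Since a stalled fit kept stalling within the same session but might go through in a fresh one, one way to narrow this down could be to launch each fit in its own fresh process under a wall-clock limit, so that a hang is recorded instead of blocking the whole run. The following is only a minimal sketch of that idea, not something taken from quantgen or from the pipeline discussed here: `run_with_timeout` is a hypothetical helper, and the `Rscript fit_one.R` command in the comment stands for a hypothetical script that performs a single fit.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a fresh process.

    Returns the exit code, or None if the process exceeded timeout_s seconds.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # so a hung fit does not linger in the background.
        return None
    return completed.returncode


# Hypothetical usage (fit_one.R is an assumed script that runs one fit):
#   status = run_with_timeout(["Rscript", "fit_one.R"], timeout_s=600)
#   if status is None:
#       print("fit stalled; record the session details for a reproducible example")
```

Logging which fits time out, together with the R and package versions on each machine, might help separate session-specific stalls from system-specific ones.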

brookslogan commented 2 years ago

Since this seems to deal with the particular pipeline and maybe dependency package versions, etc., I'm going to close this issue until I have a good reproducible example of something that can actually be improved in quantgen like documentation, dependency version-tagging, etc.