metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/

Can't execute bbr commands via the clusters #716

Open LeoLuongVuong opened 2 months ago

LeoLuongVuong commented 2 months ago

Hi, I'm trying to submit my NONMEM models with bbr on my cluster, but I can't. I keep getting the following error. Can someone help me out here, please? Many thanks.

```r
submit_model(
  mod1,
  .mode = "local",
  .bbi_args = list(parallel = TRUE, threads = 4, overwrite = TRUE) # not needed if set in bbi.yaml
)
```

```
Error in check_status_code(p$get_exit_status(), output, .cmd_args) :
  bbi nonmem run local /vsc-hard-mounts/leuven-data/357/vsc35700/Posa_Ped_IPDMA_MLV/nonmem_modelling/1.mod --parallel --threads=4 --overwrite returned status code 1 -- STDOUT and STDERR:
time="2024-08-30T16:03:03+02:00" level=info msg="Successfully loaded default configuration from /vsc-hard-mounts/leuven-data/357/vsc35700/Posa_Ped_IPDMA_MLV/nonmem_modelling/bbi.yaml"
time="2024-08-30T16:03:03+02:00" level=info msg="Beginning Local Path"
time="2024-08-30T16:03:03+02:00" level=info msg="A total of 1 models have completed the initial preparation phase"
time="2024-08-30T16:03:03+02:00" level=info msg="[1] Beginning local work phase"
time="2024-08-30T16:03:13+02:00" level=error msg="[1] Exit code was 115, details were exit status 115"
time="2024-08-30T16:03:13+02:00" level=error msg="[1] output details were: Starting NMTRAN\n \n WARNINGS AND ERRORS (IF ANY) FOR PROBLEM 1\n \n (WARNING 2) NM-TRAN INFERS THAT THE DATA ARE POPULATION.\n \n (WARNING 3)
```

kylebaron commented 2 months ago

Hi @LeoLuongVuong -

Is there any additional information in the .lst file? You could cut out the model code and let us see if NONMEM is sending anything else back.

Also - can you tell us if / how the model runs with local execution?

Kyle

LeoLuongVuong commented 2 months ago

Hi Kyle, thanks for your speedy response! Below is the .lst output:

```
NM-TRAN MESSAGES

WARNINGS AND ERRORS (IF ANY) FOR PROBLEM 1

(WARNING 2) NM-TRAN INFERS THAT THE DATA ARE POPULATION.

(WARNING 3) THERE MAY BE AN ERROR IN THE ABBREVIATED CODE. THE FOLLOWING ONE OR MORE RANDOM VARIABLES ARE DEFINED WITH "IF" STATEMENTS THAT DO NOT PROVIDE DEFINITIONS FOR BOTH THE "THEN" AND "ELSE" CASES. IF ALL CONDITIONS FAIL, THE VALUES OF THESE VARIABLES WILL BE ZERO.

W Y

(WARNING 79) SIGMA IS USED ON THE RIGHT. WITH A SUBSEQUENT RUN, IF AN INITIAL ESTIMATE OF A DIAGONAL BLOCK OF SIGMA IS TO BE COMPUTED BY NONMEM, THAT BLOCK WILL BE SET TO AN IDENTITY MATRIX DURING THAT COMPUTATION. THIS COULD LEAD TO AN ARITHMETIC EXCEPTION.

Stop Time: Fri Aug 30 16:03:13 CEST 2024
```

The model runs perfectly when I execute it with Pirana.

Thanks a lot.

kylebaron commented 2 months ago

Thanks; can you confirm

This looks like a $SIZES issue, but that should be largely taken care of by recent bbi versions, which set maxlim when running NONMEM. I just want to confirm that first: if you're using a recent bbi that does this, it should be ruled out.

LeoLuongVuong commented 2 months ago
  1. As I said, I did not run it locally but via my cluster. I don't think the cluster supported parallelization, since the run time was the same whether I ran it on a single core or on multiple cores/threads.
  2. I'm using bbr 1.11.0
  3. The dataset has 322 subjects with 14,321 observations. The model has 9 THETAs.

I hope this helps!

kylebaron commented 2 months ago

> As I said, I did not run it locally

I'm asking you to run it locally.

LeoLuongVuong commented 2 months ago

I don't have NONMEM on my PC. It would take some time before I could install it.

seth127 commented 2 months ago

Hello @LeoLuongVuong . I'm glad to see this discussion ongoing here. I'm jumping in to hopefully provide some clarification points:

  1. Thanks for giving us the bbr version, but @kylebaron was also asking for the bbi version. You can get this from the R console with bbr::bbi_version().
  2. I'm a little confused by the discussion of "cluster" vs. "local" execution. I see .mode = "local" in the original call. My guess is that you're running this on a remote server of some kind (i.e., not your laptop), but that is still "local" execution mode, in the sense that it is executing directly on that server. By contrast, .mode = "sge" (the default) would submit this to an SGE queue, typically to run on remote servers in a cluster/grid. All this to say: it looks like you currently are running "locally".
  3. Kyle asked you to try the same model without parallelization. That would mean passing .bbi_args = list(parallel = FALSE, overwrite = TRUE) (which will only use a single thread/CPU). Let us know if that runs successfully.
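For reference, the checks above can be combined into one short R sketch. It assumes bbr is installed and bbi is already configured on the machine; the model path `nonmem_modelling/1` is a hypothetical example (bbr's `read_model()` takes the model path without the `.mod` extension).

```r
library(bbr)

# Report the installed bbi version (Kyle asked for bbi, not only bbr)
print(bbi_version())

# Hypothetical model path, given without the .mod extension
mod1 <- read_model("nonmem_modelling/1")

# Re-run with parallelization switched off (single thread/CPU)
# to rule out a problem with the parallel setup
submit_model(
  mod1,
  .mode = "local",
  .bbi_args = list(parallel = FALSE, overwrite = TRUE)
)
```

If the single-threaded run succeeds, the problem is more likely in the parallel configuration than in the model itself.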

As Kyle noted, it seems like you may have a NONMEM issue (potentially related to SIZES or maxlim). That said, if you get that sorted and you're interested in some more background reading on parallelizing in bbr, these two articles might be useful:

Best of luck, and thanks for jumping in so quickly Kyle!

LeoLuongVuong commented 2 months ago

Thanks a lot, @seth127, for your detailed explanation! It's much clearer to me now. Also, sorry for overlooking your questions, @kylebaron. So, to provide more info:

  1. my bbi version is 3.3.0
  2. you are absolutely right! I am indeed running "locally"
  3. It seems to run successfully, since I see many more output files being generated, so that's great. Do you know why it didn't work with more threads?

Regarding the NONMEM issue, do you have any advice on how to solve that?

Many thanks again for your timely support!

LeoLuongVuong commented 1 week ago

Hi Kyle and Seth

I would like to reopen this issue, since I ran into the same problem, but this time when running NONMEM locally on my PC.

Specifically, I seem to have a size issue ("LIM VALUES MAXLIM ASSESSED BY NMTRAN: 1,2,3,4,5,6,7,8,10,11,13,15,16"), although I am using the most recent bbi version, 3.3.0.

My syntax is:

```r
submit_model(
  mod1,
  .mode = "local",
  .bbi_args = list(parallel = FALSE, overwrite = TRUE), # not needed if set in bbi.yaml
  .wait = FALSE
)
```

Can you give me some suggestion on how to solve this?

Thanks!