metrumresearchgroup / bbi

Next generation modeling platform
11 stars 2 forks source link

automatically remove parallelization files #285

Closed callistosp closed 9 months ago

callistosp commented 1 year ago

When running a NONMEM model in parallel, many folders with the prefix WK_ are produced. It would be nice if bbi automatically removed these files. There are also files produced after every run with the extension .o# and .po# where the # is the number assigned by the grid. I think these could also be automatically handled by bbi to prevent buildup of junk files in version control when saving a whole model directory.

shairozan commented 1 year ago

Would also probably make sense to update the .gitignore file so that even if the files get left around they wouldn't be committed in a git environment

jacobleander commented 9 months ago

Picking up this thread again. I would also prefer having these files deleted as part of the clean up of the model directory. IF there are users who would really like these files to not be deleted, could there be a global option in bbi where this can be turned on or off (default setting should then be that the files are removed). Thoughts @seth127 ?

seth127 commented 9 months ago

Thanks for getting this going again @jacobleander. I think this is the right idea:

could there be a global option in bbi where this can be turned on or off (default setting should then be that the files are removed)

We currently have a --clean_lvl flag that controls our deletion of other NONMEM temp files. I would think we could use that to control this behavior as well. Any thoughts on that @kyleam ?

seth127 commented 9 months ago

Also, @kyleam if we decide to do this through --clean_lvl, we should consider revisiting this old issue while we're in there: https://github.com/metrumresearchgroup/bbi/issues/194

@jacobleander if you or anyone else have any thoughts on that ^, feel free to comment on that issue as well. The original intention of the --clean_lvl and --copy_lvl flags was related to legacy PsN behavior that we were trying to either mimic or intentionally change. However, we've gotten a bit away from that in past few years (primarily because most bbi users are actually using it via bbr, instead of using the CLI directly). If we do revisit this, it would be useful to have any user commentary on when these flags might be useful and what the most useful behavior would be.

kyleam commented 9 months ago

I've put up a draft at gh-306. There's an issue with the CI that needs to be sorted out before considering merging that in.

@seth127:

We currently have a --clean_lvl flag that controls our deletion of other NONMEM temp files. I would think we could use that to control this behavior as well. Any thoughts on that @kyleam ?

Conceptually I think that makes sense, but there's already a spot in filesToCleanup that takes care of cleaning up some parallel file related things. It doesn't consider the clean level, so in gh-306 I decided to just follow that. I suppose we could update that whole block to be skipped when clean_level=0, but I think in practice unconditionally cleaning empty WK_ files is probably fine, at least until we take on something like the more comprehensive rework proposed in gh-194. Let me know if you think the clean_level guard is worth doing at this point.


@callistosp:

There are also files produced after every run with the extension .o# and .po# where the # is the number assigned by the grid

With gh-306, I focused just on the WK_* files. What to do in terms of cleaning the .o# (should always have output, I think) and .po# (in bbi context, perhaps just has output if parallel environment setup fails, not sure) files seems less clear-cut to me. Either way, I think it makes sense to do separately from gh-306.

kyleam commented 9 months ago

With the merge of gh-306, WK_* files are automatically cleaned up. I've left this issue open due to this part, which has not been addressed:

There are also files produced after every run with the extension .o# and .po# where the # is the number assigned by the grid

As I said above, I'm not sure about what, if anything, to do with those.

I'll wait a bit for further feedback and thoughts on that, but my preference would be to close this issue and either 1) leave the .o# and .po# files be or 2) open a dedicated issue for how to handle them.

seth127 commented 9 months ago

Thanks @kyleam

my preference would be to close this issue and either 1) leave the .o# and .po# files be or 2) open a dedicated issue for how to handle them.

I agree with closing this issue now, and I would lean towards just opening a fresh one to consider what to do with the .o# and .po# files. Those contain output from the process running on the grid, and deserve some more investigation and probably discussion before making any moves on cleaning them up.

kyleam commented 9 months ago

Thanks @seth127. Follow-up issue posted at gh-312.