Allow overwriting of MCMC chains

atredennick commented 3 days ago

When running Stan models using bbr.bayes the chains from a model run are saved with filenames that include a timestamp. If the user then re-runs the model with updated data (for example) then new chain files are created and then old ones get removed. Fine, except for SVN. It just creates a bottleneck where all missing files now have to be deleted and the new one's added. It actually makes version control more difficult, too. Instead of just going back in time on a single file, you'd have to retrieve a deleted file (very tricky).

So, my proposal it to use simple naming for the chain .csv files that does not include a timestamp. This will make life easier and flows better within our typical workflow at Metrum. It also mimics what bbr does for NONMEM models (at least the non-Bayes ones). If the user specifies .overwrite = TRUE, then truly overwrite it.

kyleam commented 3 days ago

@atredennick Thanks for the suggestion.

My initial thought is that I don't see any issues with that change (essentially moving away from cmdstanr defaults).

Some questions, likely rooted in my lack of knowledge of SVN (or perhaps SVN + RStudio):

Fine, except for SVN. It just creates a bottleneck where all missing files now have to be deleted and the new one's added.

Is the bottleneck here an interface one? Mostly that you need select twice as many files (deletions + new files) to commit with SVN? Or is there some performance aspect at either the SVN or RStudio level?

It actually makes version control more difficult, too. Instead of just going back in time on a single file, you'd have to retrieve a deleted file (very tricky).

I'm surprised to hear that's very tricky in SVN. Is it because SVN requires you to know the full names?

(In the Git world, you could just specify the parent directory to, say, restore all files from under there. Or, if you really cared about just the csvs, you could specify a glob. I can't think of a case where you'd need to know the timestamps on the files.)

atredennick commented 1 day ago

Not a performance issue. Just a hassle when re-running a ton of models.
I've only heard it's tricky with SVN to retrieve a deleted file. No matter what, easier to revert to specific revision number and just have filenames the same.

kyleam commented 1 day ago

Not a performance issue. Just a hassle when re-running a ton of models.

Okay, well it's good it's not performance. I'm still unclear on how it's a hassle. Is it connected to manually adding things in the RStudio interface? From the command line, it's hard to imagine there's not an easy way to add/commit everything without specifying each file (but again, I don't know SVN).

None of that is to say "just use the command line". As I said, I don't have any initial objections to this change. But if the motivation from moving away from the current/default naming is that it makes things a hassle with SVN, then I'd like to have some understanding of the details.

kylebaron commented 1 day ago

It just creates a bottleneck where all missing files now have to be deleted and the new one's added.

Hopefully I'm understanding @atredennick report here:

I think this would be a hassle too. Filenames with timestamps guarantee you are dealing with a new set of files whenever you re-run (and check in). When the files get added to the repository, they don't just go away when they get deleted. They will come back (and keep coming back) whenever you run svn up. So you'd have to run svn rm <chain file name> (I'm sure you could glob that) and then svn add <new chain file> ... every time you re-run (and check in).

The pattern is different in NONMEM: files don't have timestamps and get overwritten when you re-run a model. If you want to go to an earlier re-run, then you can get it from a previous revision. I'd also note that psn does create a new set of files every time you re-run the (NONMEM) model and bbr was designed to avoid this pattern b/c it resulted in a mess of files and bloated repositories.

kyleam commented 1 day ago

When the files get added to the repository, they don't just go away when they get deleted. They will come back (and keep coming back) whenever you run svn up. So you'd have to run svn rm <chain file name>

Huh, wow. (I understand even less about SVN than I thought I did :/). Thanks, those details are very helpful.

atredennick commented 1 day ago

@kylebaron described the issue well. Currently, I have run a little piping command in terminal where I find all missing files and remove them from svn. Not a huge pain, but easy to mess up, too. Then I have to do the same thing to --force add all new files. Again, doable. But it would just be way easier and streamlined to not have to (like the bbr NONMEM workflow).

Basically, I want to not have this be my most-visited webpage: https://stackoverflow.com/questions/9600382/svn-command-to-delete-all-locally-missing-files :)

kyleam commented 1 day ago

I find all missing files and remove them from svn. Not a huge pain, but easy to mess up, too. Then I have to do the same thing to --force add all new files. Again, doable.

Oy, thanks for expanding on what the hassle is.

metrumresearchgroup / bbr.bayes

Allow overwriting of MCMC chains #154