r-lib / devtools

Tools to make an R developer's life easier
https://devtools.r-lib.org

devtools::check() in RStudio uses a bunch of swap #2433

Closed rmflight closed 2 years ago

rmflight commented 2 years ago

I have a package I've been running build and check on a bunch, and over time I noticed that I was getting a lot of SWAP memory used, but my main RAM was still well under what is available in my machine.

So I did a simple test, running devtools::check() in a plain Linux terminal, and then in RStudio.

The package is available here: https://github.com/MoseleyBioinformaticsLab/ScanCentricPeakCharacterization

Here is the video where I run check in a terminal: https://youtu.be/5GvG_FDnrXs

You can see that I start at 0 swap, and then when the package gets re-installed or something I have 1.5 MB used.

And then here is the video running under RStudio: https://youtu.be/lM9GoQMYoaE

I start with the 1.5 MB left over from the previous run, and then quickly climb through 300 MB of swap used. Just the one time wouldn't be an issue, but if you are modifying things and then re-running the build / check, it keeps climbing over time. This seems to happen regardless of whether the RStudio terminal or the build pane is used.

I've also checked that this happens after a completely fresh reboot of the machine: I load up RStudio and run devtools::check() with nothing else running, and without having run check() on the package at all yet.

I'm not 100% sure if this is an RStudio issue or a devtools issue, but it seems to be the behavior of devtools::check within RStudio.

Session information:

rmflight commented 2 years ago

OK, I just struck the bit about fresh reboot and running in RStudio. That is NOT true. Fresh reboot and running RStudio actually had the same 1 MB of SWAP used.

I'm going to run another test where I run multiple check() right after each other, because I suspect that is the actual problem, terminal or RStudio.

rmflight commented 2 years ago

Here is an updated video where I did a reboot, and then did all the devtools::check() in RStudio itself.

https://youtu.be/Hz7KtmllKRk

For some reason, it never seems to hit the swap usage of that first one.

I've also done the same, where I reboot, and then do multiple devtools::check() in the terminal.

https://www.youtube.com/watch?v=qlPhZO5N0qA

Note that for both of these I edited the video from 20 min down to 6, just to show the increase in swap across repeated runs of devtools::check().

If there was a way to record memory usage in some other way while running this, that would be ideal ....
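One possible way to record this without video is to sample swap usage from /proc/meminfo at a fixed interval and write it to a file. The following is a minimal sketch, assuming a Linux system; the function names `swap_used_kb` and `log_swap` are made up for illustration and are not part of devtools or RStudio.

```shell
# Print swap currently in use, in kB, computed as SwapTotal - SwapFree.
# The file argument defaults to /proc/meminfo (Linux only); it is a parameter
# only so the function can be tried against a saved copy of that file.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2} /^SwapFree:/ {free = $2} END {print total - free}' \
        "${1:-/proc/meminfo}"
}

# Emit "epoch_seconds,swap_used_kb" lines.
# Arguments: interval in seconds (default 5), number of samples (default 10),
# meminfo file (default /proc/meminfo).
log_swap() {
    _interval="${1:-5}"
    _count="${2:-10}"
    _meminfo="${3:-/proc/meminfo}"
    _i=0
    while [ "$_i" -lt "$_count" ]; do
        printf '%s,%s\n' "$(date +%s)" "$(swap_used_kb "$_meminfo")"
        sleep "$_interval"
        _i=$((_i + 1))
    done
}
```

Running something like `log_swap 5 240 > swap.csv &` in one terminal before starting devtools::check() in another would give a timestamped series that can be plotted afterwards.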

gaborcsardi commented 2 years ago

Swapping is not something that is up to applications. R and R packages certainly have no say in whether and when they are swapped out to disk. I would also be very surprised if RStudio did anything about swapping. I am not even sure that it can do anything about it, probably not. In general, swapping is up to the kernel and not the applications.

You can, however, change the swappiness of the Linux kernel. (google to see how)
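For reference, a minimal sketch of inspecting and changing swappiness on Linux. The helper name `current_swappiness` and the file name `99-swappiness.conf` are only examples; the `sysctl` key `vm.swappiness` and the `/proc/sys/vm/swappiness` path are the standard kernel interface.

```shell
# Print the current swappiness value.
# The file argument defaults to /proc/sys/vm/swappiness (Linux only); it is a
# parameter only so the function can be tried against another file.
current_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# To change the value until the next reboot (requires root):
#   sudo sysctl vm.swappiness=60
#
# To keep the value across reboots, a line like the following can go in a
# file under /etc/sysctl.d/ (the name 99-swappiness.conf is only an example):
#   vm.swappiness = 60
```

The change made with `sysctl` takes effect immediately, so the value can be adjusted between two runs of check() without rebooting.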

You can also use ps to list processes with their total memory size and their rss (the part that is currently in memory). Something like this should work:

ps -o pid,size,rss,command ax
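The ps output above shows size and rss but not how much of each process sits in swap. On Linux that figure is exposed as the VmSwap line in /proc/<pid>/status, so a rough per-process listing can be sketched as follows. The function names `status_swap` and `top_swap` are hypothetical, and only the first word of a process name is printed.

```shell
# Print "swap_kB pid name" for one /proc/<pid>/status file (Linux only).
# Kernel threads have no VmSwap line, so nothing is printed for them.
status_swap() {
    awk '/^Name:/ {name = $2} /^Pid:/ {pid = $2} /^VmSwap:/ {print $2, pid, name}' "$1"
}

# List the processes holding the most swap, largest first.
# Argument: number of rows to show (default 10).
# Errors are suppressed because processes can exit while the loop runs.
top_swap() {
    for _f in /proc/[0-9]*/status; do
        status_swap "$_f" 2>/dev/null
    done | sort -rn | head -n "${1:-10}"
}
```

Calling `top_swap 5` after a few runs of check() would show whether the swapped-out pages belong to R, RStudio, or something else entirely.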

rmflight commented 2 years ago

Hmmm. OK. I guess I'll have to look into this more then. Thanks.

It seriously just seems weird that I'm seeing this behavior primarily during devtools::check() in RStudio. Not any other time I'm running R code, unless I've run out of RAM.

gaborcsardi commented 2 years ago

It seems that your OS uses a non-default swappiness value (see https://github.com/pop-os/default-settings/pull/58). This value is not quite optimal for something like R CMD check, which does a lot of I/O. Changing it back to the Linux default of 60 might improve the situation.

rmflight commented 2 years ago

I will definitely try modifying that value and see what happens.

Thanks @gaborcsardi !

rmflight commented 2 years ago

I think I might know what it is.

First run, the kernel goes, oh, look at that process, it's getting stuff from the disk and shoving a bunch of stuff into memory, great. Second, and subsequent runs, I think the kernel is going, man that process is doing a bunch of I/O again! It can't really need that memory, I'm shoving it into swap so it doesn't hog actual RAM that other processes need.

So I'd probably have to set the swappiness to 1 or something to avoid this.

I think it's also related to me being an idiot and having like 200 MB of RDS data (between extdata and tests) lying around for testing purposes, which of course all gets copied on each run of check(). So it's partly my own fault. I need to find ways to reduce the size of the artifacts lying around in my package, decide whether they really need to be there, or make them external so they only get tested on my local machine, where the path is known.

gaborcsardi commented 2 years ago

Well, the important part of swappiness is that the memory that is not used for programs is used to cache files, making I/O faster. If you set a smaller swappiness value, that optimizes for interactive performance and keeps programs in memory, but it also makes I/O slower, because the kernel will throw away cached files instead of swapping out pages that are not needed (right now) for programs.

So if you want better I/O, set it to a bigger value. I would set it to the default 60.

Your R CMD check does a lot of I/O, first copying those big files, and then working with them. R CMD check does a lot of I/O in general, opening lots of files, a lot of times the same files over and over again, so if your system optimizes against I/O, that's not great for it.

rmflight commented 2 years ago

That all makes sense. However, on my particular machine, even after changing swappiness to 60, I observed the exact same behavior: the first R CMD check used almost nothing in swap, and it was the subsequent R CMD check runs that put stuff into swap.