kapelner / bartMachine

An R-Java Bayesian Additive Regression Trees implementation
MIT License

bartMachine using more than 100% CPU even with set_bart_machine_num_cores(1) #56

Closed: rdiaz02 closed this issue 9 months ago.

rdiaz02 (Ramon Diaz-Uriarte) commented 9 months ago (Feb 21, 2024)

This isn't really a bug, but it may be worth pointing out in the documentation or the README.

I am running multiple bartMachines in parallel (with the SuperLearner package: https://cran.r-project.org/web/packages/SuperLearner/index.html) in a snow cluster. To avoid running more threads than I have cores, I was setting set_bart_machine_num_cores(1), and yet top was showing per-process CPU usage well above 100% (at times reaching 200% or 300%), which overloaded the CPUs.

It turns out that running Sys.setenv(JAVA_TOOL_OPTIONS = "-XX:ParallelGCThreads=1") in both the master and the slaves (i.e., also clusterEvalQ(your_cluster_name, {Sys.setenv(JAVA_TOOL_OPTIONS = "-XX:ParallelGCThreads=1")})) seems to fix the issue. (I'm Java-ignorant here, so this was found by trial and error and googling around.)
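A minimal sketch of this workaround, written with the base parallel package and a local PSOCK cluster rather than the snow cluster driven by SuperLearner (clusterEvalQ behaves the same way on both). It assumes, based on how the JVM handles JAVA_TOOL_OPTIONS, that the variable only takes effect if it is set before a JVM starts in that R process, i.e., before bartMachine (and therefore rJava) is loaded; setting it in a session whose JVM is already running has no effect on that JVM. The name jvm_opts is hypothetical. A likely explanation, not confirmed in this thread, is that -XX:ParallelGCThreads caps the JVM's parallel garbage-collection threads, which set_bart_machine_num_cores(1) does not govern.

```r
library(parallel)

# Hypothetical variable name; the flag itself is the one reported in this issue.
# The JVM reads JAVA_TOOL_OPTIONS only at startup, so this must run before
# bartMachine (and hence rJava) starts a JVM in this R process.
jvm_opts <- "-XX:ParallelGCThreads=1"
Sys.setenv(JAVA_TOOL_OPTIONS = jvm_opts)

# Local PSOCK cluster for illustration (assumption: the issue used a snow cluster).
cl <- makeCluster(2)

# Each worker is a separate R process that would start its own JVM,
# so set the variable there as well, again before bartMachine is loaded.
clusterExport(cl, "jvm_opts")
invisible(clusterEvalQ(cl, Sys.setenv(JAVA_TOOL_OPTIONS = jvm_opts)))

# With bartMachine installed, it would be loaded on the workers only now, e.g.:
# clusterEvalQ(cl, { library(bartMachine); set_bart_machine_num_cores(1) })

# Check that every worker sees the setting.
worker_opts <- unlist(clusterEvalQ(cl, Sys.getenv("JAVA_TOOL_OPTIONS")))
print(worker_opts)

stopCluster(cl)
```

Loading bartMachine on the workers (the commented-out line) belongs after the clusterEvalQ call that sets the variable, for the same reason as in the master.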

Leaving it here for anyone else who might stumble upon it. I can create a PR to the README if deemed appropriate.

kapelner commented 9 months ago

Nice find! I would love a PR to the README.

rdiaz02 commented 9 months ago (Feb 23, 2024)

Done: https://github.com/kapelner/bartMachine/pull/57

(I wasn't sure where to put the paragraph; of course, feel free to move it wherever it fits best.)

kapelner commented 9 months ago

Thanks again.

rdiaz02 commented 9 months ago

Thank you! Closing the issue.