nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
https://nf-co.re/modules
MIT License

[FEATURE] Disabling JVM Hotspot in modules for JAVA tools #3455

Open lfearnley opened 1 year ago

lfearnley commented 1 year ago

Is your feature request related to a problem? Please describe

I've encountered a problem with the JVM HotSpot performance data (hsperfdata) files when multiple GATK processes are run on the same node in Singularity containers (details in nf-sarek issue #1030). There's also a recent Sarek issue with SIGBUS errors related to HotSpot (nf-sarek issue #1024).

Describe the solution you'd like

I'd like to propose turning off HotSpot's performance data collection (hsperfdata) using -XX:-UsePerfData in the --java-options passed to GATK.

This has two effects - it should eliminate a class of bugs related to the JVM and hsperfdata, as well as stabilising nf-core Singularity modules in rare and hard-to-debug situations.
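As a rough illustration of the proposal above, the sketch below appends the flag to an existing JVM option string and prints the resulting GATK command. The helper name `with_perfdata_disabled`, the `-Xmx4g` heap setting, and the file names are hypothetical placeholders, not taken from any nf-core module.

```shell
# Append -XX:-UsePerfData to a JVM option string unless it is already there.
# (Hypothetical helper, for illustration only.)
with_perfdata_disabled() {
    opts="$1"
    case " $opts " in
        *" -XX:-UsePerfData "*) printf '%s\n' "$opts" ;;
        *) printf '%s\n' "${opts:+$opts }-XX:-UsePerfData" ;;
    esac
}

java_opts="$(with_perfdata_disabled "-Xmx4g")"

# Print (rather than run) an example command; file names are placeholders.
echo "gatk --java-options \"$java_opts\" HaplotypeCaller -R ref.fasta -I sample.bam -O sample.vcf.gz"
```

The helper leaves the string unchanged when the flag is already present, so it should be safe to apply on top of options a user has already customised.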

Describe alternatives you've considered

HotSpot is hard-coded to write its hsperfdata files to /tmp. It ignores the --tmp-dir flag passed to GATK.

As far as I can tell, turning this off has no negative side effects beyond preventing the use of jstat and certain Java debuggers, which don't seem to be used in nf-core. This detailed blog post from Evan Jones describes an improvement to Java GC efficiency from turning this system off.

Alternatives would include preventing singularity from mounting host /tmp into the container (I'm not certain how this might be achieved within nf-core), or using -XX:+PerfDisableSharedMem.
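To check whether a node is accumulating these files, something like the following sketch may help. It assumes the usual HotSpot layout on Linux, where each user gets a /tmp/hsperfdata_<user> directory containing one file per JVM process ID. The function name `list_hsperfdata` is made up for this example.

```shell
# List hsperfdata directories under a base directory (default /tmp)
# together with the number of per-process files each one holds.
list_hsperfdata() {
    base="${1:-/tmp}"
    for d in "$base"/hsperfdata_*; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$d" ] || continue
        count=$(ls -A "$d" 2>/dev/null | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$d" "$count"
    done
}

list_hsperfdata /tmp
```

When /tmp is bind-mounted from the host into several containers, these directories are shared between them, which is consistent with the collisions described above.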

Additional context

I'm currently trialling nf-sarek with the -XX:-UsePerfData java option on ~100 human WGS and will update on stability.

lfearnley commented 1 year ago

Disabling hsperfdata with -XX:-UsePerfData patches these errors out for GATK, but the same behaviour can also be triggered by other Java applications (such as picard commands run in nf-raredisease). -XX:-UsePerfData has been stable in my experience across ~200 runs of Sarek.

maxulysse commented 1 year ago

ok, so picard should be patched as well, I'll do that in a separate PR then...

lfearnley commented 1 year ago

It may also be an issue for fastqc. It's happening to others, so the patches are incredibly useful (https://github.com/nf-core/sarek/issues/1030), but I'm wondering if this is worth raising with the Nextflow devs, as it seems to be a common issue.

maxulysse commented 1 year ago

Changed the name of the issue and kept it open so that we can track other Java tools. All gatk4 modules have been patched (cf. #3844), and we have a PR in sarek: https://github.com/nf-core/sarek/pull/1240

lfearnley commented 1 year ago

Great, thanks!

I'm trying out setting the _JAVA_OPTIONS environment variable for fastqc, which seems promising so far.

clbenoit commented 5 months ago

I attempted to set JAVA_TOOLS_OPTIONS and JAVA_OPTS in fgbio processes, but it did not resolve the issue. Fortunately, fgbio accepts -XX:-UsePerfData passed directly.

lfearnley commented 5 months ago

For completeness, you may need to set '_JAVA_OPTIONS' as well as 'JAVA_TOOL_OPTIONS' and 'JAVA_OPTS'; https://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts has some more details on this.
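A minimal sketch of setting all three variables in a process script is shown below. Note that the JVM itself reads JAVA_TOOL_OPTIONS and _JAVA_OPTIONS, whereas JAVA_OPTS is only a convention that some wrapper scripts honour, so whether it has any effect depends on the tool. Appending to any existing value, rather than overwriting it, is a choice made for this example.

```shell
flag="-XX:-UsePerfData"

# Read directly by the JVM at startup.
export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$flag"
export _JAVA_OPTIONS="${_JAVA_OPTIONS:+$_JAVA_OPTIONS }$flag"

# Only used if the tool's launcher script chooses to read it.
export JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }$flag"
```

When either of the first two variables is set, the JVM prints a "Picked up ..." line to stderr, which is a quick way to confirm that the setting reached the process.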