nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.71k stars 621 forks source link

hyperqueue (>=0.17.0) and Nextflow interpret memory resource differently #5093

Closed bguo068 closed 3 months ago

bguo068 commented 3 months ago

Bug report

Expected behavior and actual behavior

Expected behavior: hyperqueue and nextflow should have same interpretation of memory resources.

Actual behavior: When set memory 2.GB in Nextflow, the .command.run a work directory uses this:

#HQ --resource mem=2147483648

This is problematic as newer versions of hyperqueue interpret 2147483648 as 2147483648 Mb (2 PB) instead of 2147483648 Bytes(2GB). This results in the hyperqueue job waiting for the a big memory resource that would never be possible for most HPCs.

Steps to reproduce the problem

main.nf:

process A {
        executor 'hq'
        cpus 1
        memory 2.GB
        input: val(x)
        output: path("*.txt")
        script:
        """
        echo ${x} > ${x}.txt
        """
}

workflow{
        channel.fromList([1,2]) | A
}

Run the pipeline:

nextflow main.nf

Program output

Nextflow output:

 N E X T F L O W   ~  version 24.04.2

Launching `main.nf` [scruffy_cantor] DSL2 - revision: 091b7ae540

WARN: The support for HyperQueue is an experimental feature and it may change in a future release
executor >  hq (2)
[8e/ff2537] A (2) | 0 of 2                                  <--------------- Never finish

And it will stuck forever.

Example .command.run file:

#!/bin/bash
#HQ --name nf-A_(2)
# ....
#HQ --resource mem=2147483648               <------------------
#HQ --cpus 1
....

Environment

Additional context

break change in hyperqueue about mem resource

bguo068 commented 3 months ago

Workaround for now:


process A {
        executor 'hq'
        cpus 1
        memory 2048   //  <---------- Change 2.GB to 2048, to allow hq interpret it as 2048MB
        input: val(x) //    Caveats: if changing executor to sge/slurm, 
        output: path("*.txt") // nextflow will interpret it as 2048bytes
        script:
        """
        echo ${x} > ${x}.txt
        """
}

workflow{
        channel.fromList([1,2]) | A
}
pditommaso commented 3 months ago

Think it's related to this PR https://github.com/nextflow-io/nextflow/pull/4884

bentsherman commented 3 months ago

I suggest we merge that PR and update the docs to say that Hyperqueue >=0.17.0 is required in the next edge release

bguo068 commented 3 months ago

I suggest we merge that PR and update the docs to say that Hyperqueue >=0.17.0 is required in the next edge release

That would be great!

pditommaso commented 3 months ago

Solved via b45f7c48