nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Azurebatch does not respect "disk" directive #5245

Closed qup9 closed 3 weeks ago

qup9 commented 3 weeks ago

Bug report

Expected behavior and actual behavior

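Expected: the "disk" directive controls how much working space an azurebatch process container gets.

Actual: azurebatch process containers are allocated 98GB of working space regardless of configuration.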

Steps to reproduce the problem

I am running Nextflow with Azure Batch. The VMs are created with an OS disk of 1000GB. The containers are created during the workflow with a working disk size of ~90GB. I need the container disks to be 200GB. I have tried:

    process {
        executor = 'azurebatch'
        queue = 'arm_pool_standard_d4_v3'
        container = params.container
        errorStrategy = 'ignore'
        containerOptions = '--storage-opt dm.basesize=200G'
    }

and

    process {
        executor = 'azurebatch'
        queue = 'arm_pool_standard_d4_v3'
        container = params.container
        errorStrategy = 'ignore'
        disk = '200.GB'
    }

and

    process {
        executor = 'azurebatch'
        queue = 'arm_pool_standard_d4_v3'
        container = params.container
        errorStrategy = 'ignore'
        disk = '200 GB'
    }

The result is always:

    Docker Container: az-pass
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1        98G  2.5G   91G   3% /mnt/batch/tasks

How do I tell Nextflow to create my container with 200GB?

Program output

    az-pass
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1        98G  2.5G   91G   3% /mnt/batch/tasks

Environment

Additional context

My process is running fasterq-dump on NCBI accessions. Many accessions are over 5GB. fasterq-dump requires ~18 x accession size to safely complete a task.
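
As a rough worked version of that estimate, taking the ~18x factor and a 5 GB accession at face value (the 11.1 GB figure is derived here and is not part of the original report):

```math
\begin{aligned}
\text{scratch needed for a 5 GB accession} &\approx 18 \times 5\ \text{GB} = 90\ \text{GB} \\
\text{largest accession that fits in 200 GB} &\approx \frac{200\ \text{GB}}{18} \approx 11.1\ \text{GB}
\end{aligned}
```

Under those assumptions a 5 GB accession already uses nearly all of the 91G shown as available in the df output, and anything larger does not fit.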

bentsherman commented 3 weeks ago

Duplicate of #4920

For now I believe you have to select a VM type that explicitly has the desired amount of storage
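
A minimal sketch of that workaround in Nextflow config, assuming Nextflow is allowed to create the pool and that the task directory (/mnt/batch/tasks) sits on the VM's temporary disk. The pool name and the Standard_D8_v3 size are illustrative assumptions, not taken from this thread. Check the Azure VM size tables for a size whose temporary storage covers what the tasks need.

```groovy
// Sketch only: pool name and VM size are hypothetical placeholders.
process {
    executor = 'azurebatch'
    queue    = 'pool_standard_d8_v3'   // must match the pool name defined below
}

azure {
    batch {
        allowPoolCreation = true       // let Nextflow create the pool if it is missing
        pools {
            pool_standard_d8_v3 {
                // Pick a VM size by its temporary storage, not only by CPU and memory.
                vmType = 'Standard_D8_v3'
            }
        }
    }
}
```

If the pool is created outside Nextflow instead, the same idea applies: point `queue` at a pool whose VM size has enough local storage.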

qup9 commented 3 weeks ago

I think this is a needed feature and should be supported. The directive is supported by most executors, including Google Batch. I am trying to control costs by using small, cheap processors. Disk space is what I need for the process tasks.

The VM disk is not the issue. I can control how much space a VM gets by creating the pool and specifying the OS disk. The issue is how much space a container is allocated on the VM. Right now, that is locked at 98GB. My VM has 1TB of storage, but each container is only allocated 98GB.

bentsherman commented 3 weeks ago

Let's keep this discussion going in the other issue. Even if it's slightly different, it's part of the larger problem we need to solve, which is getting enough disk storage to the task environment.