nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
https://nf-co.re/modules
MIT License

[FEATURE] Use Groovy to unpack compressed tar.gz files #4044

Open adamrtalbot opened 1 year ago

adamrtalbot commented 1 year ago

Is your feature request related to a problem? Please describe

We sometimes need to run UNTAR, GZIP, GUNZIP, etc. just to unpack a file before running a tool. Let's use native Groovy with nf-test to do this instead.

Describe the solution you'd like

Grooooooovy wizardry.

Describe alternatives you've considered

Importing the module everywhere.

Additional context

No response

edmundmiller commented 1 year ago

https://github.com/askimed/nf-test/issues/72

robsyme commented 1 year ago

This is an example of streaming a remote tgz file into a TarInputStream (from the external dependency org.apache.ant). This can probably be cleaned up quite a bit, but at least using streams lets us avoid writing the archive itself, or an intermediate uncompressed tar, to disk:

@Grab(group='org.apache.ant', module='ant', version='1.10.13')
import java.nio.file.Files
import java.nio.file.Path  // used below; not among plain Groovy's default imports
import java.nio.file.Paths
import java.util.zip.GZIPInputStream
import org.apache.tools.tar.TarInputStream

params.input = "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/sra/SRR13255544.tar.gz"

url = new URL(params.input)
input = url.openConnection().getInputStream()
tarStream = new TarInputStream(new BufferedInputStream(new GZIPInputStream(input)))

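// Closing the outermost stream also closes the wrapped gzip and network streams
tarStream.withCloseable {
    while ((entry = tarStream.getNextEntry()) != null) {
        Path extractTo = Paths.get(".").resolve(entry.getName());
        if (entry.isDirectory()) {
            Files.createDirectories(extractTo);
        } else {
            // An archive may list a file before (or without) its parent directory
            Files.createDirectories(extractTo.getParent());
            Files.copy(tarStream, extractTo);
        }
    }
}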
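
For the plain GZIP/GUNZIP case mentioned in the issue, no external dependency should be needed, since the JDK ships `java.util.zip.GZIPInputStream`. The following is only a minimal sketch under that assumption: the `gunzip` helper name and the temporary file names are hypothetical, not part of Nextflow or nf-test, and the round-trip at the end exists just to show the helper working on a local file.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Hypothetical helper (not part of Nextflow or nf-test): stream-decompress a .gz file.
Path gunzip(Path source, Path target) {
    // withCloseable closes the gzip stream, and the file stream it wraps, when done
    new GZIPInputStream(new BufferedInputStream(Files.newInputStream(source))).withCloseable { gz ->
        Files.copy(gz, target, StandardCopyOption.REPLACE_EXISTING)
    }
    return target
}

// Round-trip check on a temporary file: compress a short string, then unpack it again
Path dir = Files.createTempDirectory("gunzip-demo")
Path packed = dir.resolve("hello.txt.gz")
new GZIPOutputStream(Files.newOutputStream(packed)).withCloseable { out ->
    out.write("hello\n".getBytes("UTF-8"))
}

Path unpacked = gunzip(packed, dir.resolve("hello.txt"))
assert new String(Files.readAllBytes(unpacked), "UTF-8") == "hello\n"
```

The same helper could sit in front of the tar loop above when only the gzip layer needs removing, though whether it belongs in a test helper or a shared library is a design question this sketch does not settle.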