nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Failed to publish: large file transfer to GCS #5085

Open nick-youngblut opened 1 week ago

nick-youngblut commented 1 week ago

Bug report

My team keeps receiving "Failed to publish" warnings when running our custom Nextflow pipeline. For example:

Jun-17 20:16:08.765 [PublishDir-1982] DEBUG nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement1-Plate1-Col3_Media20_P3_R1_LibF18_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement1-Plate1-Col3_Media20_P3_R1_LibF18_R2_001.fastq.gz [copy] -- attempt: 1; reason: All 0 retries failed. Waited a total of 0 ms between attempts
Jun-17 20:16:08.765 [PublishDir-2110] DEBUG nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col6_Media12_P3_R2_LibK18_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col6_Media12_P3_R2_LibK18_R2_001.fastq.gz [copy] -- attempt: 1; reason: All 0 retries failed. Waited a total of 0 ms between attempts
Jun-17 20:16:08.765 [PublishDir-2136] DEBUG nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col9_Media12_P3_R1_LibA19_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col9_Media12_P3_R1_LibA19_R2_001.fastq.gz [copy] -- attempt: 1; reason: Broken pipe

With v23, the resulting files (published to Google Cloud Storage from a local Linux server) are corrupted (partial files). With v24, Nextflow throws an error before publishing completes. The issue only seems to occur when many large files (~2-4 TB in total) are published in parallel.
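A sketch of how the partial uploads described above could be detected after the fact. Cloud Storage stores a base64-encoded MD5 hash (the `md5Hash` field of the JSON API object resource) for non-composite objects. Recomputing the same encoding for the local file and comparing the two strings flags truncated copies. The class name `Md5Check` and its command-line arguments are hypothetical, for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public final class Md5Check {

    /** Base64-encoded MD5 of a stream, the same encoding GCS uses for md5Hash. */
    static String base64Md5(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[1 << 20]; // 1 MiB read buffer: the file is streamed, never loaded whole
        int n;
        while ((n = in.read(buf)) != -1) {
            md5.update(buf, 0, n);
        }
        return Base64.getEncoder().encodeToString(md5.digest());
    }

    public static void main(String[] args) throws Exception {
        Path local = Path.of(args[0]);   // local file in the work directory
        String expected = args[1];       // md5Hash copied from the object's metadata
        String actual;
        try (InputStream in = Files.newInputStream(local)) {
            actual = base64Md5(in);
        }
        System.out.println(actual.equals(expected) ? "OK" : "MISMATCH: local=" + actual);
    }
}
```

For example, `java Md5Check.java <local-file> <md5Hash>` prints `OK` when the hashes match and the local hash otherwise. Streaming through a fixed buffer keeps memory use flat even for multi-gigabyte FASTQ files.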

Expected behavior and actual behavior

See above.

Steps to reproduce the problem

Use publishDir to transfer >2 TB of files from an Ubuntu server to Google Cloud Storage. This seems to be a bandwidth issue. Note that the server has plenty of resources (128 cores and 750 GB of memory dedicated to this Nextflow pipeline alone).

Program output

Jun-17 20:16:08.769 [PublishDir-2140] ERROR nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col9_Media19_P3_R2_LibA20_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col9_Media19_P3_R2_LibA20_R2_001.fastq.gz [copy] -- See log file for details
java.lang.OutOfMemoryError: Java heap space
Jun-17 20:16:08.768 [PublishDir-1166] ERROR nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Empty-Plate2-E12_Media20_P3_R1_LibJ24_R1_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Empty-Plate2-E12_Media20_P3_R1_LibJ24_R1_001.fastq.gz [copy] -- See log file for details
java.lang.OutOfMemoryError: Java heap space
Jun-17 20:16:08.772 [PublishDir-2101] ERROR nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col5_Media12_P3_R2_LibI18_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col5_Media12_P3_R2_LibI18_R2_001.fastq.gz [copy] -- See log file for details
java.lang.OutOfMemoryError: Java heap space
    at java.base/sun.nio.ch.Net.localAddress(Net.java:625)
    at java.base/sun.nio.ch.NioSocketImpl.endConnect(NioSocketImpl.java:529)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:604)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
    at java.base/java.net.Socket.connect(Socket.java:751)
    at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304)
    at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:531)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:636)
    at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:377)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1237)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1123)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:141)
    at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:151)
    at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:525)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:509)
    at com.google.cloud.storage.StorageImpl.lambda$get$6(StorageImpl.java:285)
    at com.google.cloud.storage.StorageImpl$$Lambda/0x00007f78f84925b0.call(Unknown Source)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.storage.Retrying.run(Retrying.java:54)
    at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1406)
    at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:284)
    at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:290)
Jun-17 20:16:08.768 [PublishDir-758] ERROR nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_89-Desulfovibrio-piger-ATCC-29098_Media20_P3_R3_LibN9_R1_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_89-Desulfovibrio-piger-ATCC-29098_Media20_P3_R3_LibN9_R1_001.fastq.gz [copy] -- See log file for details
java.lang.OutOfMemoryError: Java heap space
Jun-17 20:16:08.768 [PublishDir-2080] ERROR nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col2_Media20_P3_R2_LibC18_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate1-Col2_Media20_P3_R2_LibC18_R2_001.fastq.gz [copy] -- See log file for details
java.lang.OutOfMemoryError: Java heap space
Jun-17 20:16:08.770 [PublishDir-1927] ERROR nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_91-Akkermansia-muciniphila-ATCC-BAA-835_Media19_P3_R2_LibG6_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_91-Akkermansia-muciniphila-ATCC-BAA-835_Media19_P3_R2_LibG6_R2_001.fastq.gz [copy] -- See log file for details
java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3541)
    at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:135)
    at com.google.cloud.storage.contrib.nio.CloudStorageWriteChannel.write(CloudStorageWriteChannel.java:68)
    at java.base/sun.nio.ch.FileChannelImpl.transferToArbitraryChannel(FileChannelImpl.java:729)
    at java.base/sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:787)
    at java.base/sun.nio.ch.ChannelInputStream.transfer(ChannelInputStream.java:283)
    at java.base/sun.nio.ch.ChannelInputStream.transferTo(ChannelInputStream.java:250)
    at java.base/java.nio.file.Files.copy(Files.java:3151)
    at nextflow.file.CopyMoveHelper.copyFile(CopyMoveHelper.java:91)
    at nextflow.file.CopyMoveHelper.copyToForeignTarget(CopyMoveHelper.java:172)
    at nextflow.file.FileHelper.copyPath(FileHelper.groovy:962)
    at nextflow.processor.PublishDir.processFileImpl(PublishDir.groovy:508)
    at nextflow.processor.PublishDir.processFile(PublishDir.groovy:421)
    at java.base/java.lang.invoke.LambdaForm$DMH/0x00007f78f84a9000.invokeVirtual(LambdaForm$DMH)
    at java.base/java.lang.invoke.LambdaForm$MH/0x00007f78f874cc00.invoke(LambdaForm$MH)
    at java.base/java.lang.invoke.LambdaForm$MH/0x00007f78f8192400.invokeExact_MT(LambdaForm$MH)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:155)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
    at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1333)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
    at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:645)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:628)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:82)
    at nextflow.processor.PublishDir$_retryableProcessFile_closure2.doCall(PublishDir.groovy:398)
    at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(DirectMethodHandle$Holder)
    at java.base/java.lang.invoke.LambdaForm$MH/0x00007f78f874c400.invoke(LambdaForm$MH)
    at java.base/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:154)
Jun-17 20:16:08.791 [PublishDir-1905] DEBUG nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_89-Desulfovibrio-piger-ATCC-29098_Media19_P3_R1_LibM9_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_89-Desulfovibrio-piger-ATCC-29098_Media19_P3_R1_LibM9_R2_001.fastq.gz [copy] -- attempt: 1; reason: All 0 retries failed. Waited a total of 0 ms between attempts
Jun-17 20:16:08.808 [PublishDir-2158] DEBUG nextflow.processor.PublishDir - Failed to publish file: /home/khushalip/auto-demux/tmp/work/20231220_LH00181_0010_A22FMTGLT3/7b/02f7a9bdae36a94babec6e07e0fe55/bcl_output/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate2-Row11_Media19_P3_R2_LibE24_R2_001.fastq.gz; to: gs://illumina-auto-demux/NovaSeqX/20231220_LH00181_0010_A22FMTGLT3/bcl-convert/CZBMI_NICHE_StrainDropout/StrainDropout_Arrangement2-Plate2-Row11_Media19_P3_R2_LibE24_R2_001.fastq.gz [copy] -- attempt: 1; reason: All 0 retries failed. Waited a total of 0 ms between attempts
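One of the traces above shows `BaseWriteChannel.write` calling `Arrays.copyOf`, which suggests that each open GCS upload channel keeps its pending data in a byte array on the Java heap. If so, the heap tied up by uploads grows roughly linearly with the number of files being published at the same time. The back-of-envelope sketch below makes that explicit; the upload count and the 15 MiB per-channel buffer are assumed values for illustration, not measurements from this run (the real buffer size depends on the client library's configured chunk size).

```java
import java.util.Locale;

public final class HeapEstimate {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Rough estimate of heap held by upload buffers: concurrent uploads x bytes buffered per upload. */
    static long bufferedBytes(int concurrentUploads, long bufferBytesPerUpload) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact((long) concurrentUploads, bufferBytesPerUpload);
    }

    public static void main(String[] args) {
        int uploads = 2000;        // hypothetical number of files publishing at once
        long perUpload = 15 * MIB; // assumed per-channel buffer size, not measured
        long total = bufferedBytes(uploads, perUpload);
        System.out.printf(Locale.ROOT, "%d uploads x %d MiB = %.1f GiB%n",
                uploads, perUpload / MIB, total / (double) GIB);
    }
}
```

With these placeholder numbers it prints `2000 uploads x 15 MiB = 29.3 GiB`. The point is the linear scaling: halving the number of simultaneous uploads roughly halves this term, independent of how much physical RAM the server has.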

Environment

Additional context

I initially tried posting on Slack but did not receive an answer.

nick-youngblut commented 1 week ago

Maybe adjusting NXF_OPTS is the solution?
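If the heap is the bottleneck, Nextflow documents NXF_OPTS as providing extra options for the Java and Nextflow runtime, so a larger heap could be requested with something like `export NXF_OPTS='-Xms1g -Xmx32g'` before `nextflow run` (the sizes are placeholders, not a tested recommendation for this workload). To check what heap ceiling a JVM actually receives for a given set of flags, a small sketch like the following (class name hypothetical) can be run with the same options:

```java
import java.util.Locale;

public final class MaxHeap {

    /** Formats a byte count as GiB with one decimal place. */
    static String toGiB(long bytes) {
        return String.format(Locale.ROOT, "%.1f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        // maxMemory() is the most heap this JVM will attempt to use: roughly -Xmx, or the JVM default if unset
        System.out.println("max heap: " + toGiB(Runtime.getRuntime().maxMemory()));
    }
}
```

For example, `java -Xmx32g MaxHeap.java` should report a value close to 32 GiB; depending on the garbage collector, the JVM may report slightly less than the `-Xmx` value.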