projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0

[Bug]: Export: Generic objects write breaks ZIP file exports #9944

Closed (snazy closed this issue 3 days ago)

snazy commented 3 days ago

What happened

A user reports the following failure when running a Nessie export:

java.util.zip.ZipException: no current ZIP entry
    at java.base/java.util.zip.ZipOutputStream.write(ZipOutputStream.java:357)
    at org.projectnessie.versioned.transfer.files.ZipArchiveExporter$NonClosingOutputStream.write(ZipArchiveExporter.java:116)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:125)
    at java.base/java.io.BufferedOutputStream.implFlush(BufferedOutputStream.java:252)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:240)
    at org.projectnessie.versioned.transfer.SizeLimitedOutput.finishCurrentFile(SizeLimitedOutput.java:85)
    at org.projectnessie.versioned.transfer.SizeLimitedOutput.finish(SizeLimitedOutput.java:78)
    at org.projectnessie.versioned.transfer.ExportContext.finish(ExportContext.java:74)
    at org.projectnessie.versioned.transfer.ExportCommon.exportRepo(ExportCommon.java:119)
    at org.projectnessie.versioned.transfer.NessieExporter.exportNessieRepository(NessieExporter.java:239)
    at org.projectnessie.tools.admin.cli.ExportRepository.export(ExportRepository.java:211)
    at org.projectnessie.tools.admin.cli.ExportRepository.call(ExportRepository.java:159)
    at org.projectnessie.tools.admin.cli.ExportRepository.call(ExportRepository.java:44)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
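
Reading the trace: buffered bytes are flushed into the ZipOutputStream at a point where it has no open entry. The JDK-only sketch below shows that failure mode in isolation. It is not Nessie code and makes no claim about how the exporter should be fixed; the class name, method names and the entry name "commits" are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of Nessie.
class ZipBufferedFlushDemo {

  /** Flushes buffered bytes after the ZIP entry was closed and returns the error message. */
  static String flushAfterEntryClosed() throws IOException {
    try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
      BufferedOutputStream buffered = new BufferedOutputStream(zip);
      zip.putNextEntry(new ZipEntry("commits"));
      // The bytes stay in the buffer, nothing reaches the ZIP stream yet.
      buffered.write("commit".getBytes(StandardCharsets.UTF_8));
      // From here on, the ZIP stream has no current entry.
      zip.closeEntry();
      try {
        // The flush pushes the buffered bytes into the ZIP stream.
        buffered.flush();
        return "no error";
      } catch (ZipException e) {
        return e.getMessage();
      }
    }
  }

  /** Flushes while the entry is still open, then reads the entry content back. */
  static String flushBeforeEntryClosed() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      BufferedOutputStream buffered = new BufferedOutputStream(zip);
      zip.putNextEntry(new ZipEntry("commits"));
      buffered.write("commit".getBytes(StandardCharsets.UTF_8));
      buffered.flush();
      zip.closeEntry();
    }
    try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      in.getNextEntry();
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(flushAfterEntryClosed());
    System.out.println(flushBeforeEntryClosed());
  }
}
```

The first method hits the same ZipException message as the report, the second one writes the entry as expected. A ZipOutputStream has at most one open entry at a time, so any buffer layered on top of it has to be drained before the entry is closed or switched.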

How to reproduce it

It is unclear how to reproduce this with a full integration test case. Adding the following test to AbstractExportImport reproduces the issue, because it mimics the behavior introduced by #9034:

public abstract class AbstractExportImport {
...
  @Test
  public void fileSupplierMixed(@TempDir Path targetDir, @NessiePersist Persist persist)
      throws Exception {
    repositoryLogic(persist).initialize("main");

    try (var exportFileSupplier = prepareExporter(targetDir)) {
      var exporter =
          NessieExporter.builder()
              .exportFileSupplier(exportFileSupplier)
              .persist(persist)
              .fullScan(true)
              .build();

      var exportFiles = exporter.exportFileSupplier();

      var exportContext =
          new ExportContext(
              exportFiles,
              exporter,
              ExportMeta.newBuilder()
                  .setNessieVersion(NessieVersion.NESSIE_VERSION)
                  .setCreatedMillisEpoch(System.currentTimeMillis())
                  .setVersion(ExportVersion.V3));

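      // Write one commit, one generic object and one reference, in that order.
      exportContext.writeCommit(TransferTypes.Commit.newBuilder().build());
      exportContext.writeGeneric(TransferTypes.RelatedObj.newBuilder().build());
      exportContext.writeRef(TransferTypes.Ref.newBuilder().build());

      // With the ZIP exporter, this call fails with "ZipException: no current ZIP entry".
      exportContext.finish();
    }
  }
}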

Nessie server type (docker/uber-jar/built from source) and version

Affected: Nessie 0.92.1 and newer

Additional information

Workaround: use the directory-based (file) exporter instead of the ZIP exporter (--output-format=DIRECTORY).