onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0

[BUG] ZIP_THRESHOLD_SIZE exceeded with Large GeoPackage Files #940

Closed vycius closed 6 days ago

vycius commented 1 week ago

**Describe the bug**
`ZIP_THRESHOLD_SIZE` can easily be exceeded with larger GeoPackage files.

**To Reproduce**
Steps to reproduce the behavior:

  1. Create example.java using addGeoPackageSource with a URL pointing to a larger zip archive:
    
    import com.onthegomap.planetiler.FeatureCollector;
    import com.onthegomap.planetiler.ForwardingProfile;
    import com.onthegomap.planetiler.Planetiler;
    import com.onthegomap.planetiler.Profile;
    import com.onthegomap.planetiler.config.Arguments;
    import com.onthegomap.planetiler.reader.SourceFeature;

    import java.nio.file.Path;

    class PolygonProcessor implements Profile {

      @Override
      public void processFeature(SourceFeature sourceFeature, FeatureCollector features) {
        // Intentionally empty: the error occurs before any feature is processed.
      }

    }

    public class Example extends ForwardingProfile {

      public static void main(String[] args) throws Exception {
        var arguments = Arguments.fromArgs(args).withDefault("download", true);

        Planetiler.create(arguments).setProfile(new PolygonProcessor())
            .addGeoPackageSource(
                "large",
                Path.of("data", "example-polygons.gpkg.zip"),
                "https://cdn.startupgov.lt/tiles/poc/planetiler/example-polygons.gpkg.zip"
            )
            .overwriteOutput(Path.of("data", "output", "example.pmtiles"))
            .run();
      }

    }

  2. Run the code using any Java version with Planetiler version 0.8.1.
  3. Observe the error:

    Exception in thread "main" com.onthegomap.planetiler.Planetiler$PlanetilerException: Error occurred during stage large
        at com.onthegomap.planetiler.Planetiler.run(Planetiler.java:829)
        at lt.example.main(Example.java:32)
    Caused by: java.io.UncheckedIOException: java.io.IOException: The uncompressed data size 1GB is too much for the application resource capacity
        at com.onthegomap.planetiler.util.FileUtils.safeCopy(FileUtils.java:288)
        at com.onthegomap.planetiler.reader.GeoPackageReader.openGeopackage(GeoPackageReader.java:77)
        at com.onthegomap.planetiler.reader.GeoPackageReader.<init>(GeoPackageReader.java:59)
        at com.onthegomap.planetiler.reader.GeoPackageReader.lambda$process$0(GeoPackageReader.java:108)
        at com.onthegomap.planetiler.reader.SourceFeatureProcessor.getFeatureCount(SourceFeatureProcessor.java:159)
        at com.onthegomap.planetiler.reader.SourceFeatureProcessor.processFiles(SourceFeatureProcessor.java:78)
        at com.onthegomap.planetiler.reader.SourceFeatureProcessor.processFiles(SourceFeatureProcessor.java:65)
        at com.onthegomap.planetiler.reader.GeoPackageReader.process(GeoPackageReader.java:105)
        at com.onthegomap.planetiler.Planetiler.lambda$addGeoPackageSource$4(Planetiler.java:404)
        at com.onthegomap.planetiler.Planetiler.lambda$ifSourceUsed$12(Planetiler.java:955)
        at com.onthegomap.planetiler.Planetiler.run(Planetiler.java:827)
        ... 1 more
    Caused by: java.io.IOException: The uncompressed data size 1GB is too much for the application resource capacity
        at com.onthegomap.planetiler.util.FileUtils.safeCopy(FileUtils.java:283)
        ... 11 more


**Expected behavior**
Larger GeoPackage files should be processed successfully. However, the current hardcoded `ZIP_THRESHOLD_SIZE` of 1 GB in [FileUtils.java](https://github.com/onthegomap/planetiler/blob/main/planetiler-core/src/main/java/com/onthegomap/planetiler/util/FileUtils.java#L41) prevents this.

The current `ZIP_THRESHOLD_SIZE` is insufficient for GeoPackage files, as a single GeoPackage file can contain many different features and can easily exceed this limit.

**Possible Solutions:**
1. **Increase `ZIP_THRESHOLD_SIZE`:** Setting a higher limit, such as 10 GB or 25 GB, would likely be sufficient for most use cases (though theoretically, a GeoPackage might be up to 140TB).
2. **User-Configurable `ZIP_THRESHOLD_SIZE`:** Allow users to specify the `ZIP_THRESHOLD_SIZE` in the Planetiler configuration.
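
To make the second option concrete, here is a minimal sketch of a size-limited copy whose limit comes from configuration instead of a hardcoded constant. It is not Planetiler's actual `FileUtils.safeCopy` implementation: the class `LimitedCopy`, the method names, the default value, and the system property `example.zip_threshold_size` are all hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Planetiler.
final class LimitedCopy {

  // Assumed default, standing in for the 1 GB limit described in this issue.
  static final long DEFAULT_MAX_BYTES = 1_000_000_000L;

  // Reads the limit from a hypothetical system property, falling back to the default.
  static long configuredMaxBytes() {
    return Long.getLong("example.zip_threshold_size", DEFAULT_MAX_BYTES);
  }

  // Copies the stream to dest and fails once more than maxBytes have been read,
  // so a zip bomb cannot fill the disk. The partial file is deleted on failure.
  static long copyWithLimit(InputStream in, Path dest, long maxBytes) throws IOException {
    long total = 0;
    byte[] buffer = new byte[8192];
    try (OutputStream out = Files.newOutputStream(dest)) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        total += n;
        if (total > maxBytes) {
          throw new IOException("Uncompressed data exceeds the limit of " + maxBytes + " bytes");
        }
        out.write(buffer, 0, n);
      }
    } catch (IOException e) {
      Files.deleteIfExists(dest);
      throw e;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    Path dest = Files.createTempFile("limited-copy", ".bin");
    try {
      InputStream in = new ByteArrayInputStream(new byte[10]);
      long copied = copyWithLimit(in, dest, configuredMaxBytes());
      System.out.println("copied " + copied + " bytes");
    } finally {
      Files.deleteIfExists(dest);
    }
  }
}
```

With a design like this, a user with a large GeoPackage could raise the limit at launch (for example `-Dexample.zip_threshold_size=100000000000` in this sketch) while the zip-bomb protection stays on by default.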

**Environment**
OS: macOS 14.5

    openjdk 22.0.1 2024-04-16
    OpenJDK Runtime Environment Homebrew (build 22.0.1)
    OpenJDK 64-Bit Server VM Homebrew (build 22.0.1, mixed mode, sharing)

msbarry commented 6 days ago

Seems reasonable to increase the limit. This is meant to protect against a zip bomb that fills up all available disk space, but we expect these files to be pretty big. Would 100gb work?

vycius commented 6 days ago

Yes, increasing the limit to 100GB is reasonable.