locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

GDALRasterSource will read corrupt jp2 successfully #3547

Open bossie opened 1 month ago

bossie commented 1 month ago

Describe the bug

I can create a GDALRasterSource from a corrupt JPEG2000 file and successfully read from it. I would expect it to throw an exception instead of silently proceeding with wrong data.

To Reproduce

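import geotrellis.raster.gdal.GDALRasterSource
import geotrellis.raster.io.geotiff.MultibandGeoTiff

// Creating the source from the corrupt JPEG2000 file does not throw.
val rs = GDALRasterSource("https://artifactory.vgt.vito.be/artifactory/testdata-public/T29UMV_20180327T114351_B04_10m.jp2")
// Reading also succeeds, even though GDAL logs errors along the way.
val Some(raster) = rs.read()
// Write the result out so the wrong data can be inspected.
MultibandGeoTiff(raster, rs.crs).write("/tmp/rasterSourceFromCorruptTile.tif")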

This will successfully read a corrupt JPEG2000 file and write it to a GeoTiff that looks funky. GDAL will output error messages along the way that look like this:

[1 of 1000] FAILURE(3) CPLE_AppDefined(1) "Application defined error." Inconsistent marker size
[2 of 1000] FAILURE(3) CPLE_AppDefined(1) "Application defined error." opj_get_decoded_tile() failed 

Expected behavior

I would expect it to throw an exception instead of silently proceeding with wrong data. gdalinfo, for example, will output error messages instead:

$ gdalinfo -stats https://artifactory.vgt.vito.be/artifactory/testdata-public/T29UMV_20180327T114351_B04_10m.jp2
...
ERROR 1: Stream too short, expected SOT

ERROR 1: opj_get_decoded_tile() failed
ERROR 1: /vsimem/http_1/T29UMV_20180327T114351_B04_10m.jp2, band 1: IReadBlock failed at X offset 5, Y offset 8: opj_get_decoded_tile() failed
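
The expected behavior could be approximated on the caller side with a wrapper along these lines. This is only a sketch: `ErrorLog` and `StrictRead.readStrict` are hypothetical names, not part of GeoTrellis or the GDAL bindings, and how GDAL error messages would actually be routed into such a log is an open assumption.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical collector for error messages reported during a read.
// How GDAL errors would reach it is an assumption, not an existing API.
final class ErrorLog {
  private val messages = ArrayBuffer.empty[String]
  def report(msg: String): Unit = synchronized { messages += msg }
  def size: Int = synchronized { messages.size }
  def since(offset: Int): List[String] = synchronized { messages.drop(offset).toList }
}

object StrictRead {
  // Runs `read` and throws if any error was reported while it ran,
  // or if the read produced no result.
  def readStrict[A](log: ErrorLog)(read: => Option[A]): A = {
    val before = log.size
    val result = read
    val errors = log.since(before)
    if (errors.nonEmpty) {
      val joined = errors.mkString("; ")
      throw new IllegalStateException(s"read reported ${errors.size} error(s): $joined")
    }
    result.getOrElse(throw new NoSuchElementException("read returned no result"))
  }
}
```

With something like this in place, a call such as `StrictRead.readStrict(log)(rs.read())` would fail as soon as a single error was reported during the read, rather than returning a raster built from partially decoded tiles.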

Environment

pomadchin commented 1 month ago

Hey @bossie thx for reporting! Are u sure that's the GeoTrellis issue and not the underlying bindings? Have u tried to reproduce it via the bare bindings code?

pomadchin commented 1 month ago

I wonder if smth could be swallowed on the bindings lvl.

bossie commented 1 month ago

Hi @pomadchin. Thanks for the quick response.

> Hey @bossie thx for reporting! Are u sure that's the GeoTrellis issue and not the underlying bindings? Have u tried to reproduce it via the bare bindings code?

I have not, so it's entirely possible that it's within the bindings code, but I'm not familiar with that.

What I initially tried was significantly lowering geotrellis.raster.gdal.number-of-attempts because that seemed to make it fail but:

So I guess that's unrelated and I'm on the wrong track here.

pomadchin commented 1 month ago

Oooh interesting. Yes, that's definitely the bindings thing: https://github.com/geotrellis/gdal-warp-bindings/blob/v3.9.0/src/com_azavea_gdal_GDALWarp.c

That's been a way to deal with GDAL Dataset access, which doesn't fit nicely with JVM parallelism.

Check the completely opposite scenarios: