Implement header chunk read

mapbox / COGDumper

Dumps tiles out of a cloud optimized geotiff

MIT License

64 stars 14 forks source link

Implement header chunk read #5

Closed vincentsarago closed 6 years ago

vincentsarago commented 6 years ago

closes #4 This PR try to implement chunk reading for header retrieval to limit the number of read calls

To Do

Right now I'm reading the first 8000 bytes of a file and decode the header. It works for normal TIFF but not for BIGTIFF. @normanb do you now if there is a size limit for GeoTIFF headers ?

normanb commented 6 years ago

afaik there isn't a limit on the header size for either tiff or bigtiff, as the header can have a flexible number of tags and these tags can have a different number of values. Picking a large enough chunk size and being prepared to go back for more seems to be an approach. Ideally chunk size should be configurable.

vincentsarago commented 6 years ago

I update most of the change and it should work now. I need to add more test to make sure everything is fine

cc @normanb

vincentsarago commented 6 years ago

@normanb I'm not sure I can make test coverage better. I think this PR is ready for final review. Just noting that by default we fetch the first 1024 bytes https://github.com/mapbox/COGDumper/blob/35de35bd4bfa929a72674755e17521065eb6c950/cogdumper/cog_tiles.py#L217 while GDAL fetch the first 16k. We could change that to make it more GDAL like and/or are a environment variable to define the size of the first fetch.

vincentsarago commented 6 years ago

@normanb in https://github.com/mapbox/COGDumper/pull/5/commits/7d9849258fa86649b52cea3cc3f106ea266ae72c I added a environment variable COG_INGESTED_BYTES_AT_OPEN used to configure the size of the first read as GDAL 2.3 is doing http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsicurl.

I also added a logging option to 👀 the number and size read.

cogdumper file --file cog_deflate.tif --xyz 0 0 0  -v
INFO:cogdumper.filedumper:Reading bytes: 0 to 16383
INFO:cogdumper.filedumper:Reading bytes: 11434145 to 11831775

I guess if this is 👌 you can do a final review and merge it :-)

normanb commented 6 years ago

+1 Thanks @vincentsarago for these changes, it greatly improves performance.

vincentsarago commented 6 years ago

@normanb should we ship a new release with the change implemented here ?