remicres / otbtf

Deep learning with otb (mirror of https://forgemia.inra.fr/orfeo-toolbox/otbtf)
Apache License 2.0

Concatenate images via Tensorflow rather than OTB #85

Closed daspk04 closed 2 years ago

daspk04 commented 2 years ago

Hello @remicres ,

Recently I tried to concatenate around 250 images (NDVI, int16) from a single Sentinel-2 tile, but I kept getting errors such as ERROR 1: WriteEncodedTile/Strip() failed. I was using the following extended filename: ?&gdal:co:COMPRESS=DEFLATE&gdal:co:TILED=YES&gdal:co:BLOCKXSIZE:64&gdal:co:BLOCKYSIZE:64. I also tried block sizes of 128 and 256. I have 64 GB of RAM, of which I allocated 16 GB to OTB, along with 8 threads. I also experimented a bit with the STREAMING options of the extended filename and got the same kind of error. I wonder what the issue might be: is it due to the CPU or the RAM?

I was also wondering whether we could concatenate the images with a TensorFlow model via pyotb. But the images would still be saved via OTB, right? I am not sure whether that would be a bottleneck again.

Any suggestion would be helpful.

remicres commented 2 years ago

Hi @Pratyush1991 ,

I once had this kind of error. It is related to the writing of the GeoTIFF file.

They come from GDAL (not OTBTF, not even OTB).

I believe I ended up disabling compression for very large files. I may also have had a workaround that set TILED=YES together with BIGTIFF=YES, but I don't remember exactly!
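A rough size estimate may help explain why BIGTIFF=YES is relevant here. The sketch below assumes a 10 m Sentinel-2 tile of 10980 x 10980 pixels (a figure not stated in the thread) and compares the uncompressed size of a 250-band int16 stack with the 4 GiB limit of classic TIFF. With compression enabled, GDAL cannot know the final file size in advance, so asking for BigTIFF explicitly is the safer choice.

```python
# Assumption: a Sentinel-2 tile at 10 m resolution is 10980 x 10980 pixels.
WIDTH = 10980
HEIGHT = 10980
BANDS = 250            # number of stacked NDVI images, from the thread
BYTES_PER_SAMPLE = 2   # int16

# Classic TIFF stores offsets on 32 bits, so a file cannot exceed 4 GiB.
CLASSIC_TIFF_LIMIT = 2**32

uncompressed_bytes = WIDTH * HEIGHT * BANDS * BYTES_PER_SAMPLE

print(f"Uncompressed size: {uncompressed_bytes / 2**30:.1f} GiB")
print(f"Exceeds classic TIFF limit: {uncompressed_bytes > CLASSIC_TIFF_LIMIT}")
```

Even if DEFLATE shrinks the data considerably, the stack is large enough that the output can plausibly cross the classic TIFF limit.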

Anyway, the GDAL version in the current OTBTF docker image is quite old (as is the OTB version), and we are about to move to recent versions very soon.

daspk04 commented 2 years ago

Thank you @remicres . Using BIGTIFF=YES solved the issue. I was able to stack them all. I will give it a try in the new version as well once it is updated. Final extended filename that I used: &gdal:co:COMPRESS=DEFLATE&gdal:co:TILED=YES&gdal:co:BLOCKXSIZE:512&gdal:co:BLOCKYSIZE:512&gdal:co:BIGTIFF=YES
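For readers who build such filenames in scripts, here is a minimal sketch of a helper that appends GDAL creation options to an output path. The function name `with_gdal_creation_options` and the path `stack.tif` are hypothetical. The sketch writes every option as KEY=VALUE, which is the form used for COMPRESS, TILED and BIGTIFF above; the block-size options in the thread were written with a colon instead, so check which form your OTB version expects.

```python
def with_gdal_creation_options(path, options):
    """Append GDAL creation options to a path as an OTB extended filename.

    Hypothetical helper: each option becomes '&gdal:co:KEY=VALUE'.
    """
    suffix = "".join(f"&gdal:co:{key}={value}" for key, value in options.items())
    return f"{path}?{suffix}"


# Options taken from the thread; 'stack.tif' is a placeholder output name.
output = with_gdal_creation_options(
    "stack.tif",
    {
        "COMPRESS": "DEFLATE",
        "TILED": "YES",
        "BLOCKXSIZE": 512,
        "BLOCKYSIZE": 512,
        "BIGTIFF": "YES",
    },
)
print(output)
```

The resulting string can then be passed wherever an OTB application expects an output image filename.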

Also, one more question: when should one use the streaming options in the extended filename? I read through the documentation here and, as I understand it, they might be well suited to my case.

remicres commented 2 years ago

I am glad that the GDAL creation options have helped.

Regarding the streaming options in the extended filename, the general rule of thumb is to use the same layout (i.e. tiles or strips) for processing as for writing. For many processes where I/O is the bottleneck, it is best to process and write in the same fashion as the inputs are stored (i.e. if your inputs are encoded in strips, then you want to process and write the output as strips too). This is because you read, process, and write the same chunk of data without reordering it. But sometimes this does not hold (e.g. when blocks do not overlap efficiently, for instance if you mosaic tiled images that are not aligned).

Regarding deep learning, CNNs need to process an input region slightly larger than the output region. The most efficient way is hence to process square regions (if you use strips, you end up with a large input region for a tiny processed output). Tiling is therefore more appropriate than strips for spatial CNNs. Since OTBTF TensorflowModelServe exposes a tile hint in the output image metadata, which matches the network expression field, OTB writers are able to use this information to process the image the right way (i.e. in a tiled fashion with an appropriate size). However, when it comes to raster writing, the risk of keeping the strip write mode (the default) is that the writers trigger the pipeline multiple times over the same tile in order to generate different strips of the output. I think that the ram parameter controls the memory footprint, and that setting a large value by default should avoid this kind of thing, but I prefer to specify the tiling layout explicitly, so that it is also used to write the image (this way, there is no doubt). And for CNNs, I/O is generally not the bottleneck.

I would say that, for CNNs with a small receptive field/expression field, you can omit the extended filename for both streaming and GDAL creation options. But when you go with heavier stuff, I think it is better to set both explicitly. This really depends on the network in question!
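To make the strips-versus-tiles argument concrete, here is a small arithmetic sketch. It assumes a CNN that needs a margin of 32 pixels around each output region, an image width of 10980 pixels, and 16-pixel-high strips; all three values are illustrative and do not come from the thread. The function name `input_to_output_ratio` is hypothetical.

```python
def input_to_output_ratio(out_width, out_height, margin):
    """Ratio of input pixels read to output pixels produced.

    The input region is the output region padded by `margin` pixels
    on every side, as a CNN with a receptive field would require.
    """
    in_width = out_width + 2 * margin
    in_height = out_height + 2 * margin
    return (in_width * in_height) / (out_width * out_height)


MARGIN = 32  # illustrative padding required by the network

# A full-width strip, 16 pixels high (illustrative values).
strip_ratio = input_to_output_ratio(10980, 16, MARGIN)

# A square 512 x 512 tile.
tile_ratio = input_to_output_ratio(512, 512, MARGIN)

print(f"strip: {strip_ratio:.2f} input pixels per output pixel")
print(f"tile:  {tile_ratio:.2f} input pixels per output pixel")
```

Under these assumptions the strip reads several times more pixels than it produces, while the square tile stays close to a ratio of one, which is the reason tiles suit spatial CNNs better.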
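Setting both explicitly could look like the sketch below, which builds one extended filename carrying streaming options and GDAL creation options with matching 512-pixel tiles. The streaming keys (`type`, `sizemode`, `sizevalue`) and their values are written as I recall them from the OTB extended filename documentation and should be treated as assumptions to verify against your OTB version. The helper `extended_filename` and the path `output.tif` are hypothetical.

```python
def extended_filename(path, streaming=None, gdal_co=None):
    """Build an OTB-style extended filename (hypothetical helper).

    `streaming` entries become '&streaming:KEY=VALUE' and `gdal_co`
    entries become '&gdal:co:KEY=VALUE'.
    """
    parts = []
    for key, value in (streaming or {}).items():
        parts.append(f"&streaming:{key}={value}")
    for key, value in (gdal_co or {}).items():
        parts.append(f"&gdal:co:{key}={value}")
    return f"{path}?{''.join(parts)}" if parts else path


TILE = 512  # illustrative tile size, used for both streaming and writing

name = extended_filename(
    "output.tif",
    # Assumed streaming keys: process the image in square tiles.
    streaming={"type": "tiled", "sizemode": "height", "sizevalue": TILE},
    # Write the GeoTIFF with the same block layout.
    gdal_co={"TILED": "YES", "BLOCKXSIZE": TILE, "BLOCKYSIZE": TILE},
)
print(name)
```

Using the same tile size on both sides follows the rule of thumb above: the data is processed and written in the same layout, without reordering.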

daspk04 commented 2 years ago

@remicres Thank you so much. This information is really helpful, I learned something new and understand a bit better.