ubarsc / kealib

KEALib provides an implementation of the GDAL data model. The format supports raster attribute tables, image pyramids, meta-data and in-built statistics while also handling very large files and compression throughout.
http://kealib.org/
MIT License

Complex data support + different sized rasters #2

Open gillins opened 6 years ago

gillins commented 6 years ago

Original report by Anonymous.


Hi, I'm trying to identify a suitable HDF5 format for storing a large number of datasets with the following characteristics:

  1. Complex-valued data support (possible since GDAL 2.3)
  2. A number of rasters with different sizes
  3. A number of small cubes with different sizes

I have looked at the kealib structure and noticed that it may not satisfy our requirements. However, I'm interested in talking to someone on the kealib team to see if any of the above features are relevant or of interest.

Thanks Piyush

gillins commented 6 years ago

Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).


We haven't had time to look at the complex number support, but would happily look at any pull requests that would enable it :smile:

Kealib only supports the features of GDAL (see http://www.gdal.org/gdal_datamodel.html), so having bands with different dimensions in one file is not supported. Not sure if this is what you meant?

Cheers, Sam.

gillins commented 6 years ago

Original comment by Piyush Agram (Bitbucket: piyushrpt, GitHub: piyushrpt).


Hi Sam, I understand the implementation and design of kealib. I'm wondering if there is a possibility of extending the spec with the idea of subdatasets? For example, BAND1 etc. are datasets under "root". Could they sit one level lower, with each sub-folder under "root" treated as a sub-dataset? Complex number support would be the easiest to add; I don't see a problem there.

Piyush
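
To make the proposal above concrete, here is a minimal sketch of how a band's HDF5 path might resolve with and without an optional sub-dataset level. Only the names "BAND1" and "DATASET1" come from this thread; the `band_path` helper and its arguments are hypothetical and are not part of kealib.

```python
from typing import Optional


def band_path(band: int, dataset: Optional[int] = None) -> str:
    """Return the HDF5 path of a band group (illustrative only).

    With dataset=None the band sits directly under root, as described
    in the thread. With a dataset index the band moves one level down,
    under a hypothetical "DATASETn" group.
    """
    if band < 1:
        raise ValueError("band numbers start at 1")
    if dataset is None:
        return f"/BAND{band}"
    if dataset < 1:
        raise ValueError("dataset numbers start at 1")
    return f"/DATASET{dataset}/BAND{band}"


print(band_path(1))             # band directly under root
print(band_path(1, dataset=2))  # same band inside an assumed sub-dataset group
```

Because the sub-dataset argument is optional, files written with the existing layout would resolve exactly as before under this scheme.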

gillins commented 6 years ago

Original comment by Pete Bunting (Bitbucket: petebunting, GitHub: petebunting).


Hi Both,

I don't think I'd be keen on extending KEA beyond the GDAL data model without a very clear use case, for example neighbours within the RAT for region growing. However, the image overviews can be at any scale and have the same data type and number of bands as the main dataset, so you could use those. How would all the header info work if you had multiple datasets for different geographic areas and/or scales, or even different projections?

I do think it would be good to add complex number support, however, so if a pull request with this functionality were created I'd be happy to look at it.

Cheers, Pete

gillins commented 6 years ago

Original comment by Piyush Agram (Bitbucket: piyushrpt, GitHub: piyushrpt).


Hi Pete, by multiple datasets I meant the possibility of optionally adding a layer like "DATASET1", "DATASET2", etc., each behaving like its own KEA dataset. Everything else remains the same. The use case I'm currently trying to tackle is that we have datasets acquired simultaneously but with different sensors. It makes sense to have all the data grouped together in a single HDF5 / KEA module. There are cases where the data is resampled onto the same grid, but this comes at the cost of data volume, and some of the early products in the processing pipeline are better stored at their native resolutions.

The notion of subdatasets is already supported by the HDF5 and netCDF drivers in GDAL. I understand that it's not a neat solution. I'm not proposing that the current layout be modified, only that multiple datasets can optionally be supported.

Piyush
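
For reference, the HDF5 and netCDF drivers address a subdataset with a name of the form `DRIVER:"filename":component`, for example `NETCDF:"sst.nc":tos`. The sketch below parses names in that quoted form and builds an equivalent name for KEA. The `KEA:"file":DATASETn` syntax is purely an assumption for illustration; no such syntax is confirmed anywhere in this thread. The file names and helper names are hypothetical too.

```python
import re
from typing import NamedTuple


class SubdatasetName(NamedTuple):
    driver: str
    filename: str
    component: str


# Handles only the quoted form DRIVER:"filename":component.
_PATTERN = re.compile(
    r'^(?P<driver>[A-Za-z0-9_]+):"(?P<filename>[^"]+)":(?P<component>.+)$'
)


def parse_subdataset_name(name: str) -> SubdatasetName:
    """Split a quoted subdataset name into its three parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a quoted subdataset name: {name!r}")
    return SubdatasetName(**match.groupdict())


def kea_subdataset_name(filename: str, dataset: int) -> str:
    """Build a HYPOTHETICAL KEA subdataset name (assumed syntax)."""
    return f'KEA:"{filename}":DATASET{dataset}'


print(parse_subdataset_name('NETCDF:"sst.nc":tos'))
print(parse_subdataset_name('HDF5:"scene.h5"://DATASET1'))
print(kea_subdataset_name("scene.kea", 2))
```

Following the same naming pattern as the existing drivers would let callers select a sub-dataset by name, while a plain file name could still open the file as it does today.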

gillins commented 6 years ago

Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).


Hi Guys,

I guess, given that there is a precedent for subdatasets in other drivers, we should be open to supporting them. However, I've always found subdatasets somewhat messy, and both Pete and I have tended to use a different KEA file for each resolution/projection etc. instead (along with a decent file naming system to link them together).

My feeling is that if it is a simple change to allow subdatasets we should accept it, but if it is complex and may possibly break existing code we should avoid it. Do you have any idea what is involved in doing this?

Sam.