As also mentioned in S-100, HDF5 "has built-in compression" and offers different strategies for compressing the data (https://www.hdfgroup.org/2017/05/hdf5-data-compression-demystified-2-performance-tuning/).
Are these numbers based on some level of compression? If not, it may be worth exploring.
Hi @giumas,
This is our experience with compression in HDF5:

1. Use the GZIP algorithm; it seems to be supported everywhere by default (in the different APIs). If you choose an algorithm that is not supported everywhere, you may have problems reading the data.
   1.1. From our point of view, this is actually a point that should be defined in S-102, or even better in S-100 Part 10c.
2. The numbers are the usual compression levels, where 9 is the strongest compression. Level 9 is also the level that the BSH uses.
3. Deciding on a compression strategy is more difficult. It depends strongly on how the HDF5 files will be used. Are they a transport format that will be converted to proprietary formats (e.g. SENC) on the ECDIS? Or will these files be used directly, read-only, on the ECDIS?
   3.1. At BSH, we currently assume that it is a transport format. Under S-102, the transport size is limited to 10 MB. Furthermore, the transmission paths (e.g. satellite) are cost-intensive. Therefore, the goal should be to keep the transport size as small as possible. After the data has been received on the ECDIS system, it can still be converted to a more efficient or uncompressed format.
   3.2. Our chunk size always corresponds to the complete extent of the coverage. We have found that this saves the most storage space (see the sketch below).
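As a minimal sketch of how the settings above (GZIP level 9, one chunk spanning the whole coverage) map onto the h5py API; the file name, group name and grid values are hypothetical placeholders, not the BSH production workflow:

```python
# Minimal sketch: write a bathymetry grid with GZIP level 9 and a single chunk
# covering the complete extent, as described in points 2 and 3.2 above.
import h5py
import numpy as np

# Hypothetical depth grid standing in for a real coverage
depth = np.random.default_rng(0).uniform(5.0, 50.0, size=(1000, 1000)).astype("float32")

with h5py.File("example_s102.h5", "w") as f:            # hypothetical file name
    grp = f.create_group("BathymetryCoverage")           # hypothetical group name
    grp.create_dataset(
        "values",
        data=depth,
        compression="gzip",        # GZIP is available in every standard HDF5 build
        compression_opts=9,        # level 9 = strongest compression
        chunks=depth.shape,        # one chunk = the complete coverage
    )
```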
S-102_file_size_experiment_v2.pdf

Model updated after the addition of some New Zealand data. No substantive changes to model parameters.
I'm willing to add more data if people want to send me theirs.
As mentioned in the product spec V2.2 discussion, I reckon Table 16 is a bit out. There is the potential to add in something like the isolines graph or an alternative table (noting it's a three-dimensional problem: file size, grid resolution, spatial coverage).
@giumas all values are derived from .h5 files exported from Caris Base Editor. No compression options were chosen; it is just what the software did by itself.
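For anyone wanting to check what the exporter actually did, here is a small sketch (using h5py; the file name is a hypothetical placeholder) that reports the compression filter, level and chunk layout of every dataset in an exported file:

```python
# Walk an exported HDF5 file and print each dataset's compression settings.
import h5py

def report(name, obj):
    if isinstance(obj, h5py.Dataset):
        # compression is None when the exporter applied no filter
        print(name, obj.compression, obj.compression_opts, obj.chunks)

with h5py.File("exported_from_base_editor.h5", "r") as f:   # hypothetical file name
    f.visititems(report)
```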
During PT13, we discussed this issue. As it merits further discussion and investigation, we will keep this issue open.
I have some additional surveys that I was going to produce trial S-102 datasets for in the next few weeks, which I plan to add to the data set.
I think the more salient point, however, is that Table 16 is perhaps a little inaccurate.
The Chair has an action to ensure the info is available in the Wiki. Once that is done, this issue can be closed (per the PT16 decision).
From: Paul Rustomji
To: S-102 Project Team
Re: S-102 file size resources
1 March 2023
Attachment 1: S-102 File Size Model (pdf)
Attachment 2: S-102 File Size Isolines (pdf)
Attached are two PDF files I have derived to answer my own questions about how big S-102 file sizes are likely to be. Here in Aus we are trying to scheme up some S-102 test data sets, but are unlikely at this stage to follow a regular gridded product boundary model. More likely we will use odd shapes matching particular user needs (e.g., choke points or difficult-to-navigate pilotage areas). But we need to be cognisant of what volumes of data are associated with different S-102 cell resolution/areal extent combinations and what can be managed in the maritime world (still talking in the low 1-30 MB range rather than the hundreds-of-MB range, I presume).
So, using one test data set, I've exported Caris CSAR files (with an uncertainty band of fixed value 5.15 m) to .h5 format for a fixed geographic area, after regridding at progressively coarser resolutions. There is (as expected) a log-log linear relationship between area-specific file size and grid resolution, from which I have back-calculated the areal coverage for a given file size and resolution combination (see the file size isolines pdf).
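To make the back-calculation concrete, here is a small sketch assuming a log-log linear fit of area-specific file size against grid resolution; the slope and intercept below are hypothetical placeholders, not the fitted values in the attached PDFs:

```python
# Back-calculate areal coverage from a file-size budget, given a log-log linear
# model of area-specific file size (MB per km^2) vs. grid resolution (m).
import math

SLOPE = -2.0      # hypothetical: size per km^2 falls roughly with the square of resolution
INTERCEPT = 1.0   # hypothetical log10(MB/km^2) at 1 m resolution

def size_per_km2(resolution_m: float) -> float:
    """Area-specific file size (MB/km^2) predicted by the log-log model."""
    return 10 ** (INTERCEPT + SLOPE * math.log10(resolution_m))

def coverage_for_budget(budget_mb: float, resolution_m: float) -> float:
    """Areal coverage (km^2) achievable for a given file-size budget and resolution."""
    return budget_mb / size_per_km2(resolution_m)

# e.g. coverage achievable within a 10 MB file at 5 m resolution
print(round(coverage_for_budget(10.0, 5.0), 1))
```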
It's in metric units; sorry.
Paul