vpac-innovations / rsa

Storage and processing for geospatial raster data
GNU General Public License v3.0

Investigate slow NetCDF4 write speeds from rsaquery #7

Open z0u opened 10 years ago

z0u commented 10 years ago

In the latest deployment of RSA at VPAC, we noticed that large queries were running very slowly - apparently much more slowly than the previous deployment. It turns out that writing to NetCDF 3 instead of NetCDF 4 makes rsaquery run much faster for large files. We should investigate why that is.

Unfortunately that previous deployment has been destroyed to make room for the new system, so we can't compare the performance.

z0u commented 10 years ago

There is a good report called the NetCDF-4 Performance Report, produced in 2008 by The HDF Group, that lists various benchmarking results for netCDF-4 vs netCDF-3. These bits may be of particular interest to us:

The netCDF C library provides functions to configure chunking; we could use these to try to get better performance. Apparently netCDF-4 version 4.0 had poor defaults for chunking that were fixed in 4.2 - but this shouldn't be the cause of our current problems, because our docs recommend using version 4.3.

This shows that library versions, application configuration and OS/hardware configuration could all indeed play a part in the reduced performance that we are seeing.

z0u commented 10 years ago

I have found an old set of test results that directly compare nc3 and nc4 (compressed) execution time for queries. Despite giving quite good compression (65% smaller files for continuous data; 8.8MB vs 25MB output), nc4 performed about the same as nc3 (actually marginally faster in most tests).

Just running a small informal test on my machine:

So in fact in this test netCDF 4 is a bit faster. However this is quite a small test case - not even one 5000x5000 tile in size.

z0u commented 10 years ago

The latest version of NetCDF-Java boasts better default chunking settings for netCDF-4 output, which is likely to improve performance. We could try upgrading, but note:

z0u commented 10 years ago

Actually, this might be possible to configure in the version that we're currently using. When creating a new dataset to write to, we use the ucar.nc2.NetcdfFileWriter.createNew function. By passing an extra argument we can adjust the chunking strategy.

ucar.nc2.NetcdfFileWriter.createNew(
    ucar.nc2.NetcdfFileWriter.Version,
    java.lang.String,
    ucar.nc2.jni.netcdf.Nc4Chunking)

Nc4Chunking is an interface, so we would need to implement a new class. Or maybe we can just pull in the class from NetCDF-Java 4.5 without doing a full upgrade.
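
As a rough illustration of the logic a custom chunking class might hold, here is a minimal sketch of how a chunk shape could be chosen for one variable. It does not call NetCDF-Java at all: the class name `ChunkShapeSketch`, the method `compute`, and the 1 MiB target used in the example are hypothetical, and the sketch assumes that an Nc4Chunking implementation would hand back a shape like this for each variable. It fills the fastest-varying (last) dimensions first and keeps each chunk under a byte budget; whether that layout suits rsaquery's access pattern would still need to be measured.

```java
/**
 * Hypothetical helper, not part of NetCDF-Java: picks a chunk shape for a
 * variable so that one chunk stays under a target size in bytes.
 */
final class ChunkShapeSketch {

    private ChunkShapeSketch() {
    }

    /**
     * @param shape       dimension lengths, slowest-varying first (e.g. time, y, x)
     * @param elemSize    size of one element in bytes (e.g. 4 for float)
     * @param targetBytes upper bound on the size of a single chunk
     * @return chunk lengths, one per dimension
     */
    static int[] compute(int[] shape, int elemSize, long targetBytes) {
        if (elemSize <= 0 || targetBytes < elemSize) {
            throw new IllegalArgumentException("target must hold at least one element");
        }
        int[] chunk = new int[shape.length];
        // Number of elements that may still be packed into one chunk.
        long budget = targetBytes / elemSize;
        // Walk from the fastest-varying dimension outwards, giving each
        // dimension as much of the remaining budget as it can use.
        for (int i = shape.length - 1; i >= 0; i--) {
            if (shape[i] <= 0) {
                throw new IllegalArgumentException("dimension lengths must be positive");
            }
            long len = Math.max(1, Math.min(shape[i], budget));
            chunk[i] = (int) len;
            budget = Math.max(1, budget / len);
        }
        return chunk;
    }
}
```

For example, `ChunkShapeSketch.compute(new int[] {10, 5000, 5000}, 4, 1L << 20)` gives chunks of 1 x 52 x 5000 elements, about 1.04 MB each for 4-byte values. A shape that is square in y and x might serve spatial subsetting better, so this is only a starting point for experiments.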