mmomtchev / node-gdal-async

Node.js bindings for GDAL (Geospatial Data Abstraction Library) with full async support
https://mmomtchev.github.io/node-gdal-async/
Apache License 2.0

Segmentation fault when trying to open a NetCDF .nc4 file #104

Closed jonjardine closed 11 months ago

jonjardine commented 11 months ago

I'm trying to extract the various layers from .nc4 files supplied by NASA - EarthData - GLDAS.

However, I get a segmentation fault when I try to open a downloaded file. I have confirmed that I can extract the layers using gdal_translate, e.g.:

gdal_translate NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":SnowDepth_tavg GLDAS_CLSM025_DA1_D.A20220808.022.tif

Here's my (very simple) code:

import gdal from 'gdal-async';
const nc = await gdal.open("./GLDAS_CLSM025_DA1_D.A20220808.022.nc4");

I've attached the NetCDF file (I had to zip it as GitHub doesn't support uploads of that type).

Thanks for your help! GLDAS_CLSM025_DA1_D.A20220808.022.nc4.zip

mmomtchev commented 11 months ago

I cannot reproduce it with the latest version on Linux. What version of gdal-async are you using, and on what platform?

jonjardine commented 11 months ago

Running on an M1 Max Mac Studio. I removed the package and reinstalled it, building from source, and it works now. Apologies!

jonjardine commented 11 months ago

Whilst I can now read the file, I'm not seeing any raster images; there are 25 subdatasets but I can't see how to extract them. Any help you can provide would be appreciated.

Dataset {
  geoTransform: null,
  srs: null,
  root: null,
  driver: Driver { description: 'netCDF' },
  rasterSize: { x: 512, y: 512 },
  layers: DatasetLayers {},
  bands: DatasetBands {},
  description: '/---/GLDAS/GLDAS_CLSM025_DA1_D.A20220808.022.nc'
}

From gdalinfo:

gdalinfo GLDAS_CLSM025_DA1_D.A20220808.022.nc4
Driver: netCDF/Network Common Data Format
Files: GLDAS_CLSM025_DA1_D.A20220808.022.nc4
Size is 512, 512
Metadata:
  NC_GLOBAL#comment=website: https://ldas.gsfc.nasa.gov/gldas, https://lis.gsfc.nasa.gov/
  NC_GLOBAL#conventions=CF-1.6
  NC_GLOBAL#DX=0.25
  NC_GLOBAL#DY=0.25
  NC_GLOBAL#history=created on date: 2022-10-24T22:52:47.141
  NC_GLOBAL#institution=NASA GSFC HSL
  NC_GLOBAL#MAP_PROJECTION=EQUIDISTANT CYLINDRICAL
  NC_GLOBAL#missing_value=-9999
  NC_GLOBAL#references=Li_etal_WRR_2019, Li_etal_SciRep_2019, Li_etal_GRL_2017, Rodell_etal_BAMS_2004, Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
  NC_GLOBAL#source=CLSM_F2.5/CSR_GRACE_GRACE-FO_RL06_Mascons_all-corrections_2002-04_2019-08&v02_2020_04-05/ECMWF
  NC_GLOBAL#SOUTH_WEST_CORNER_LAT=-59.875
  NC_GLOBAL#SOUTH_WEST_CORNER_LON=-179.875
  NC_GLOBAL#tavg definition:=24-hour average
  NC_GLOBAL#title=GLDAS2.2 LIS land surface model output
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":time_bnds
  SUBDATASET_1_DESC=[1x2] time_bnds (64-bit floating-point)
  SUBDATASET_2_NAME=NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":Swnet_tavg
  SUBDATASET_2_DESC=[1x600x1440] surface_net_downward_shortwave_flux (32-bit floating-point)
  SUBDATASET_3_NAME=NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":Lwnet_tavg
  SUBDATASET_3_DESC=[1x600x1440] surface_net_downward_longwave_flux (32-bit floating-point)
  SUBDATASET_4_NAME=NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":Qle_tavg
  SUBDATASET_4_DESC=[1x600x1440] surface_upward_latent_heat_flux (32-bit floating-point)
  SUBDATASET_5_NAME=NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":Qh_tavg
  SUBDATASET_5_DESC=[1x600x1440] surface_upward_sensible_heat_flux (32-bit floating-point)
  SUBDATASET_6_NAME=NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":Qg_tavg
  SUBDATASET_6_DESC=[1x600x1440] downward_heat_flux_in_soil (32-bit floating-point)
(etc)

mmomtchev commented 11 months ago

Sorry, I didn't see this last message.

The NetCDF format is somewhat peculiar to use: check its GDAL documentation https://gdal.org/drivers/raster/netcdf.html

When you open the main file, you must read its metadata, which lists the subdatasets. The problem is that gdal-async exposes only the fixed metadata and does not implement the interface needed for querying it. If you need to get the subdataset names dynamically, you can possibly use gdal.info.

Then you must open each subdataset separately:

gdal.open('NETCDF:"GLDAS_CLSM025_DA1_D.A20220808.022.nc4":time_bnds')

Also, if you want to await, you must use gdal.openAsync. Otherwise you are still opening synchronously and blocking the event loop.
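For illustration, the two points above (the NETCDF:"file":variable naming and gdal.openAsync) can be combined in a minimal sketch. The helper names netcdfSubdatasetName and openVariable are hypothetical, not part of gdal-async, and the package is loaded lazily here only so the name helper can be used on its own:

```javascript
// Build a GDAL NetCDF subdataset name of the form NETCDF:"<file>":<variable>.
// Hypothetical helper, not part of gdal-async.
function netcdfSubdatasetName(file, variable) {
  return `NETCDF:"${file}":${variable}`;
}

// Hypothetical helper: open a single variable without blocking the event loop.
// Assumes gdal-async is installed; it is imported lazily inside the function.
async function openVariable(file, variable) {
  const { default: gdal } = await import('gdal-async');
  // openAsync returns a Promise, so awaiting it is meaningful here.
  return gdal.openAsync(netcdfSubdatasetName(file, variable));
}
```

Something like `const ds = await openVariable('GLDAS_CLSM025_DA1_D.A20220808.022.nc4', 'Swnet_tavg')` should then give a dataset whose rasterSize and bands describe that one variable.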

mmomtchev commented 11 months ago

TODO: Implement reading the metadata one item at a time

mmomtchev commented 11 months ago

@jonjardine The correct method for retrieving the subdatasets is:

gdal.open('GLDAS_CLSM025_DA1_D.A20220808.022.nc4').getMetadata('SUBDATASETS')
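A rough sketch of how the result could be used to open every subdataset. It assumes getMetadata('SUBDATASETS') returns a plain object with SUBDATASET_n_NAME / SUBDATASET_n_DESC keys, mirroring the gdalinfo output above; subdatasetNames and openAllSubdatasets are hypothetical helper names, not part of gdal-async:

```javascript
// Extract the subdataset names from a SUBDATASETS metadata object,
// ordered by their numeric index. Assumes keys like SUBDATASET_1_NAME.
function subdatasetNames(metadata) {
  return Object.keys(metadata)
    .map((key) => /^SUBDATASET_(\d+)_NAME$/.exec(key))
    .filter((match) => match !== null)
    .sort((a, b) => Number(a[1]) - Number(b[1]))
    .map((match) => metadata[match[0]]); // match[0] is the full key
}

// Hypothetical helper: open the main file, then each subdataset it lists.
// Assumes gdal-async is installed; it is imported lazily inside the function.
async function openAllSubdatasets(file) {
  const { default: gdal } = await import('gdal-async');
  const main = await gdal.openAsync(file);
  const names = subdatasetNames(main.getMetadata('SUBDATASETS'));
  return Promise.all(names.map((name) => gdal.openAsync(name)));
}
```

Each returned dataset corresponds to one variable (Swnet_tavg, Lwnet_tavg, and so on), so its bands can be read like those of any other raster dataset.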

jonjardine commented 11 months ago

That's awesome, many thanks for your help.