nasa / EMIT-Data-Resources

This repository provides guides, short how-tos, and tutorials to help users access and work with data from the Earth Surface Mineral Dust Source Investigation (EMIT) mission.
Apache License 2.0
137 stars 81 forks source link

Problem with reading NASA PACE data #57

Closed giswqs closed 5 months ago

giswqs commented 6 months ago

Thank you for the excellent notebook examples from the repo. @bingqing-liu and I are building a Python package called HyperCoast for visualizing NASA Hyperspectral data interactively with a few lines of code. Here is a notebook example for visualizing EMIT data interactively. However, we run into issues with reading PACE data. The latitude and longitude coordinates are in the format of (latutude, longitude). We need to turn them into one dimension like the EMIT dataset shown in the example below. The goal is the turn the xarray dataset into a rasterio dataset so that we can plot it on the interactive map.

Any advice? @alexgleith @betolink @BriannaLind

PACE

Code snippet

import xarray as xr

# download from https://github.com/opengeos/datasets/releases/download/netcdf/PACE_OCI.20240423T184658.L2.OC_AOP.V1_0_0.NRT.nc
filepath = 'data/PACE_OCI.20240423T184658.L2.OC_AOP.V1_0_0.NRT.nc'
ds = xr.open_dataset(filepath, group="geophysical_data")
ds = ds.swap_dims({'number_of_lines': 'latitude', 'pixels_per_line': 'longitude'})
wvl = xr.open_dataset(filepath, group="sensor_band_parameters")
loc = xr.open_dataset(filepath, group="navigation_data")
lat = loc.latitude
lon = loc.longitude 
wavelengths = wvl.wavelength_3d
Rrs = ds.Rrs 

dataset = xr.Dataset(
    {
        "Rrs": (("latitude", "longitude", "wavelengths"), Rrs.data)
    },
    coords={
        "latitude": (("latitude", "longitude"), lat.data),
        "longitude": (("latitude", "longitude"), lon.data),
        "wavelengths": ("wavelengths", wavelengths.data)
    }
)
dataset

Note the two dimensions (latitude, longitude) in the latitude and longitude coordinates image

EMIT

Code snippet

import hypercoast
# download from https://github.com/opengeos/datasets/releases/download/netcdf/EMIT_L2A_RFL_001_20240404T161230_2409511_009.nc
filepath = "data/EMIT_L2A_RFL_001_20240404T161230_2409511_009.nc"
dataset = hypercoast.read_emit(filepath)
dataset

image

EMIT demo

betolink commented 6 months ago

I can take a look at what's going on but, @itcarroll is the subject matter expert, any ideas?

ebolch commented 6 months ago

This looks really cool! I'll take a look as well.

itcarroll commented 6 months ago

Hey @ebolch, representing the OB.DAAC here. The PACE/OCI L2 netcdf files have a "navigation_data" group akin to the "location" group. There would be some other variable name swapping, but how general is your emit_xarray(... ortho=True) tool? Might it work on OCI? That would be quite nice! Our processing tools are all built into seadas and are not Python importable.

ebolch commented 6 months ago

@itcarroll I don't believe the emit_xarray(... ortho=True) will work for OCI. The EMIT data has a geometry lookup table that can be leveraged to quickly place the data onto a grid using indexing, which that function uses. Its not actually conducting a projection and resampling process.

@giswqs I think to convert from swath to gridded data for the PACE OCI you could use pyresample and kdtree libraries. LP.DAAC has an example swath2grid python script to do this for ECOSTRESS data, maybe that will help?

giswqs commented 5 months ago

I was able to create gridded data from PACE swath data using the scipy_interpolate module. See the grid_pace() function in the hypercoast package. Here is a notebook example for visualzing PACE gridded data and spectral signatures interactively with a few lines of code.

import hypercoast
url = "https://github.com/opengeos/datasets/releases/download/netcdf/PACE_OCI.20240423T184658.L2.OC_AOP.V1_0_0.NRT.nc"
filepath = "PACE_OCI.20240423T184658.L2.OC_AOP.V1_0_0.NRT.nc"
hypercoast.download_file(url)
dataset = hypercoast.read_pace(filepath)
m = hypercoast.Map()
m.add_basemap("Hybrid")
wavelengths = [450]
m.add_pace(dataset, wavelengths, colormap="jet", vmin=0, vmax=0.02, layer_name="PACE")
m.add_colormap(cmap="jet", vmin=0, vmax=0.02, label="Reflectance")
m.add("spectral")
m

betolink commented 5 months ago

great work @giswqs !!