pmlmodelling / nctoolkit

A Python package for netCDF analysis and post-processing
https://nctoolkit.readthedocs.io/en/latest/
GNU General Public License v3.0

[JOSS] Compare nctoolkit with existing tools and find a better example #68

Closed: malmans2 closed this issue 1 year ago

malmans2 commented 1 year ago

Hi there,

Nice package! I haven't had time to review it yet, but I've glanced through the documentation and the paper.

My first comment is actually a question that I think you should address both in the paper and in the documentation: How does this tool compare to other existing packages?

I don't find the example in the paper particularly convincing, so my suggestion is to find a better one that really highlights the strengths of nctoolkit.

xarray is my go-to library for this kind of processing, and the example in the paper looks quite similar to what you would write with it (although the xarray plot below is not publication-ready, so that's definitely a strength of nctoolkit):

import fsspec
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import xarray as xr

url = "https://downloads.psl.noaa.gov/Datasets/COBE/sst.mon.mean.nc"
# Download once into a local cache, keeping the original file name
with fsspec.open(f"simplecache::{url}", simplecache={"same_names": True}) as f:
    ds = xr.open_mfdataset(f.name)

# 20-year means for the baseline and the recent period
ds_clim = ds.sel(time=slice("1900", "1919")).mean("time")
ds_pres = ds.sel(time=slice("2000", "2019")).mean("time")
ds_anom = ds_pres - ds_clim

# Map the SST anomaly on a Robinson projection
ds_anom["sst"].plot(
    transform=ccrs.PlateCarree(), subplot_kws={"projection": ccrs.Robinson()}
)
plt.show()

[Image: SST anomaly map produced by the code above]

https://github.com/openjournals/joss-reviews/issues/5494

malmans2 commented 1 year ago

Another comment about the example shown in the paper: it looks like you already have a method to read/download from a URL, so I'd suggest using that in the example rather than a local path to a pre-downloaded file.
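As an illustration of that suggestion, here is a minimal sketch of the same COBE SST anomaly calculation read straight from the URL with nctoolkit. It assumes the nctoolkit methods `open_url`, `copy`, `select(years=...)`, `tmean` and `subtract` behave as described in the nctoolkit documentation (worth checking against the installed version). The function name `cobe_sst_anomaly` is hypothetical, and the import sits inside the function so the snippet can be loaded without nctoolkit or CDO installed.

```python
def cobe_sst_anomaly(url="https://downloads.psl.noaa.gov/Datasets/COBE/sst.mon.mean.nc"):
    """Sketch: mean SST for 2000-2019 minus mean SST for 1900-1919, read from a URL."""
    # Imported lazily: nctoolkit (and CDO) are only needed when this is called
    import nctoolkit as nc

    # Download the file once, then work on two copies of the dataset
    ds_clim = nc.open_url(url)
    ds_pres = ds_clim.copy()

    # Baseline period: average over 1900-1919
    ds_clim.select(years=list(range(1900, 1920)))
    ds_clim.tmean()

    # Recent period: average over 2000-2019
    ds_pres.select(years=list(range(2000, 2020)))
    ds_pres.tmean()

    # Anomaly = recent mean minus baseline mean
    ds_pres.subtract(ds_clim)
    return ds_pres
```

Calling `cobe_sst_anomaly().plot()` should then produce the anomaly map, with no local file path in the example.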

platipodium commented 1 year ago

related to #58

robertjwilson commented 1 year ago

I've created a new example, which shows how to calculate changes in surface temperature from global climate models. This is something that you can just do in nctoolkit, whereas with other tools like xarray or iris you normally have to google how to do it. The example also uses the annual_anomaly method to show that it's easy to calculate changes against baselines.
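A minimal sketch of that kind of workflow, assuming a climate model surface temperature file at a hypothetical `path` and the nctoolkit methods `open_data`, `spatial_mean` and `annual_anomaly(baseline=[first_year, last_year])` as documented (check the signature for your nctoolkit version). The function name `warming_since_baseline` and the 1850-1869 default baseline are illustrative only, not taken from the paper.

```python
def warming_since_baseline(path, baseline=(1850, 1869)):
    """Sketch: annual global mean surface temperature change relative to a baseline."""
    # Imported lazily: nctoolkit (and CDO) are only needed when this is called
    import nctoolkit as nc

    ds = nc.open_data(path)  # hypothetical climate model output file
    ds.spatial_mean()  # global mean, without naming the lon/lat coordinates
    # Annual means minus the mean over the baseline years
    ds.annual_anomaly(baseline=list(baseline))
    return ds
```

The same three calls should work unchanged on output from a different model, which is the portability point made in the paper text below.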

https://github.com/pmlmodelling/nctoolkit/actions/runs/5654799069

The 1000-word limit is tight, so I've added a short list of netCDF packages in the ecosystem.

"The nctoolkit package sits within a Python ecosystem of packages such as xarray and iris, which provide data models and analysis software for netCDF; netCDF4, which provides low-level access to netCDF data; and specialist software such as xesmf for processes such as regridding. In contrast to other netCDF libraries, the use of CDO as a back-end allows nctoolkit users to carry out operations, such as spatial averages, without having to specify the names of coordinates such as longitude, latitude and time, which enables code written for one dataset to be easily applied to another."

The key point is that nctoolkit is somewhat more format-agnostic than xarray or iris. I've illustrated this with the example of spatial_mean, which requires a few lines of code in both xarray and iris.
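To make the contrast concrete: in xarray an area-weighted mean typically names the coordinates explicitly, e.g. `ds["tas"].weighted(np.cos(np.deg2rad(ds["lat"]))).mean(("lat", "lon"))`, so the code breaks on a file that calls them `latitude` or `nav_lat`. The numpy sketch below shows the name-agnostic alternative: it finds latitude and longitude from their CF `units` attributes instead of their names. The helpers `find_coord` and `spatial_mean` and the `{name: (values, attrs)}` layout are hypothetical, and cos(latitude) weighting is a simplification; this is not how CDO or nctoolkit are implemented (CDO works from grid-cell areas), so results can differ slightly.

```python
import numpy as np

# CF-compliant spellings of latitude/longitude units (CF conventions, section 4.1)
LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}


def find_coord(coords, accepted_units):
    """Return the name of the coordinate whose 'units' attribute is in accepted_units."""
    for name, (_, attrs) in coords.items():
        if attrs.get("units") in accepted_units:
            return name
    raise KeyError(f"no coordinate with units in {sorted(accepted_units)}")


def spatial_mean(field, dims, coords):
    """cos(latitude)-weighted horizontal mean of `field`, whatever its coordinates are called.

    field:  numpy array with one latitude and one longitude dimension
    dims:   dimension names of `field`, in order
    coords: {name: (values, attrs)} for each coordinate variable
    """
    lat_name = find_coord(coords, LAT_UNITS)
    lon_name = find_coord(coords, LON_UNITS)
    lat_axis, lon_axis = dims.index(lat_name), dims.index(lon_name)

    # Reshape the 1-D weights so they broadcast along the latitude axis only
    lat = np.asarray(coords[lat_name][0], dtype=float)
    shape = [1] * field.ndim
    shape[lat_axis] = lat.size
    weights = np.broadcast_to(np.cos(np.deg2rad(lat)).reshape(shape), field.shape)

    axes = (lat_axis, lon_axis)
    return (field * weights).sum(axis=axes) / weights.sum(axis=axes)
```

The same `spatial_mean(field, dims, coords)` call then works whether the file names its coordinates `lat`/`lon` or `nav_lat`/`nav_lon`.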

The same is true for temporal averaging: nctoolkit uses tmean for all types, whereas xarray needs a slightly confusing combination of mean, groupby and resample. But I'll just stick with the spatial average in the paper.
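For reference, the three xarray spellings are roughly `ds.mean("time")` (whole-period mean), `ds.resample(time="YS").mean()` (annual means) and `ds.groupby("time.month").mean()` (monthly climatology); if I read the nctoolkit documentation correctly, all three go through `tmean` with different `over` arguments. The numpy sketch below spells out what each average computes. The helper `temporal_means` is hypothetical and assumes monthly data whose first axis is time, starting in January and covering whole years.

```python
import numpy as np


def temporal_means(monthly, start_year):
    """Period mean, annual means and monthly climatology of a monthly series.

    Assumes the first axis is time, starts in January and covers whole years.
    Every month gets equal weight (month lengths are ignored).
    """
    monthly = np.asarray(monthly, dtype=float)
    if monthly.shape[0] % 12:
        raise ValueError("expected whole years of monthly data")

    # View the data as (year, month, ...)
    by_year = monthly.reshape(-1, 12, *monthly.shape[1:])

    # Like ds.mean("time")
    period_mean = monthly.mean(axis=0)
    # Like ds.resample(time="YS").mean()
    annual = {start_year + i: m for i, m in enumerate(by_year.mean(axis=1))}
    # Like ds.groupby("time.month").mean()
    climatology = {month: m for month, m in enumerate(by_year.mean(axis=0), start=1)}
    return period_mean, annual, climatology
```

Equal weighting of months matches a plain mean over the time samples; weighting by month length would be a different (and also common) choice.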

malmans2 commented 1 year ago

Looks good, and I think it makes the scope of nctoolkit much clearer. Could you please add instructions/code to download the example data?

robertjwilson commented 1 year ago

Thanks @malmans2

I've just updated the paper with a Zenodo link. Technically you can download the data from ESGF, but that's a pain for most readers, so I've uploaded it to Zenodo, which will make things easier.

https://zenodo.org/record/8182678

malmans2 commented 1 year ago

Thanks!