pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

simple command line interface for xarray #2034

Closed: aymeric-spiga closed this issue 4 years ago.

aymeric-spiga commented 6 years ago

All my deepest apologies in advance if this is out of scope or naive. This is my first issue in this project and I have been using xarray only for a couple of days.

By the way, xarray is exactly what I needed as a planetary scientist / climate modeler. It makes my own homemade, DIY, non-professional https://github.com/aymeric-spiga/planetoplot library look pretty useless. I wonder why I did not come across xarray before! I discovered xarray because I was looking for better solutions than my DIY stuff. Better late than never...

So xarray has almost everything I need, and it is a pretty impressive piece of work. There is one thing I need, though (and it seemed useful when I got feedback on the DIY stuff from users in my team): a CLI to quickly explore the contents of a netCDF file and to make quick plots to check things.

My questions are

  1. Did I miss something similar that is already existing in the xarray project?
  2. If not, would this feature be useful for other people?

To emulate what I basically need, I wrote a small piece of code that I now use for quick exploration. I created a repository here: https://github.com/aymeric-spiga/xarray-cli
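For readers wondering what such a tool involves, here is a minimal sketch of the idea, assuming only xarray and matplotlib are installed. It is not the code in the xarray-cli repository; the program name `xrquick` and its `--var`/`--output` options are hypothetical.

```python
"""Minimal sketch of an xarray-based netCDF quick-look CLI (hypothetical name: xrquick)."""
import argparse

import xarray as xr


def build_parser():
    parser = argparse.ArgumentParser(
        prog="xrquick", description="Quick look at a netCDF file with xarray"
    )
    parser.add_argument("path", help="netCDF file to open")
    parser.add_argument("-v", "--var", help="data variable to plot (optional)")
    parser.add_argument("-o", "--output", default="quicklook.png",
                        help="image file for the plot (default: quicklook.png)")
    return parser


def summarize(ds):
    """Return an ncdump -h style text summary of a Dataset."""
    # xarray's repr already lists dimensions, coordinates, variables and attributes.
    return str(ds)


def quick_plot(ds, var, output):
    """Plot the first 2-D slice of `var` and save it to `output`."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    da = ds[var]
    # Keep the last two dimensions and take index 0 along any leading ones.
    da = da.isel({dim: 0 for dim in da.dims[:-2]})
    da.plot()
    plt.savefig(output)
    plt.close()


def main(argv=None):
    args = build_parser().parse_args(argv)
    with xr.open_dataset(args.path) as ds:
        print(summarize(ds))
        if args.var:
            quick_plot(ds, args.var, args.output)
```

Saved as a script ending in a `main()` call, something like `python xrquick.py data.nc -v temp` would print the summary and write `quicklook.png`.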

Any thoughts or suggestions or comments welcome.

Again, many thanks for the great work with xarray.

fischcheng commented 6 years ago

From my experience, there are plenty of existing tools that can do the data exploration/quick-view thing, including the basic ncview, ncdump, cdo or Panoply. Some of them support OPeNDAP. I'm not sure xarray needs such a feature.

aymeric-spiga commented 6 years ago

Thanks @fischcheng for your kind answer. I agree; in my experience too, I use ncdump for quick exploration, ncview for quick plots, and cdo/nco for concatenation, averaging, anomalies, etc. For the latter, I found out that xarray has powerful methods, so I can script everything in Python without pre-processing with e.g. ncrcat. So I was wondering about an all-Python solution, and thus a replacement for ncview, since xarray also has great plotting capabilities through its thin matplotlib wrapping. But an all-Python replacement could also mean useless redundancy, I agree!
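As an illustration of the ncrcat/cdo-style steps mentioned here, a sketch on synthetic monthly data standing in for two yearly files; the variable name `tas` and the helper `monthly_anomalies` are made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_anomalies(da):
    """Subtract the multi-year monthly climatology from `da` (cf. cdo ymonmean + ymonsub)."""
    climatology = da.groupby("time.month").mean("time")
    return da.groupby("time.month") - climatology


# Synthetic data standing in for two yearly files that ncrcat would join.
time1 = pd.date_range("2000-01-01", periods=12, freq="MS")
time2 = pd.date_range("2001-01-01", periods=12, freq="MS")
part1 = xr.DataArray(np.arange(12.0), coords={"time": time1}, dims="time", name="tas")
part2 = xr.DataArray(np.arange(12.0) + 1.0, coords={"time": time2}, dims="time", name="tas")

tas = xr.concat([part1, part2], dim="time")  # like ncrcat
time_mean = tas.mean("time")                 # like ncra / cdo timmean
anom = monthly_anomalies(tas)                # departures from each calendar month's mean
```

Each calendar month here has values k (2000) and k + 1 (2001), so the anomalies come out as -0.5 for every month of 2000 and +0.5 for every month of 2001.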

JiaweiZhuang commented 6 years ago

thus replacement for ncview

GeoViews can make interactive plots of xarray data. There's an example.

An even more straightforward and customizable way is matplotlib + Jupyter Interact. It can easily replicate all of ncview's functionality.
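A minimal sketch of the matplotlib + interact approach, assuming a (time, lat, lon) variable; the synthetic `temp` field and the helper `make_frame_viewer` are hypothetical stand-ins for a real netCDF variable.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr


def make_frame_viewer(da, dim="time"):
    """Return a callable that draws slice `index` of `da` along `dim` (an ncview-like frame)."""
    def show_frame(index=0):
        fig, ax = plt.subplots()
        da.isel({dim: index}).plot(ax=ax)  # 2-D slice -> pcolormesh plus colorbar
        plt.show()
        return fig
    return show_frame


# Synthetic (time, lat, lon) field standing in for a netCDF variable.
temp = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 10, 20)),
    coords={
        "time": pd.date_range("2000-01-01", periods=4),
        "lat": np.linspace(-90, 90, 10),
        "lon": np.linspace(0, 360, 20, endpoint=False),
    },
    dims=("time", "lat", "lon"),
    name="temp",
)
show_frame = make_frame_viewer(temp)
```

In a notebook, `from ipywidgets import interact` followed by `interact(show_frame, index=(0, temp.sizes["time"] - 1))` turns `index` into an integer slider, which gives the basic "step through time" behaviour of ncview.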

shoyer commented 6 years ago

This looks like a cool project, but a command line utility for netCDF files is outside the scope of xarray so I think it's best maintained separately. Of course, we are very happy to link to it (and other such tools) in the docs.

rabernat commented 6 years ago

What I would like to see is a replacement for ncview. Despite being archaic in terms of its technology (e.g. it requires X11), it is still used daily by a huge number of scientists because it serves a very important need: quick visual examination of a netCDF dataset (also works on remote systems via X forwarding).

Imagine a modern implementation of ncview backed by xarray and holoviews/geoviews.

$ xrview my_files.*.nc
Found 3 data variables and 6 coordinates in dataset
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

This would spawn a web server providing an interactive web-based GUI explorer for all variables in the dataset. You could use this locally or on a remote system.

I think such an app would catch on like wildfire. But I agree completely that it is out of scope of xarray and belongs as a standalone project.
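To make the mock-up concrete, a deliberately bare sketch using only the standard library's `http.server` on top of xarray; it is not based on holoviews/geoviews and serves just the dataset's text summary, with no interactive plots. The names `describe`, `render_page`, `make_handler` and `serve` are hypothetical.

```python
import html
import http.server

import xarray as xr


def describe(ds):
    """The one-line start-up summary shown in the xrview mock-up."""
    return (f"Found {len(ds.data_vars)} data variables and "
            f"{len(ds.coords)} coordinates in dataset")


def render_page(ds):
    """A bare-bones HTML page: the Dataset's text repr inside <pre>."""
    return f"<html><body><pre>{html.escape(str(ds))}</pre></body></html>"


def make_handler(ds):
    """Build a request handler class that answers every GET with the rendered page."""
    body = render_page(ds).encode("utf-8")

    class DatasetHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return DatasetHandler


def serve(paths, host="0.0.0.0", port=8000):
    # open_mfdataset needs dask; it combines the files using their coordinates.
    ds = xr.open_mfdataset(paths, combine="by_coords")
    print(describe(ds))
    print(f"Serving HTTP on {host} port {port} (http://{host}:{port}/) ...")
    http.server.HTTPServer((host, port), make_handler(ds)).serve_forever()
```

Calling `serve(["my_files.1.nc", "my_files.2.nc"])` from a small script would print messages like those in the mock-up. On a shared remote machine you would probably bind to `127.0.0.1` and reach the page through SSH port forwarding instead of exposing `0.0.0.0`.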

JiaweiZhuang commented 6 years ago

This would spawn a web server providing an interactive web-based GUI explorer for all variables in the dataset. You could use this locally or on a remote system.

Seems like JupyterLab is a perfect fit for this purpose. See this geojson extension for example. Notice that you can view a *.geojson file in a standalone window (shown as a map) and do not have to use Jupyter notebooks at all.

It should be possible to view a NetCDF file directly in JupyterLab, with an extension built on top of xarray+GeoViews. @philippjfr should have more insights on this...

JiaweiZhuang commented 6 years ago

And this JupyterLab approach would be way better than ncview... For example, you can easily compare multiple NetCDF files by subdividing panels.

fischcheng commented 6 years ago

@rabernat I like your vision. Ncview is so simple yet powerful that I totally overlook how terrible it looks. Also, the examples provided by @JiaweiZhuang seem perfect for such a task. All the parts are there, just waiting for someone to put them together!

aymeric-spiga commented 6 years ago

Wow! Many thanks, everyone. Talk about constructive comments!

@rabernat xrview is exactly what I would have dreamed to ask for, without daring to ask. ncview is very easy for quick plots, but those plots are not publication-ready, so we end up losing time redoing stuff with Python -- thus an xarray-based tool would be killer indeed.

@JiaweiZhuang many thanks for your roadmap for an ncview replacement. I will have a look at the tools you propose when I have time. I overlooked GeoViews because I thought xarray + matplotlib + cartopy was enough for my purpose. If I come up with anything useful, I'll let everybody know. My Python skills might not be up to the task, though, so solutions might be explored by other interested contributors.

@shoyer thank you for your kind message; I forgot to say initially that it was not my intent to see such a tool included in xarray, but I am happy to see references to standalone CLI interfaces somewhere in xarray (maybe as affiliated packages, as mentioned here https://github.com/pydata/xarray/issues/1850). If my simple tool https://github.com/aymeric-spiga/xarray-cli can be of use to anyone, do feel free to link to it. This would actually be more than I could hope for, since this project is very basic and works mostly as a simple demonstrator. Contributions are welcome, although efforts may be more useful on the above-mentioned xrview idea.

fmaussion commented 6 years ago

Note that a useful xarray-cli doesn't have to do visualization only. An xarray-based replacement for cdo (for example) could have some great advantages:

I'm not sure how many people will use a CLI instead of good old Python scripts, though...
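One way an xarray-based cdo replacement could be structured is a table of named operators mapped onto xarray reductions. The tool name `xcdo` is hypothetical; the operator names mirror cdo's, but the cos(latitude) weighting in `fldmean` assumes a regular grid with coordinates named `lat`/`lon`, whereas cdo uses actual grid-cell areas.

```python
import argparse

import numpy as np
import xarray as xr

# Hypothetical cdo-like operators implemented with xarray reductions.
OPERATORS = {
    "timmean": lambda ds: ds.mean("time", keep_attrs=True),
    "timmax": lambda ds: ds.max("time", keep_attrs=True),
    # Area-weighted field mean, approximating cell area by cos(latitude).
    "fldmean": lambda ds: ds.weighted(np.cos(np.deg2rad(ds["lat"]))).mean(("lat", "lon")),
}


def run(argv=None):
    """Usage sketch: xcdo OPERATOR INFILE OUTFILE."""
    parser = argparse.ArgumentParser(prog="xcdo")
    parser.add_argument("operator", choices=sorted(OPERATORS))
    parser.add_argument("infile")
    parser.add_argument("outfile")
    args = parser.parse_args(argv)
    with xr.open_dataset(args.infile) as ds:
        OPERATORS[args.operator](ds).to_netcdf(args.outfile)
```

Keeping the operators as plain functions on a Dataset means the same code serves both the command line and ordinary Python scripts, which partly answers the question of who would use a CLI at all.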

aymeric-spiga commented 6 years ago

@fmaussion That's a really good point, actually I was about to code an option for automatic concat with xarray when you input several files in xarray-cli, but thought that was taking me too far for a simple example.
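Automatic concatenation of several input files can lean on xarray's `combine_by_coords`, which orders the pieces by their coordinate values. A sketch with hypothetical helper names; `xr.open_mfdataset(paths, combine="by_coords")` does the same lazily when dask is installed.

```python
import xarray as xr


def join_datasets(datasets):
    """Combine datasets along their shared coordinates (e.g. consecutive time ranges)."""
    return xr.combine_by_coords(datasets)


def open_many(paths):
    """Open each file eagerly and join them, like ncrcat for the time dimension."""
    return join_datasets([xr.open_dataset(p) for p in paths])
```

Because the order comes from the coordinates rather than the command line, `xrquick b.nc a.nc`-style input in any order would still yield a monotonic time axis.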

About your last point: I am not sure either, and that was the starting point of my questions. Personally, I like to use a Python-based CLI for quick-but-nice-enough plots as a preliminary exploration, which still allows for nearly publication-ready plots (which is not the case with ncview!). Then, for more elaborate diagnostics etc., I use Python scripting. But I don't know whether this applies to other people, although, as @rabernat mentioned, the number of people using ncview might indicate that a CLI would be of interest.

It also appears that xarray is so handy that interactive use with IPython + xarray is a rather quick way to access netCDF files for a first exploration, although a CLI would be quicker and would avoid writing the same code over and over.

jhamman commented 6 years ago

A few months back, after discussing the idea with @czender, I started working on a simple reimplementation of NCO using xarray and dask (called xnco). My efforts there were mostly to compare the workflows (xarray vs. shell commands) and the performance improvements possible when using dask-distributed. My work on that project has completely stalled and I'm not sure I will return to it. If anyone is interested, though, I'd be happy to put that code up on GitHub and let someone else run with it.

philippjfr commented 6 years ago

I'm not familiar with ncview myself, but based on the screenshots it shouldn't be too difficult to implement something that gets close on top of xarray/dask/geoviews/JupyterLab and some widget framework (probably bokeh widgets or ipywidgets). From my (biased) perspective that seems like the most straightforward approach anyway.

The thing I'm not sure about is how likely users of ncview are to adopt JupyterLab. That would determine whether it would make more sense to write it as a standalone app and integrate it with JupyterLab or build it entirely within JupyterLab. In either case I'd be happy to give pointers and help out both on the HoloViews/GeoViews front and on the JupyterLab development, which I've recently familiarized myself with.

benbovy commented 6 years ago

I came across psyplot, which might be also relevant here. I haven't looked at it in-depth yet, but it seems it has already a comprehensive set of features and good documentation.

cc @Chilipp

Chilipp commented 6 years ago

Hey! Thanks @benbovy for tagging me! Yes, indeed, psyplot provides an interface between matplotlib and xarray, offering a more powerful tool than ncview or Panoply that integrates data analysis and visualization. It provides a command line interface, a GUI and an integrated IPython console. I actually wanted to make a new release and a post on the xarray mailing list by tomorrow to introduce it a bit more widely to the community.

If you are on the EGU next week by accident, I will also present it as a PICO presentation on Monday morning, April 9th at 8:30

Psyplot: Interactive data analysis and visualization with Python https://meetingorganizer.copernicus.org/EGU2018/PICO/28031/235994

aymeric-spiga commented 6 years ago

@philippjfr I had a look at JupyterLab, and for me, already used to notebooks etc., it looks like a terrific tool! Building an extension requires some skills, though. But your question is fully relevant: users not especially familiar with Python might more easily turn to an "xrview" concept as a replacement for ncview if it is a standalone app than if it lives within the JupyterLab framework.

@Chilipp I have to try your psyplot tool because it looks exactly like what would be useful to me, and it is based on xarray. Did you build on xarray's matplotlib/cartopy wrapping, or did you develop your own matplotlib wrapping from scratch, only using xarray for CF-like cube exploration?

Chilipp commented 6 years ago

Sure, I hope it can help! The psy-maps package, which is the psyplot plugin for georeferenced plots, is built on the cartopy package (sorry, that is not made clear in the docs currently). For interpreting coordinates following the CF-Conventions, however, I wrote my own decoder class (http://psyplot.readthedocs.io/en/latest/api/psyplot.data.html#psyplot.data.CFDecoder). But you should not have to care about the latter, since this is integrated into the framework.
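The kind of coordinate detection a CF decoder performs can be illustrated with a much-simplified sketch. This is not psyplot's `CFDecoder`, only a generic example of how the CF attributes `axis`, `standard_name` and `units` identify longitude, latitude and time coordinates; `CF_HINTS` and `find_cf_axis` are hypothetical names.

```python
import xarray as xr

# Attribute values the CF conventions use for horizontal and time coordinates.
CF_HINTS = {
    "X": {"standard_name": {"longitude", "projection_x_coordinate"},
          "units": {"degrees_east", "degree_east", "degrees_E", "degree_E"}},
    "Y": {"standard_name": {"latitude", "projection_y_coordinate"},
          "units": {"degrees_north", "degree_north", "degrees_N", "degree_N"}},
    "T": {"standard_name": {"time"}, "units": set()},
}


def find_cf_axis(ds, axis):
    """Return the name of the coordinate playing role `axis` ("X", "Y" or "T"), or None."""
    hints = CF_HINTS[axis]
    # Checks in decreasing order of reliability: explicit axis, standard_name, units.
    checks = (
        lambda attrs: attrs.get("axis") == axis,
        lambda attrs: attrs.get("standard_name") in hints["standard_name"],
        lambda attrs: attrs.get("units") in hints["units"],
    )
    for check in checks:
        for name, coord in ds.coords.items():
            if check(coord.attrs):
                return name
    return None
```

Running every coordinate through the most reliable check before falling back to weaker ones avoids picking a coordinate on a units match when another one declares its `axis` explicitly.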

After having installed the packages, e.g. via

conda install -c conda-forge psyplot psy-maps psyplot-gui

you can simply create a plot from the command line via

psyplot your-netcdf-file.nc -pm mapplot

and it will open the GUI to create a plot. Or you type

psyplot your-netcdf-file.nc -pm mapplot -n your-variable -o plots.pdf

to save it immediately to a PDF file named plots.pdf.

There are however a lot more possibilities and formatting options.

Within python, you can just create a georeferenced plot via

import psyplot.project as psy
psy.plot.mapplot('your-netcdf-file.nc', name='your-variable')

which will then create a new figure with an instance of cartopy's GeoAxes on it. For more options and configurations of your plots, I refer you to the docs, where you will also find an example gallery, but you can of course always open an issue in the psyplot repository and I am happy to help.

benbovy commented 6 years ago

If you are on the EGU next week by accident, I will also present it as a PICO presentation on Monday morning, April 9th at 8:30

I actually found psyplot via the EGU schedule and I'll be there (I'll also present a PICO on xarray related stuff, a bit later on Monday afternoon)!

aymeric-spiga commented 6 years ago

Thanks @Chilipp! psyplot is a more professional and flexible tool (with good docs, I must say) than what I was trying to do back then in a more DIY, homemade way with planetoplot, so I will give it a try and most probably abandon my homemade stuff. I will also explore GeoViews and HoloViews, as suggested by @JiaweiZhuang and @philippjfr, because they have very interesting features. At any rate, using either xarray or xarray-based tools is a must these days for any netCDF-related exploration.

Chilipp commented 6 years ago

Great! If you want to try it today, I recommend using the nightly build (see the installing docs) or the latest development packages from my personal conda channel:

conda config --add channels conda-forge
conda install -c chilipp/label/dev psyplot psy-maps psyplot-gui

The latter, however, only works on Linux.

The new releases should have passed all tests by tomorrow morning and they will then be available through the conda-forge channel and pip.

stale[bot] commented 4 years ago

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.