ioos / notebooks_demos

Notebook demonstrations and examples
https://ioos.github.io/notebooks_demos/
MIT License
19 stars 19 forks source link

Add ioos qc example #346

Closed ocefpaf closed 4 years ago

ocefpaf commented 4 years ago

Not ready for review yet! I'm writing the text for the blog and explaining it a little bit.

@jessicaaustin, as the author of this notebook, are you OK with me copying it here?

review-notebook-app[bot] commented 4 years ago

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

jessicaaustin commented 4 years ago

@ocefpaf yes I am fine with copying it here, so long as the user can easily get back to https://github.com/ioos/ioos_qc . Thanks!

ocefpaf commented 4 years ago

@ocefpaf yes I am fine with copying it here, so long as the user can easily get back to ioos/ioos_qc . Thanks!

Thanks! That is the idea. BTW, I just pushed some text to explain what is going on. Do you want to take a look? Also, can I mention your as author of the code?

ocefpaf commented 4 years ago

@mwengren this is ready for review. Not sure if I should ping more people here.

ocefpaf commented 4 years ago

@kbailey-noaa I usually avoid self-merged and request for someone else for a rwview/merge. Do you mind taking a quick look at this PR? The link below has a rendered version of the notebook.

https://nbviewer.jupyter.org/github/ocefpaf/notebooks_demos/blob/ioos_qc/notebooks/2020-02-14-QARTOD_ioos_qc_Water-Level-Example.ipynb

mwengren commented 4 years ago

@ocefpaf and @jessicaaustin Very nice, like the notebook! Clearly shows how the QARTOD tests work.

I had a few thoughts about the source data:

  1. If they are in ERDDAP, let's add a link to the dataset in ERDDAP somewhere in the description at the top

  2. When running interactively, it would be cool to have a step to download the data via erddapy, if only a subset for the time period that's plotted. If it's too much overhead to include that step here, can we clone this notebook and make another one that includes that step and maybe does some other variation of QC test/plotting? Or maybe a different sample dataset?

kbailey-noaa commented 4 years ago

@ocefpaf Cool!
So rather than reading flags that AOOS set, you're defining the QC thresholds (in [In 2] and then setting flags to the dataset based on those thresholds? I wasn't sure I followed, but note I'm a novice!

1) The Bokeh link works but there's a warning that the page goes to a previous version. Replace with https://docs.bokeh.org/en/latest/

2) The water level station link only leads to a google map lat/lon, and I hoped for a data source. I think this gets at @mwengren 's comment. I wasn't clear about where the data came from, though I assume it's from the AOOS ERDDAP. Do we assume they're downloaded from somewhere to your desktop, and you're importing the downloaded .csv? Would probably be best to point to the source.

Otherwise it looks good!

mwengren commented 4 years ago

@kbailey-noaa as part of the PR, there's a static .csv file added called water_level_example.csv. I agree it's less obvious than adding an extra step in the notebook to download the data. @ocefpaf can you test a dynamic download step as an addition? That would make it more convincing for those running it dynamically via Pangeo binder (assuming it's not too much overhead or latency).

jessicaaustin commented 4 years ago

Two things:

mwengren commented 4 years ago

I think two variations of this notebook might be worthwhile. This version as-is, with the bundled water_level_example.csv and second one that reads data directly from ERDDAP and maybe varies the QC analysis somehow.

Or, if you want to get fancy, @ocefpaf maybe we can make a version that lets the user select an ERDDAP dataset from a pre-determined list, and then select a time window to download data for. Maybe that's going too far though... obviously the results might not be that interesting.

ocefpaf commented 4 years ago

Two things:

  • FWIW, I think including a static file is simplest and least prone to breakage (what if the ERDDAP dataset changes or goes down?). It would help to include a note saying where this CSV came from though, for context.

We can have both. A "frozen" dataset and some code teaching how we got it in the same notebook.

  • You should include Kyle as an author of this library as well

Will do. Thanks!

I think two variations of this notebook might be worthwhile. This version as-is, with the bundled water_level_example.csv and second one that reads data directly from ERDDAP and maybe varies the QC analysis somehow.

People may get confused by that. I'd rather have a single on focused on the ioos_qc demo.

Or, if you want to get fancy, @ocefpaf maybe we can make a version that lets the user select an ERDDAP dataset from a pre-determined list, and then select a time window to download data for. Maybe that's going too far though... obviously the results might not be that interesting.

That is an interesting idea as a new notebook focused on the interactivity/exploration of datasets with a QA/QC "widget." I'll work on something along those line.

ocefpaf commented 4 years ago

So rather than reading flags that AOOS set, you're defining the QC thresholds (in [In 2] and then setting flags to the dataset based on those thresholds? I wasn't sure I followed, but note I'm a novice!

Yep that is correct. The goal is to show that the config file with the thresholds, etc must be specified by the user.

  1. The Bokeh link works but there's a warning that the page goes to a previous version. Replace with docs.bokeh.org/en/latest

Thanks! Fixed that in the last commit.

  1. The water level station link only leads to a google map lat/lon, and I hoped for a data source. I think this gets at @mwengren 's comment. I wasn't clear about where the data came from, though I assume it's from the AOOS ERDDAP. Do we assume they're downloaded from somewhere to your desktop, and you're importing the downloaded .csv? Would probably be best to point to the source.

Indeed. Working on it...

ocefpaf commented 4 years ago
  • FWIW, I think including a static file is simplest and least prone to breakage (what if the ERDDAP dataset changes or goes down?). It would help to include a note saying where this CSV came from though, for context.

While that is true this goal of the Data Demo Center is to show how to obtain data using the various IOOS data sources. So that kind of breakage is "OK" here. I do not recommend that for the ioos_qc docs upstream though.

  • You should include Kyle as an author of this library as well

You mean notebook, right? (I'm adding you and Kyle, is that OK @jessicaaustin?)

ocefpaf commented 4 years ago

@kbailey-noaa and @mwengren I made the changes requested and now we cover both a "frozen" CSV file for future reference but we also have the code to download it from the AOOS ERDDAP sever.

@jessicaaustin do you want to take a second look before we merge this one?

ocefpaf commented 4 years ago

The CIs failures should be fixed once holoviews 1.13.- is release. See https://github.com/holoviz/hvplot/issues/424