m2lines / data-gallery

https://m2lines.github.io/data-gallery/
Apache License 2.0

ToDo list #1

Open IamShubhamGupto opened 11 months ago

IamShubhamGupto commented 11 months ago

@LaureZanna @jbusecke We can discuss more about the repository, notebooks, tools, plots here

IamShubhamGupto commented 11 months ago

@suryadheeshjith

LaureZanna commented 10 months ago

Also tagging @NoraLoose, who has been thinking about the code/data for the website, and @Pperezhogin, who is now training ML models from CM2.6 data on the LEAP Hub. We would want to use that as one of our advanced example test cases.

jbusecke commented 9 months ago

Hey everyone. Thanks for getting this started. Could we link single issues to the todos above, so we can discuss things in a focused way for each item? Many thanks.

jbusecke commented 9 months ago

Just finished a somewhat more thorough review. Great job @IamShubhamGupto @suryadheeshjith! Please make sure to update the todo in the original post with new items as you see fit (I suggest including merged PRs/closed issues) so we have a nice way to see progress here too!

I think we should chart a bit further into the future what we want to achieve here and how that will influence the structure of the book.

The main next step IMO should be to figure out a high level structure. Currently the notebooks are a root level list based on tools. Is that the organization we aim for? Or do we want to have different chapters:

cc @LaureZanna

LaureZanna commented 9 months ago

thanks @jbusecke , I agree - we are still missing a high-level structure. Here is a possible suggestion:

Happy with something else!! cc @NoraLoose @adcroft

NoraLoose commented 9 months ago

I like the high-level structure that you are proposing @jbusecke @LaureZanna!

If one of the advanced use cases is to train an ML model from CM2.6 data, we may also want to add pytorch to the list of tools.

NoraLoose commented 9 months ago

I will ask an even more general question: What is the goal of the data-gallery?

Showcasing M2LInES work? Tutorials on how to do ML for climate science? A book to be submitted to JOSE?

Sorry if I missed earlier discussions on the end goal.

LaureZanna commented 9 months ago

@NoraLoose : thanks for the feedback. The primary goals are

No plans for educational tools as with L96 yet, but this might change.

jbusecke commented 9 months ago

Sounds like we have some convergence here. I suggest focusing on dataset-specific notebooks for now and linking the other sections to those notebooks. The reason is that I suspect there is a strong correlation between the individual datasets and the methods we can/want to apply.

So we could start with something like:

I think this will most naturally enable us to ingest (possibly existing) notebooks from people's research. Few researchers write a notebook that shows how to load all the different datasets into xarray, but everyone writes a notebook for their dataset which loads, visualizes, and processes the data. Parsing out and organizing these different parts is the mission of this project.

A resulting methods notebook could then look like this:

Xarray

Loading data

Basic description, links, and small example.

Here are some examples of how to use this on specific datasets: [Xarray Loading with OM4]() [Xarray Loading with Another Dataset]()

Basic Visualization with xarray

Timeseries

Basic description, links, and small example.

Here are some examples that use xarray timeseries plotting on specific datasets: [Xarray Timeseries plot with OM4]() [Xarray Timeseries plot with Another Dataset]()

This enables the reader to not be overwhelmed by scrolling through 4000 lines of examples, but if they are interested in how to specifically apply a certain step to some dataset they can easily do that.
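As a rough illustration of the kind of "small example" the Timeseries subsection above could carry, here is a minimal sketch. It uses a synthetic array as a stand-in for real model output; the variable name `sst`, the grid, and the helper `plot_timeseries` are assumptions made for illustration, not taken from OM4 or any existing notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a model field (NOT real OM4 output):
# 24 monthly time steps on a tiny 3 x 4 lat/lon grid.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
sst = xr.DataArray(
    15.0 + rng.standard_normal((24, 3, 4)),
    coords={
        "time": time,
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    dims=("time", "lat", "lon"),
    name="sst",
)

# Collapse the spatial dimensions to obtain a 1-D timeseries.
# Unweighted mean for brevity; a real notebook would weight by cell area.
sst_mean = sst.mean(dim=("lat", "lon"))


def plot_timeseries(da):
    """Hypothetical helper: draw a 1-D DataArray as a line plot over time."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    da.plot(ax=ax)  # xarray draws a line plot for 1-D data
    return ax
```

Calling `plot_timeseries(sst_mean)` would draw the line plot. In a dataset-specific notebook the synthetic array would be replaced by the field loaded from that dataset, which is what the linked examples are meant to show.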

Happy to chat about this today on slack if needed. Starting tomorrow I will be on winter break.

jbusecke commented 7 months ago

Just taking notes from our current conversation:

We decided to have two headers on the website based on this guide to write technical docs

Trying to define a roadmap:

  1. Collecting source notebooks
    • Completing the list of 'source' notebooks might take longer, but we will template the steps for the existing ones.
    • We decided to actually copy and not link the notebooks (but link to the original in the header).
      1. Parsing each notebook into the tools used (so they can be linked in the Tutorials).
    • Proposed deadline: Fri 16th (meeting to check in on Tue 13).
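
The step of parsing each notebook into the tools used could be sketched roughly as follows. This is only one possible approach, under the assumption that "tools" can be approximated by the top-level packages imported in code cells; the function name `tools_used` is hypothetical and nothing here reflects code that exists in the repository.

```python
import ast
import json


def tools_used(notebook_json: str) -> set[str]:
    """Return the top-level package names imported in a notebook's code cells.

    `notebook_json` is the text content of an .ipynb file.
    """
    nb = json.loads(notebook_json)
    tools: set[str] = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb stores source as a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [
            ln for ln in source.splitlines()
            if not ln.lstrip().startswith(("%", "!"))
        ]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still do not parse as plain Python
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                tools.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                tools.add(node.module.split(".")[0])
    return tools
```

Reading a copied notebook with `Path(...).read_text()` and passing the text to `tools_used` would give a set such as `{"xarray", "matplotlib"}` that could be used to link the notebook from the matching Tutorials pages. Import-based detection will miss tools invoked only through magics or shell commands.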

adcroft commented 7 months ago

If you want some ideas from other notebooks looking at MOM6 output (not OM4), https://mom6-analysiscookbook.readthedocs.io/en/latest/ might be useful

LaureZanna commented 7 months ago

Thanks @adcroft. @suryadheeshjith @IamShubhamGupto @jbusecke: this is great, we can get some inspiration from it, adapt some of the notebooks for our OM4 + CM2.6 datasets, and create a few more diagnostics that are relevant for M2LInES.