nextGEMS / catalog

Intake catalog for nextgems

Prune icon ngc4 #56

Closed florianziemen closed 6 months ago

florianziemen commented 6 months ago

Removed level 10 from ngc400[567]. Reformatted the C5 data (by the way, it is on /scratch/). I think we should have only one simulation per dataset, as was the case before; otherwise we create a bit too much diversity in the catalog structure.

lkluft commented 6 months ago

removed level 10 from ngc400[567]

I thought there was ocean data at zoom level 10. Personally, I think it's still fine to remove this zoom level from the dataset, but we should check with the ocean people.

reformatted C5 data (btw. this is on /scratch/) - I think we should have only one simulation per dataset (also was the case before). Otherwise we create a bit too much diversity in the catalog structure.

I would argue that the three different experiments (CNTL, P4K, 4xCO2) can be seen as part of one dataset 🤷‍♂️ I think that this is a cleaner approach in the case of ensembles than having separate entries. However, if we opt for individual datasets, we at least need to rename amip to amip_cntl.

The data is only preliminary (and hence on /scratch). I will copy a replacement for this dataset from LUMI in the next days.

florianziemen commented 6 months ago

removed level 10 from ngc400[567]

I thought there was ocean data at zoom level 10. Personally, I think it's still fine to remove this zoom level from the dataset, but we should check with the ocean people.

I took it out of the catalog because to_dask() failed. The data has already been deleted. Quoting Kalle:

Also for the record - during diplomatic consultations between speakers of the parties of interest it transpired that removing the remaining HEALPix zoom 10 data as well as the zoom 9 data - the latter with the sole exception of precipitation flux, pr - for the three "pre-"prefinal experiments ngc400[567] would be an acceptable solution, so that is what was done now...
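A failing to_dask() call like the one mentioned above could be caught before users hit it by trying to open every advertised zoom level. The following is only a minimal sketch: `find_unloadable_zooms`, `open_zoom`, and `fake_open` are hypothetical names, and the opener here is a stand-in that simulates deleted zoom 10 data rather than touching the real catalog.

```python
from typing import Callable, Dict, Iterable


def find_unloadable_zooms(
    open_zoom: Callable[[int], object], zooms: Iterable[int]
) -> Dict[int, str]:
    """Try to open every zoom level and report the ones that fail."""
    failures = {}
    for zoom in zooms:
        try:
            open_zoom(zoom)
        except Exception as exc:
            # Broad on purpose: any failure means the entry is unusable.
            failures[zoom] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_open(zoom: int) -> str:
    """Stand-in opener (assumption): pretends the zoom 10 data was deleted."""
    if zoom == 10:
        raise FileNotFoundError("no data at zoom 10")
    return f"dataset at zoom {zoom}"


failures = find_unloadable_zooms(fake_open, range(11))
print(failures)
```

With a real intake catalog, `open_zoom` might wrap something like `cat[name](zoom=zoom).to_dask()`, assuming the entry exposes a `zoom` user parameter; neither the entry name nor the parameter name is confirmed by this thread.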

I would argue that the three different experiments (CNTL, P4K, 4xCO2) can be seen as part of one dataset 🤷‍♂️ I think that this is a cleaner approach in the case of ensembles than having separate entries. However, if we opt for individual datasets, we at least need to rename amip to amip_cntl.

Hmm, I see your point. I'd still be happier with one entry per run, simply to keep the diversity of entries in the catalog as low as possible (the power and danger of the plain intake approach). One thing we could do is play this game with ensemble members that should be interchangeable (e.g. the 100 4xCO2 runs of the grand ensemble). That would still kind of reflect the idea that whatever "variant" of the dataset you are looking at, you are still looking at the same data, and it would de-clutter the catalog quite a bit (compared to 100 quasi-identical entries).

The data is only preliminary (and hence on /scratch). I will copy a replacement for this dataset from LUMI in the next days.
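To make the trade-off between the two layouts discussed above concrete, here is a rough sketch using plain dictionaries instead of a real intake catalog. All names (`entries_per_run`, `entry_with_member_parameter`, `select`, the `_r042` naming scheme) are hypothetical and only illustrate the idea: one entry per run versus a single entry with a `member` parameter.

```python
def entries_per_run(experiment: str, n_members: int) -> dict:
    """Layout A: one catalog entry per run (entry names are made up)."""
    return {
        f"{experiment}_r{member:03d}": {"experiment": experiment, "member": member}
        for member in range(1, n_members + 1)
    }


def entry_with_member_parameter(experiment: str, n_members: int) -> dict:
    """Layout B: a single entry; the ensemble member is a parameter."""
    return {
        experiment: {
            "experiment": experiment,
            "parameters": {
                "member": {"allowed": list(range(1, n_members + 1)), "default": 1}
            },
        }
    }


def select(catalog: dict, name: str, **params) -> dict:
    """Resolve a parameterized entry, validating the requested values."""
    entry = catalog[name]
    spec = entry["parameters"]
    unknown = set(params) - set(spec)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    resolved = {"experiment": entry["experiment"]}
    for key, definition in spec.items():
        value = params.get(key, definition["default"])
        if value not in definition["allowed"]:
            raise ValueError(f"{key}={value!r} is not allowed")
        resolved[key] = value
    return resolved


per_run = entries_per_run("4xCO2", 100)
parameterized = entry_with_member_parameter("4xCO2", 100)

print(len(per_run), "entries vs.", len(parameterized), "entry")
print(select(parameterized, "4xCO2", member=42))
```

Layout B keeps the top level of the catalog short, but every user then has to know that the entry takes a `member` argument, which is the extra degree of complexity raised in this thread.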

koldunovn commented 6 months ago

So what do we do with this one?

florianziemen commented 6 months ago

If @lkluft is fine with it, I'd merge, but I don't want to change his dataset without his consent (I can also undo the changes there).

lkluft commented 6 months ago

I still like my approach with the ensemble ;) but if you think that it obfuscates the structure of the catalog, I am fine with the changes 👍

florianziemen commented 6 months ago

It is a cool approach. It just adds another degree of complexity to the game.

I'd say we reconsider the whole structure when we move to intake2 (or make some other major change). It would be nice to get things a bit more uniform without going totally crazy on structure discussions.