paigem commented 1 year ago

Create histograms to compare the values of the full resolution field and the fields computed from coarsened input (aka the first two columns in the results shown in #32).

This was based on discussions on Friday, Sep. 30th (notes here).

paigem commented 1 year ago

I have some first results for this!

Summary of what these plots show:

- They compare histograms of the values of the specified heat flux.
- full = the heat flux computed from full resolution fields and then coarsened at the end (left-most column in the results shown in our first results issue)
- large_scale = the heat flux computed from coarsened input fields (2nd column from the left in the results shown in our first results issue)
- So far, I have only computed this for CM2.6 and the ecmwf method.
- These are all averaged over the full 20 years of data.
- The top plot in each figure shows both full and large scale fields.
- The bottom plot in each figure shows the difference: full - large scale.
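
To make the full vs. large_scale distinction concrete, here is a minimal numpy-only sketch on synthetic data (not CM2.6 output). Everything in it is an assumption for illustration: the `toy_flux` formula is a made-up nonlinear product and not one of the bulk algorithms, and `coarsen` is a plain block average standing in for whatever coarsening the real pipeline uses.

```python
import numpy as np


def coarsen(field, factor):
    """Block-average the (y, x) axes by `factor` (toy stand-in for coarsening)."""
    t, ny, nx = field.shape
    return field.reshape(t, ny // factor, factor, nx // factor, factor).mean(axis=(2, 4))


def toy_flux(wind, delta_t):
    """Hypothetical nonlinear 'flux'; NOT one of the algorithms used in this repo."""
    return -wind * delta_t


def time_mean_histogram(field, bins):
    """Histogram over (y, x) for each time step, then average the counts over time."""
    counts = np.stack([np.histogram(step.ravel(), bins=bins)[0] for step in field])
    return counts.mean(axis=0)


rng = np.random.default_rng(0)
wind = rng.gamma(shape=2.0, scale=3.0, size=(5, 32, 32))    # synthetic wind speed
delta_t = rng.normal(loc=1.0, scale=2.0, size=(5, 32, 32))  # synthetic air-sea difference

factor = 4
# full: compute the flux at full resolution, coarsen at the end
full = coarsen(toy_flux(wind, delta_t), factor)
# large_scale: coarsen the inputs first, then compute the flux
large_scale = toy_flux(coarsen(wind, factor), coarsen(delta_t, factor))

bins = np.linspace(-60, 60, 41)
h_full = time_mean_histogram(full, bins)
h_large_scale = time_mean_histogram(large_scale, bins)
h_diff = h_full - h_large_scale  # what the bottom panel shows: full - large scale
```

The two fields differ only in the order of "compute flux" and "coarsen", which is the comparison the histograms are meant to expose.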

Sensible heat flux

Latent heat flux

Notes

The overall shapes of the histograms are very similar. As predicted, however, the full fields have a longer tail toward negative values; this is especially clear in the difference between full and large scale (panel 2 in each figure). This indicates that the large scale fields underestimate the heat fluxes at the larger magnitudes.
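
One possible way to put a number on the "longer tail" reading is to sum the difference histogram over the bins beyond some threshold. This is only a sketch with made-up bin counts; `tail_excess` and the threshold value are hypothetical and not part of the notebook.

```python
import numpy as np


def tail_excess(h_full, h_large_scale, bin_edges, threshold):
    """Sum of (full - large_scale) counts over bins lying entirely at or below `threshold`."""
    upper_edges = np.asarray(bin_edges)[1:]
    in_tail = upper_edges <= threshold
    diff = np.asarray(h_full) - np.asarray(h_large_scale)
    return float(diff[in_tail].sum())


bin_edges = np.array([-400, -300, -200, -100, 0, 100])
h_full = np.array([3, 10, 40, 45, 2])        # made-up counts
h_large_scale = np.array([1, 8, 43, 46, 2])  # made-up counts

excess = tail_excess(h_full, h_large_scale, bin_edges, threshold=-200)
```

A positive value means the full field has more points in the negative tail than the large scale field; a negative value would indicate the opposite.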

jbusecke commented 1 year ago

Nice! Visually, the two curves in the first panel (total histogram) are very difficult to distinguish. I wonder if it would be better to use a log scale on the y-axis? But that might be very unintuitive. Two tweak suggestions that would make these values easier to read:

- Insert a vertical line at the 0 value along the x axis (this is important to get the mean values right).
- Plot the histogram as a transparent patch (e.g. fill between the x axis and the values of the histogram). This might help with showing the small differences better than the bars, which distract the eye a bit IMO.

paigem commented 1 year ago

I have now computed histograms for each algorithm we have (4 at the moment) for both qh and ql. I took @jbusecke's suggestions of showing the histograms as filled curves with transparent color and adding a vertical line at zero. The log plot did not show the differences any better, so I am still using a linear y-axis here.

These plots, as above, show the full field (in blue) and the large scale field (in red) on the same axes. To the right is shown the difference between the full and large scale histograms for each algorithm (in grey). As you can see, the algorithms can be very different!

Some preliminary observations:

- I expected to see that the large-scale field has a shorter tail in the histogram, indicating fewer values at the extreme magnitudes. In ql, the ncar algorithm does indeed show this behavior. However, the andreas algorithm shows the exact opposite: the large-scale field has larger (negative) values than the full field! I don't have a good reason for why this would be the case...
- ql appears to show a bimodal distribution, compared to qh with a single "hump".

Latent heat

Sensible heat

Code

```python
import json

import gcsfs
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar
# Assumed source of `histogram` (it was not imported in the original snippet)
from xhistogram.xarray import histogram

# 👇 replace with your key
with open('/home/jovyan/scale-aware-air-sea/pangeo-forge-ocean-transport-4967-347e2048c5a1.json') as token_file:
    token = json.load(token_file)
fs = gcsfs.GCSFileSystem(token=token)

subfolder_full = 'ocean-transport-group/scale-aware-air-sea/outputs/temp/'
subfolder_final = 'ocean-transport-group/scale-aware-air-sea/outputs/'


# Load results
def load_cm26_store(path):
    mapper = fs.get_mapper(path)
    ds = xr.open_dataset(mapper, engine='zarr', consolidated=True, use_cftime=True, chunks='auto')
    # for now discard the polar regions
    ds = ds.sel(yt_ocean=slice(-60, 60))
    return ds


algos = ['ecmwf', 'ncar', 'coare3p6_test', 'andreas_test']
datasets = [
    load_cm26_store(f'{subfolder_final}CM26_final_output_full_time_{algo}.zarr').assign_coords({'algo': algo})
    for algo in algos
]
ds_plot = xr.concat(datasets, dim='algo')
ds_plot

# Plot ql
var = 'ql'
bins = np.linspace(-400, 80, 50)

algo = 'ecmwf'
ds_algo = ds_plot.sel(algo=algo)  # .isel(time=0)
full = ds_algo[var]
large_scale = ds_algo[var + '_large_scale']

# Histogram over the horizontal dimensions; the result keeps the time dimension
h_full = histogram(full.rename('full'), bins=bins, dim=['xt_ocean', 'yt_ocean'])
h_large_scale = histogram(large_scale.rename('large_scale'), bins=bins, dim=['xt_ocean', 'yt_ocean'])

# Average the histograms over time and load into memory
# (%time is an IPython magic, so this runs in a notebook)
%time h_full_loaded = h_full.mean('time').load()
%time h_large_scale_loaded = h_large_scale.mean('time').load()

## Repeat above for the other 3 algorithms (giving h_full_ncar, h_large_scale_ncar, etc.)

# Try plotting with a function for each subplot
fig = plt.figure(figsize=(16, 7))


def plot_hist_all_algos(subplot_num, var, algo, ds1, ds2):
    # Left panel: full and large scale histograms on the same axes
    plt.subplot(subplot_num)
    ds1.to_series().plot(alpha=0.4, label='full')
    ax1 = ds2.to_series().plot(alpha=0.2, color='r', label='large_scale')
    plt.title(algo, fontsize=14)
    plt.fill_between(ds1.full_bin, 0, ds1, color='b', alpha=0.4)
    plt.fill_between(ds1.full_bin, 0, ds2, color='r', alpha=0.2)
    plt.legend()
    plt.axvline(0, color='grey')
    plt.xlabel('')

    # Right panel: difference between the two histograms
    plt.subplot(subplot_num + 1)
    ax = (ds1.rename({'full_bin': 'bin'}) - ds2.rename({'large_scale_bin': 'bin'})).to_series().plot(label='full - large_scale')
    plt.title(f'{algo} difference', fontsize=16)
    plt.legend()
    plt.xlabel('')
    plt.axvline(0, color='grey')
    plt.fill_between(ds1.full_bin, 0, (ds1.rename({'full_bin': 'bin'}) - ds2.rename({'large_scale_bin': 'bin'})), color='grey', alpha=0.8)


plot_hist_all_algos(241, var, 'ecmwf', h_full_loaded, h_large_scale_loaded)
plot_hist_all_algos(243, var, 'ncar', h_full_ncar, h_large_scale_ncar)
plot_hist_all_algos(245, var, 'coare3p6', h_full_coare3p6, h_large_scale_coare3p6)
plot_hist_all_algos(247, var, 'andreas', h_full_andreas, h_large_scale_andreas)

fig.subplots_adjust(hspace=0.4)
```

jbusecke commented 1 year ago

These are very nice! Thanks @paigem. I am still struggling to interpret these results, but I agree that andreas is significantly different from the others. Do we have a reference for where this particular algo is used? E.g. in which model/study?

ocean-transport / scale-aware-air-sea

Create histograms of the full resolution field and the fields computed from coarsened input #36
