Statistics using longitude bins before averaging over distinct ASC regimes

fabiobdias commented 9 months ago

I'm trying to calculate the statistics using 5deg longitude bins rather than averaging over the whole regime as we've been doing. I've included below plots of the monthly along-slope current, both on the original grid and binned into 5-deg longitude bins.

Screenshot 2024-02-07 at 9 52 19 am

Screenshot 2024-02-07 at 9 52 30 am
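For reference, a minimal sketch of the longitude binning described above, in plain NumPy. This is not the notebook code: the function name `bin_by_longitude`, the choice to start the first bin at the smallest longitude, and the assumption that longitude is the last axis are all illustrative.

```python
import numpy as np

def bin_by_longitude(values, lon, bin_width):
    """Average `values` (..., lon) into longitude bins of `bin_width` degrees.

    NaNs are ignored; bins without any point are returned as NaN.
    Assumption: the first bin starts at lon.min().
    """
    n_bins = int(np.floor((lon.max() - lon.min()) / bin_width)) + 1
    edges = lon.min() + bin_width * np.arange(n_bins + 1)
    idx = np.digitize(lon, edges) - 1            # bin index of every longitude
    out = np.full(values.shape[:-1] + (n_bins,), np.nan)
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            out[..., b] = np.nanmean(values[..., sel], axis=-1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return out, centres
```

Calling `bin_by_longitude(u_along, lon, 5.0)` on a hypothetical array `u_along` of shape (time, depth, lon) would return an array of shape (time, depth, n_bins) plus the bin centres.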

The problem occurs when trying to calculate the linear regression statistics. There are 2 loops (borrowed from the original GMM_ASC.ipynb), which were originally looping the scipy.stats.linregress function through (a) ASC regimes (n=3) and (b) depth levels (n=50).

For the binned CSHT/ASC, it means going through the whole binned contour (n=71). Using XXLargeMem, every stats.linregress call takes ~14 seconds to complete, which results in 14 s x 50 x 71 x 9 (doing it 9 times for the _all, _annual, and _clima variables) = ~124 hours.

I tried splitting these into 3 PBS scripts and using hugemem, but it still ran out of walltime (max 48h).

I think one solution might be to vectorise the CSHT/ASC inputs to linregress, but when I tried I realised the CSHT has 75 vertical levels and the along-slope ASC has only 50. Can anyone remind me why that is?
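One possible way to vectorise this, sketched under assumptions: the slope, intercept and r^2 of an ordinary least-squares fit have closed forms, so they can be computed for every depth level and longitude bin at once instead of calling scipy.stats.linregress inside two loops. The function name `linregress_along_time` is hypothetical, both inputs are assumed to share the shape (time, depth, lon_bin) with time first and no NaNs, and the p-value and standard error that linregress also returns are not reproduced here.

```python
import numpy as np

def linregress_along_time(x, y):
    """Least-squares slope, intercept and r^2 of y on x along axis 0.

    x, y: arrays of identical shape (time, ...), without NaNs.
    Returns three arrays of shape x.shape[1:].
    """
    x_mean = x.mean(axis=0)
    y_mean = y.mean(axis=0)
    xa = x - x_mean                      # anomalies about the time mean
    ya = y - y_mean
    sxx = (xa * xa).sum(axis=0)
    syy = (ya * ya).sum(axis=0)
    sxy = (xa * ya).sum(axis=0)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r2 = sxy**2 / (sxx * syy)            # squared Pearson correlation
    return slope, intercept, r2
```

This only works once both fields are on the same vertical grid, so the 75 vs 50 level mismatch would still need to be resolved first.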

fabiobdias commented 9 months ago

update: I got these statistics done (I wasn't calling ".load()" after doing the binning, which was delaying the calculations - thanks Will!).

So, here we have done the statistics (r2 and linear regression slope) for each of the 5-deg bins and then averaged out for each ASC regime (using the time-mean mask). The high correlations in the upper-100m for the surface regime decreased substantially, but the high correlation at ~800m depth remains.

adele-morrison commented 9 months ago

Interesting that there's not much difference between the regimes or timescales here. I wonder why it's different from Ellie's analysis where the annual average correlations seemed to improve going to regional analysis? Is it dependent on the longitude bin size we choose?

It would also be interesting to show the regional data on a map (or plotted against longitude on x-axis). Say, the r^2 value averaged between 800-1000m for each 5deg longitude bin, vs longitude?
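A small sketch of that diagnostic, assuming the r^2 values are stored as an array of shape (depth, lon_bin) with a matching 1-D `depth` coordinate in metres. The function name is hypothetical, and the average is a plain mean over levels (not weighted by layer thickness), which is an assumption rather than a recommendation.

```python
import numpy as np

def depth_range_mean(r2, depth, z_top, z_bottom):
    """Mean of r2 (depth, lon_bin) over levels with z_top <= depth <= z_bottom.

    Unweighted mean over the selected levels; NaNs are ignored.
    """
    sel = (depth >= z_top) & (depth <= z_bottom)
    return np.nanmean(r2[sel, :], axis=0)
```

The result has one value per longitude bin, so it can be plotted directly against the bin-centre longitudes.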

fabiobdias commented 9 months ago

I was also wondering about the similarity among regimes/timescales (I thought I was doing something wrong with the masks, but it seems alright). I will try smaller and bigger bins to test its dependence.

Here is what the r^2 averaged below 760m for the 5deg longitude bins looks like:

ongqingyee commented 9 months ago

Maybe larger longitudinal bins? The Totten region I selected is ~13deg longitude wide, and the rest are 30 deg. Esp as the CSHT was already binned by 3 degs by my understanding...?

fabiobdias commented 9 months ago

Here is the attempt with 15-deg longitudinal bins:

1) first look at the binned current along-slope:

Screenshot 2024-02-12 at 10 56 16 am

2) here's the regional r^2: Screenshot 2024-02-12 at 10 57 38 am

3) finally all statistics per depth/regime/timescales: Screenshot 2024-02-12 at 10 58 22 am

Some correlations do get better, such as near-surface in both surface-intensified and deep regimes (monthly climatology) and at depth (reverse & deep regimes, all timescales).

fabiobdias commented 9 months ago

and here is another attempt using 2-deg longitudinal bins:

Screenshot 2024-02-12 at 1 07 49 pm

Screenshot 2024-02-12 at 1 08 10 pm

Screenshot 2024-02-12 at 1 08 54 pm

taimoorsohail commented 9 months ago

I don't think we've done the analysis yet (correct me if I'm wrong) of assessing the correlations for the original resolution of longitudes using daily data, then averaging over the regimes. I tried something similar but it was for monthly data, and with time-varying, not time-mean regime masks. It would be good to see what the correlation looks like for the original longitude grid (i.e., without any coarsening), with daily data and time-mean regime masks, as it would provide a good baseline for comparison with these binned longitude diagnostics, but also with the correlations provided by Wilton here.

fabiobdias commented 9 months ago

Thanks for the suggestion @taimoorsohail - I will try to use daily data and get these correlations in the original resolution. I also had a chat with @willaguiar today and I've applied the regimes masks after binning in longitudes (and then calculated the correlations); I will test masking it before binning to see if that isn't affecting the results.

fabiobdias commented 8 months ago

I've re-calculated these statistics, now applying the regime masks (to u-along/CSHT) before the longitudinal binning. The good news is that the high r^2 values that showed up in more than 1 regime at the same longitudinal bin (as in some of the r^2 maps shown above) are now avoided. However, overall the results haven't changed that much, with a general increase of the r^2 as we use larger bins. Plots are shown below. I will repeat those using the daily data next...
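To illustrate why the order matters, here is a toy example with made-up numbers (8 along-contour points, 2 bins, 2 regimes). How the mask was applied after binning in the earlier version is not spelled out in this thread, so the "mask after" branch below (keep any bin that contains at least one point of the regime) is only an assumption.

```python
import numpy as np

# Hypothetical along-contour data: 8 points, two bins of 4 points each.
u = np.array([1.0, 1.0, 5.0, 5.0, 2.0, 2.0, 2.0, 2.0])
regime = np.array([0, 0, 1, 1, 1, 1, 1, 1])      # time-mean regime label per point
bin_index = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # longitude bin of each point

def bin_mean(values, bin_index, n_bins):
    """NaN-aware mean of `values` in each bin (NaN where a bin has no valid data)."""
    out = np.full(n_bins, np.nan)
    for b in range(n_bins):
        v = values[bin_index == b]
        if np.isfinite(v).any():
            out[b] = np.nanmean(v)
    return out

# Mask BEFORE binning: only regime-1 points enter the bin averages.
masked_first = bin_mean(np.where(regime == 1, u, np.nan), bin_index, 2)

# Mask AFTER binning (assumed variant): bin everything, then keep bins
# that contain at least one regime-1 point.
binned_all = bin_mean(u, bin_index, 2)
has_regime = bin_mean((regime == 1).astype(float), bin_index, 2) > 0
masked_after = np.where(has_regime, binned_all, np.nan)
```

Here `masked_first` is `[5., 2.]` while `masked_after` is `[3., 2.]`: in the second case the regime-0 points leak into the first bin, and that bin would also be kept for regime 0, which is how one bin can end up contributing to more than one regime.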

fabiobdias commented 8 months ago

for 2deg bins:

fabiobdias commented 8 months ago

for 5-deg bins:

fabiobdias commented 8 months ago

for 15-deg bins:

fabiobdias commented 8 months ago

Figure comparing statistics over different longitudinal bin sizes - shows increase of the r^2 at depth with larger bins:

layer_wise_CSHT_vs_U_corr_BinsComparison_v2

adele-morrison commented 8 months ago

Interesting. What if you keep pushing to larger bins (20deg, 30deg, 40deg...)? Does it converge or get worse at some point?

fabiobdias commented 8 months ago

yeah I was thinking that... should decrease at some point, right? I will try some more bin sizes...

taimoorsohail commented 8 months ago

Cool! At least for these bin widths, if we take r^2=0.5 as our “significance threshold” then it only adds a few more depth levels in the reverse and deep regimes for 15-deg bins (with the exception of Deep/Monthly Climatology which does seem to include a lot more depth levels). Would be cool to see what happens in the extremes as Adele mentioned.

ongqingyee commented 7 months ago

Below are the circumpolar maps for 800-1400m depth-averaged correlations, variance of ASC and variance of CSHT using data binned longitudinally (thanks Fabio!). I calculate the variance at each depth level before taking the depth-average. The binned quantities are interpolated to cut off at the edge of each longitudinal bin so that I could plot them nicely. The colors of the outer ring show the time-mean regime mask using GMM. I don't know if there is a pattern in the correlation w.r.t. the regimes, but the variances line up with the local processes at least.

5 deg bins Pasted image 20240322132533

10 deg bins

15 deg bins Pasted image 20240322140713

fabiobdias commented 7 months ago

I've tried a couple more longitudinal bin sizes (5, 10, 15, 20, 30, 40 degrees). Different bins are represented with slightly different red (r^2) and green (regression slope) tones:

layer_wise_CSHT_vs_U_corr_BinsComparison_v2_new

The relatively high r^2 below 700m does converge towards larger longitudinal bins at all timescales for the surface and reverse regimes. But it doesn't seem to be converging for the deep regime at seasonal time (monthly clim).

These results aren't yet weighted by the number of points in each regime (as we discussed last meeting), which I'm doing now and should post another update soon.

fabiobdias commented 7 months ago

I was just looking into the weighted average but I realised the longitude bins all have the same number of points. Is that expected? My guess would be yes for a uniform longitude spacing (e.g., the original grid), but as we get points along the contour I wouldn't expect each bin to have the same number of grid points...

fabiobdias commented 7 months ago

it seems like yes, the longitude values along the contour are indeed uniformly spaced:

CSHT_months.lon
xarray.DataArray 'lon' (lon: 1428)
array([-278.5 , -278.25, -278. , ..., 77.75, 78. , 78.25])

which in this case wouldn't change the results above if we weight-average the statistics.
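A quick way to check the point count per bin, assuming the 0.25-degree spacing implied by the values printed above and 5-degree bins that start at the first longitude (the bin origin is an assumption):

```python
import numpy as np

lon = np.arange(-278.5, 78.5, 0.25)            # 1428 evenly spaced longitudes
edges = lon.min() + 5.0 * np.arange(73)        # 72 bins of 5 degrees
counts, _ = np.histogram(lon, bins=edges)      # number of points in each bin
```

With these assumptions every full bin holds 20 points, and only the last, partly filled bin holds fewer (8).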

adele-morrison commented 7 months ago

Yes the heat transport is binned into evenly spaced longitude values when including the zonal convergence term.

But within each bin, not all of the longitude points will be in the same regime. E.g. if there are 20 longitude values in a 5deg bin, then maybe only 2 of those longitudes correspond to the warm regime, so we wouldn’t want to weight this bin the same as another bin in which all 20 longitude values are in the warm regime.
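A minimal sketch of this weighting, assuming the per-bin statistic and the per-bin count of regime points are both available as arrays of shape (n_regimes, n_bins). The function name and array layout are illustrative, not taken from the notebooks.

```python
import numpy as np

def regime_weighted_mean(stat, n_points):
    """Weighted mean over longitude bins of a per-bin statistic, per regime.

    stat:     (n_regimes, n_bins) statistic (e.g. r^2 at one depth level)
    n_points: (n_regimes, n_bins) number of longitude points of that regime
              inside each bin, used as the weight
    Bins whose statistic is NaN get zero weight.
    """
    valid = np.isfinite(stat)
    w = np.where(valid, n_points, 0.0)
    s = np.where(valid, stat, 0.0)
    return (w * s).sum(axis=1) / w.sum(axis=1)

# The case described above: one bin with 2 regime points, one with 20.
r2 = np.array([[0.9, 0.1]])
n = np.array([[2.0, 20.0]])
weighted = regime_weighted_mean(r2, n)   # (2*0.9 + 20*0.1) / 22
unweighted = r2.mean(axis=1)             # 0.5
```

In this toy case the unweighted mean is 0.5, while the weighted mean is about 0.17, because the bin with only 2 regime points no longer counts as much as the bin with 20. A regime with no points in any bin would give a division by zero (NaN), which a real implementation would need to handle.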

fabiobdias commented 7 months ago

ahh got it! thanks for the clarification @adele-morrison

fabiobdias commented 7 months ago

OK, the results are quite sensitive to weight-averaging the statistics. In the reverse and deep regimes, increasing the bin size causes the r^2 values to decrease (so smaller bins, i.e. 5deg, have larger correlations). The surface regime seems to be less sensitive to the weighting. The r^2 values are in general smaller than before, although the maximum correlation at depth is still present. The maximum correlation (r^2 ~ 0.7) is found in the reverse regime at depths > 800m at the seasonal timescale:

layer_wise_CSHT_vs_U_corr_BinsComparison_v2_wavg

adele-morrison commented 7 months ago

With the new bin weighting, the correlations seem MUCH lower than previously (roughly 0.2-0.4 less!). If we increase the bin size to 360deg, then we should expect these to converge on our old larger correlations before we started binning right? Does that mean there's a bug somewhere, or should we keep increasing bin sizes to see when the correlation increases again?

fabiobdias commented 7 months ago

Just posting here the fixed weighted-average calculation, which now applies the weights correctly: the weighted statistics are summed for each regime and then divided by the sum of the weights. Correlations are now similar to what we had before (without the weighting):

layer_wise_CSHT_vs_U_corr_BinsComparison_v2_wavg

During the hackathon today we agreed to show a version of Figure 2 with different bin sizes in the suppl. material, and use 20deg bins for the main figures.

willaguiar commented 6 months ago

Here is an updated version, with the daily deseasoned data on the bottom, in the 20 deg bin config. PDFs on the way....

adele-morrison commented 6 months ago

Curious that the seasonal correlations for the surface regime are so weak now, because they were ~0.8 before. I guess that's due to the change from zonal avg to 20deg bins?

adele-morrison commented 6 months ago

Out of curiosity, what does the daily look like with the new binning without the seasonality removed?

willaguiar commented 6 months ago

> Curious that the seasonal correlations for the surface regime are so weak now, because they were ~0.8 before. I guess that's due to the change from zonal avg to 20deg bins?

Yeah, it looks like it when comparing with this fig. I wonder why the binning changes the surface result so much (because even 5deg binning decreases the correlation compared to the original one).

> Out of curiosity, what does the daily look like with the new binning without the seasonality removed?

Below is the daily 5 to 40 deg binning with (top) and without (bottom) seasonality:

To me it looks like the deseasoning is doing very little for the correlations (I can only see small changes in the upper 200m of the surface regime, and the upper 400m of the deep regime).

Additional info: for deseasoning I calculated the monthly climatology, interpolated it to daily, and then removed it from the daily data.
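A rough sketch of that deseasoning recipe for a single daily time series, under stated assumptions: a 365-day (no-leap) calendar, mid-month anchor points for the climatology, and periodic linear interpolation via `np.interp`. The interpolation method actually used is not specified in this thread, and the function name is hypothetical.

```python
import numpy as np

def deseason_daily(daily, day_of_year, month):
    """Remove a monthly climatology, interpolated to daily, from a daily series.

    daily:       (time,) daily values
    day_of_year: (time,) day of year, 1..365 (no-leap calendar assumed)
    month:       (time,) calendar month, 1..12
    """
    # Monthly climatology: mean over all days falling in each calendar month.
    clim = np.array([daily[month == m].mean() for m in range(1, 13)])
    # Mid-month position (in days) for a 365-day calendar.
    days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
    mid = np.cumsum(days_in_month) - days_in_month / 2.0
    # Periodic linear interpolation so that December connects back to January.
    clim_daily = np.interp(day_of_year, mid, clim, period=365)
    return daily - clim_daily
```

Because monthly means smooth the seasonal cycle slightly, a small seasonal residual remains after the subtraction.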

Below are the plots for the deseasoned CSHT and U (vertical sum and mean).

willaguiar / ASC_and_heat_transport

Statistics using longitude bins before averaging over distinct ASC regimes #29