fabiobdias opened this issue 9 months ago
Update: I got these statistics done. I wasn't calling ".load()" after the binning, which was delaying the calculations (thanks Will!).
Here we computed the statistics (r^2 and linear-regression slope) for each 5-deg bin and then averaged them over each ASC regime (using the time-mean mask). The high correlations in the upper 100 m for the surface regime decreased substantially, but the high correlation at ~800 m depth remains.
It's interesting that there isn't much difference between the regimes or timescales here. I wonder why this differs from Ellie's analysis, where the annual-average correlations seemed to improve when moving to a regional analysis. Does it depend on the longitude bin size we choose?
It would also be interesting to show the regional data on a map (or plotted against longitude on the x-axis), e.g., the r^2 averaged between 800 and 1000 m for each 5-deg longitude bin, vs longitude.
I was also wondering about the similarity among regimes/timescales (I thought I was doing something wrong with the masks, but they seem fine). I will try smaller and larger bins to test the dependence on bin size.
Here is what the r^2 averaged below 760 m looks like for the 5-deg longitude bins:
Maybe try larger longitudinal bins? The Totten region I selected is ~13 deg of longitude wide, and the other regions are 30 deg wide. Especially since, if I understand correctly, the CSHT was already binned into 3-deg bins...?
Here is the attempt with 15-deg longitudinal bins:
1) First, the binned along-slope current:
2) The regional r^2:
3) Finally, all statistics per depth/regime/timescale:
Some correlations do improve, e.g., near the surface in both the surface-intensified and deep regimes (monthly climatology), and at depth (reverse and deep regimes, all timescales).
And here is another attempt, using 2-deg longitudinal bins:
I don't think we've yet (correct me if I'm wrong) assessed the correlations at the original longitude resolution using daily data and then averaged them over the regimes. I tried something similar, but with monthly data and with time-varying rather than time-mean regime masks. It would be good to see what the correlation looks like on the original longitude grid (i.e., without any coarsening), using daily data and time-mean regime masks. This would provide a good baseline for comparison with these binned-longitude diagnostics, and also with the correlations Wilton provided here.
Thanks for the suggestion @taimoorsohail, I will try using daily data to get these correlations at the original resolution. I also chatted with @willaguiar today: so far I've applied the regime masks after binning in longitude (and then calculated the correlations); I will test masking before binning to check whether the order affects the results.
I've recalculated these statistics, now applying the regime masks (to u-along/CSHT) before the longitudinal binning. The good news is that some of the high r^2 values that appeared in more than one regime in the same longitudinal bin (as in some of the r^2 maps shown above) are now avoided. However, overall the results haven't changed much, with r^2 generally increasing as we use larger bins. Plots are shown below. I will repeat these using the daily data next...
For 2-deg bins:
For 5-deg bins:
For 15-deg bins:
Figure comparing the statistics across longitudinal bin sizes, showing the increase of r^2 at depth with larger bins:
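To make the masking-order question concrete, here is a minimal numpy sketch (not the code used in this analysis) comparing the two orders on a toy field. It assumes fields of shape (time, lon) on the original along-contour grid, a boolean time-mean regime mask per longitude, and that binning means averaging within each bin (a transport might instead be summed). All function names are hypothetical. For the bin-then-mask case, the rule "a bin belongs to a regime if any of its points do" is a hypothetical stand-in for however the mask was coarsened; under that rule one bin can show up in several regimes, which is the kind of overlap that masking first avoids.

```python
import numpy as np


def bin_by_longitude(field, lon, edges):
    """Average a (time, lon) field into longitude bins, ignoring NaNs.

    Bins with no valid points are returned as NaN.
    """
    idx = np.digitize(lon, edges) - 1  # bin index of each original longitude
    nbins = len(edges) - 1
    out = np.full((field.shape[0], nbins), np.nan)
    for b in range(nbins):
        cols = field[:, idx == b]
        valid = ~np.isnan(cols)
        count = valid.sum(axis=1)
        total = np.where(valid, cols, 0.0).sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            out[:, b] = np.where(count > 0, total / count, np.nan)
    return out


def mask_then_bin(field, lon, edges, regime_mask):
    """Mask at the original resolution first, so only regime points enter each bin."""
    masked = np.where(regime_mask[np.newaxis, :], field, np.nan)
    return bin_by_longitude(masked, lon, edges)


def bin_then_mask(field, lon, edges, regime_mask):
    """Bin all points first, then keep bins containing any regime point (hypothetical rule)."""
    binned = bin_by_longitude(field, lon, edges)
    idx = np.digitize(lon, edges) - 1
    in_regime = np.array([regime_mask[idx == b].any() for b in range(len(edges) - 1)])
    return np.where(in_regime[np.newaxis, :], binned, np.nan)
```

In the toy case lon = [0, 1, 2, 3], edges = [0, 2, 4], field = [[1, 3, 5, 7]] and a mask of [True, False, True, True], masking first gives [[1, 6]] while binning first gives [[2, 6]]. With binning first, the first bin is also kept for the complementary regime.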
Interesting. What if you keep pushing to larger bins (20deg, 30deg, 40deg...)? Does it converge or get worse at some point?
Yeah, I was thinking that... it should decrease at some point, right? I will try some more bin sizes...
Cool! At least for these bin widths, if we take r^2=0.5 as our “significance threshold”, then the 15-deg bins only add a few more depth levels in the reverse and deep regimes (except for Deep/Monthly Climatology, which does seem to include many more depth levels). It would be cool to see what happens at the extremes, as Adele mentioned.
(Sent as an email reply to Fabio Boeira Dias's comment of 14 Mar 2024, 4:41 pm.)
Below are circumpolar maps of the 800-1400 m depth-averaged correlation, ASC variance, and CSHT variance, using the longitudinally binned data (thanks Fabio!). I calculate the variance at each depth level before taking the depth average. The binned quantities are interpolated so that they cut off at the edge of each longitudinal bin, which makes them plot nicely. The colors of the outer ring show the time-mean GMM regime mask. I don't know if there is a pattern in the correlation with respect to the regimes, but the variances at least line up with the local processes.
5 deg bins
10 deg bins
15 deg bins
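The order of operations in the variance maps matters: with an equally weighted depth mean, the mean of the per-level variances is never smaller than the variance of the depth-averaged series, because signals at different levels can partly cancel in the depth average. A minimal numpy sketch of both orders, assuming a (time, depth, bin) array, a depth coordinate in metres, and a plain (not thickness-weighted) mean over the 800-1400 m levels; the function names are hypothetical:

```python
import numpy as np


def depth_mean_of_variance(x, depth, zmin=800.0, zmax=1400.0):
    """Temporal variance at each level, then a plain mean over levels in [zmin, zmax].

    x: array of shape (time, depth, bin); depth: 1-D level depths in metres.
    """
    var = np.var(x, axis=0)  # (depth, bin): variance over time at each level
    sel = (depth >= zmin) & (depth <= zmax)
    return var[sel].mean(axis=0)  # (bin,)


def variance_of_depth_mean(x, depth, zmin=800.0, zmax=1400.0):
    """Plain depth mean over [zmin, zmax] first, then the temporal variance (for comparison)."""
    sel = (depth >= zmin) & (depth <= zmax)
    return np.var(x[:, sel].mean(axis=1), axis=0)  # (bin,)
```

For two levels in range that oscillate in opposite phase, each with variance 1, the first function returns 1 while the second returns 0.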
I've tried a few more longitudinal bin sizes (5, 10, 15, 20, 30, 40 degrees). Different bin sizes are represented with slightly different shades of red (r^2) and green (regression slope):
The relatively high r^2 below 700 m does converge as the longitudinal bins get larger, at all timescales, for the surface and reverse regimes. However, it doesn't seem to converge for the deep regime at the seasonal timescale (monthly climatology).
These results aren't yet weighted by the number of points in each regime (as discussed at the last meeting); I'm doing that now and should post another update soon.
I was just looking into the weighted average, but I realised the longitude bins have the same number of points in each bin. Is that expected? I'd expect so for uniform longitude spacing (e.g., the original grid), but since our points lie along a contour, I wouldn't expect each bin to have the same number of grid points...
It seems so: the longitude values along the contour are indeed uniformly spaced (0.25 deg apart):
```
CSHT_months.lon
<xarray.DataArray 'lon' (lon: 1428)>
array([-278.5 , -278.25, -278. , ..., 77.75, 78. , 78.25])
```
So in this case, weight-averaging the statistics by the number of points per bin wouldn't change the results above.
Yes, the heat transport is binned onto evenly spaced longitude values when the zonal convergence term is included.
But within each bin, not all longitude points will belong to the same regime. E.g., if there are 20 longitude values in a 5-deg bin, maybe only 2 of them correspond to the warm regime, so we wouldn't want to weight this bin the same as another bin in which all 20 longitude values are in the warm regime.
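A minimal numpy sketch of the per-bin weights described above, assuming the weight of a bin for a given regime is the number of original-grid longitudes in that bin whose time-mean mask falls in that regime. The function name is hypothetical. With uniform spacing every bin holds the same total number of points, yet the regime counts can still differ (2 vs 20 in the example above):

```python
import numpy as np


def regime_counts_per_bin(lon, edges, regime_mask):
    """Count, for each longitude bin, all points and the points belonging to the regime.

    lon: 1-D along-contour longitudes; regime_mask: boolean time-mean mask per longitude.
    """
    idx = np.digitize(lon, edges) - 1  # bin index of each longitude
    nbins = len(edges) - 1
    inside = (idx >= 0) & (idx < nbins)  # drop longitudes outside the bin edges
    total = np.bincount(idx[inside], minlength=nbins)
    in_regime = np.bincount(idx[inside & regime_mask], minlength=nbins)
    return total, in_regime
```

For 0.25-deg longitudes from 0 to 9.75 in two 5-deg bins, with only the first 2 points of the first bin and all points of the second bin in the regime, this returns totals of [20, 20] and regime counts of [2, 20].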
ahh got it! thanks for the clarification @adele-morrison
OK, the results are quite sensitive to weight-averaging the statistics. In the reverse and deep regimes, increasing the bin size causes r^2 to decrease (so smaller bins, e.g. 5 deg, have larger correlations). The surface regime seems less sensitive to the weighting. In general, r^2 values are smaller than before, although the maximum correlation at depth is still present. The maximum correlation (r^2 ~ 0.7) is found in the reverse regime at depths > 800 m at the seasonal timescale:
With the new bin weighting, the correlations seem MUCH lower than previously (roughly 0.2-0.4 lower!). If we increase the bin size to 360 deg, we should expect these to converge on the larger correlations we had before we started binning, right? Does that mean there's a bug somewhere, or should we keep increasing the bin size to see when the correlation increases again?
Just posting the fixed weighted-average calculation here: it now applies the weights correctly, sums the weighted statistics for each regime, and divides by the sum of the weights. The correlations are now similar to what we had before (without the weighting):
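The corrected calculation (sum of the weighted statistics over a regime's bins, divided by the sum of the weights) can be sketched in numpy as below. This is an illustration under assumed shapes, a (depth, bin) statistic and one weight per bin, not the notebook's code; the function name is hypothetical, and bins with NaN statistics are dropped from both sums:

```python
import numpy as np


def weighted_regime_mean(stat, weights):
    """Weighted mean over bins: sum(w * stat) / sum(w), along the last axis.

    stat: (depth, bin) per-bin statistic (e.g. r^2 or slope); weights: (bin,).
    Bins where stat is NaN are dropped from both sums.
    """
    w = np.broadcast_to(np.asarray(weights, dtype=float), stat.shape)
    valid = ~np.isnan(stat)
    num = np.where(valid, w * stat, 0.0).sum(axis=-1)
    den = np.where(valid, w, 0.0).sum(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den
```

For example, weights of 2 and 20 with r^2 values of 0.9 and 0.3 give (2 x 0.9 + 20 x 0.3) / 22 ≈ 0.35, versus 0.6 for equal weights. Dividing by the sum of the weights is what keeps the result on the same scale as the unweighted mean.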
During the hackathon today we agreed to show a version of Figure 2 with different bin sizes in the supplementary material, and to use 20-deg bins for the main figures.
Here is an updated version in the 20-deg bin configuration, with the daily deseasoned data at the bottom. PDFs on the way...
It's curious that the seasonal correlations for the surface regime are so weak now, because they were ~0.8 before. I guess that's due to the change from the zonal average to 20-deg bins?
Out of curiosity, what does the daily data look like with the new binning, without the seasonality removed?
Yeah, it looks like it, comparing with this figure. I wonder why the binning changes the surface result so much (even 5-deg binning decreases the correlation compared to the original).
Below are the daily results for 5- to 40-deg binning, with (top) and without (bottom) seasonality:
To me, it looks like deseasoning does very little to the correlations (I can only see small changes in the upper 200 m of the surface regime and the upper 400 m of the deep regime).
Additional info: for deseasoning, I calculated the monthly climatology, interpolated it to daily resolution, and then removed it from the daily data.
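The deseasoning steps above can be sketched in numpy as follows. This is a minimal sketch under stated assumptions, not the code actually used: a 365-day calendar, the climatology for each month taken as the mean of all daily values in that calendar month, each monthly value anchored at the middle day of its month, and periodic linear interpolation (so late December blends back into January). The function and constant names are hypothetical.

```python
import numpy as np

# Assumed 365-day calendar: month lengths, first day-of-year, and mid-month day-of-year.
MONTH_LENGTHS = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
MONTH_STARTS = np.concatenate(([1], 1 + np.cumsum(MONTH_LENGTHS)[:-1]))
MID_MONTH_DOY = MONTH_STARTS + (MONTH_LENGTHS - 1) / 2.0


def deseason_daily(x, doy, month):
    """Remove a monthly climatology, linearly interpolated to daily, from x.

    x, doy, month: 1-D arrays of equal length (daily values, day of year 1..365, month 1..12).
    """
    clim = np.array([x[month == m].mean() for m in range(1, 13)])  # monthly climatology
    # Periodic interpolation of the 12 mid-month values onto every day of the year.
    clim_daily = np.interp(doy, MID_MONTH_DOY, clim, period=365)
    return x - clim_daily
```

A constant series is removed exactly, and for a pure annual sinusoid the residual is small compared with the seasonal amplitude. Other choices (e.g. climatology from monthly means, or a different interpolation) would give slightly different residuals.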
Below are the plots for the deseasoned CSHT and U (vertical sum and mean).
I'm trying to calculate the statistics using 5-deg longitude bins rather than averaging over the whole regime, as we've been doing. Below is a plot of the monthly along-slope current, both at the original resolution and binned into 5-deg longitude bins.
The problem arises when calculating the linear-regression statistics. There are two loops (borrowed from the original GMM_ASC.ipynb), which originally applied the scipy.stats.linregress function over (a) the ASC regimes (n=3) and (b) the depth levels (n=50).
For the binned CSHT/ASC, this means looping through the whole binned contour (n=71). Using XXLargeMem, each stats.linregress call takes ~14 s, which adds up to 14 s x 50 x 71 x 9 = 447,300 s, i.e. ~124 hours (the factor of 9 comes from doing it 9 times for the _all, _annual, and _clima variables).
I tried splitting these into 3 PBS scripts and using hugemem, but they still ran out of walltime (max 48 h).
I think one solution might be to vectorise the CSHT/ASC inputs to linregress, but when I tried, I realised the CSHT has 75 vertical levels while the along-slope ASC has only 50. Can anyone remind me why that is?
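One way to vectorise is to skip the per-column linregress calls and compute the least-squares slope and r^2 directly from anomaly sums along the time axis, for all depth levels and bins at once. These are the same slope and rvalue**2 that scipy.stats.linregress gives for each column, but this sketch does not compute p-values or standard errors, and it does not handle NaNs (masked points would need nan-aware sums). It assumes both inputs are already on the same vertical levels and bins, so the 75-vs-50 level mismatch has to be resolved first; the function name is hypothetical.

```python
import numpy as np


def vectorized_linregress(x, y, axis=0):
    """Least-squares slope of y on x and r^2 along `axis`, for every other index at once.

    x, y: arrays of identical shape, e.g. (time, depth, bin).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        # e.g. 75 CSHT levels vs 50 ASC levels: put both on common levels first
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    xa = x - x.mean(axis=axis, keepdims=True)  # anomalies about the time mean
    ya = y - y.mean(axis=axis, keepdims=True)
    sxy = (xa * ya).sum(axis=axis)
    sxx = (xa ** 2).sum(axis=axis)
    syy = (ya ** 2).sum(axis=axis)
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, r2
```

With inputs shaped (time, 50, 71), a single call returns (50, 71) arrays of slopes and r^2, replacing the 50 x 71 individual linregress calls per variable.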