DaniJonesOcean closed this issue 3 years ago
One constraint: in order to use GMM, there needs to be a value for T&S at every selected density level. So we couldn't apply this method near outcrops. Similarly, we would probably also want to exclude the mixed layer, which can feature seasonal outcrops. That's probably fine.
I think it'll be fine to exclude the ML. Looking forward to seeing what this method shows!
I've made some progress on this, in that I've coded up a notebook that is able to (1) calculate the densities (`sig0`) using TEOS-10 `gsw` and (2) linearly interpolate the `T` and `S` values onto a set of target density values.
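The interpolation step (2) described above could look roughly like the sketch below. It is not the notebook's code: `interp_to_sigma` and the sample numbers are made up for illustration, and it assumes `sigma0` has already been computed per profile (for example with `gsw.sigma0(SA, CT)`). It also assumes the profile can simply be sorted by density, which glosses over density inversions.

```python
import numpy as np

def interp_to_sigma(sigma_profile, values, sigma_targets):
    """Linearly interpolate one profile's values onto target density levels.

    Hypothetical helper. Targets outside the profile's own density range
    are returned as NaN rather than extrapolated.
    """
    sigma_profile = np.asarray(sigma_profile, dtype=float)
    values = np.asarray(values, dtype=float)
    sigma_targets = np.asarray(sigma_targets, dtype=float)

    # Use only depths where both density and the value are present
    ok = np.isfinite(sigma_profile) & np.isfinite(values)
    if ok.sum() < 2:
        return np.full(sigma_targets.shape, np.nan)

    # np.interp needs increasing x, so sort the profile by density
    order = np.argsort(sigma_profile[ok])
    x = sigma_profile[ok][order]
    y = values[ok][order]
    return np.interp(sigma_targets, x, y, left=np.nan, right=np.nan)

# Made-up profile: density increases and temperature decreases with depth
sigma0 = np.array([27.0, 27.2, 27.4, 27.6])
theta = np.array([4.0, 3.0, 2.0, 1.0])
targets = np.array([26.9, 27.1, 27.5, 27.7])

theta_on_sigma = interp_to_sigma(sigma0, theta, targets)
print(theta_on_sigma)
```

Returning NaN outside the sampled density range is what makes outcropping surfaces show up as gaps later on.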
Unfortunately, finding a good set of density levels that are suitable for interpolation is proving to be tricky. I can get an idea for the appropriate range by looking at the histogram, which seems okay. I then discard the sigma0 levels that feature nothing but `NaN` values, and I then discard the profiles which feature `NaN` values anywhere throughout the column. So far, this procedure has left me with a very small number of profiles! So, I will need to keep experimenting with the appropriate density target levels.
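A minimal sketch of that two-step filtering, assuming the interpolated values sit in a 2-D array of shape `(n_profiles, n_levels)`. The function name, the per-level coverage diagnostic, and the numbers are illustrative assumptions, not taken from the notebook.

```python
import numpy as np

def drop_nan_levels_then_profiles(data):
    """Drop all-NaN density levels, then drop profiles with any NaN left.

    data: array of shape (n_profiles, n_levels).
    Returns the cleaned array plus the two boolean masks that were applied.
    """
    data = np.asarray(data, dtype=float)

    # Step 1: discard levels that are NaN in every profile
    keep_levels = ~np.all(np.isnan(data), axis=0)
    trimmed = data[:, keep_levels]

    # Step 2: discard profiles with a NaN anywhere in the remaining column
    keep_profiles = ~np.any(np.isnan(trimmed), axis=1)
    return trimmed[keep_profiles], keep_levels, keep_profiles

nan = np.nan
theta_on_sigma = np.array([
    [nan, 3.5, 2.0, nan],
    [nan, 3.0, 1.8, nan],
    [nan, nan, 1.5, nan],
])

clean, keep_levels, keep_profiles = drop_nan_levels_then_profiles(theta_on_sigma)

# Fraction of profiles with data at each level, as a guide for picking targets
coverage = np.mean(np.isfinite(theta_on_sigma), axis=0)
print(f"kept {keep_profiles.sum()} of {len(keep_profiles)} profiles")
print("coverage per level:", coverage)
```

Looking at the per-level coverage before filtering may help narrow the target levels, since a single poorly covered level is enough to throw out most profiles in step 2.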
Below is an example of the potential temperature values interpolated onto sigma surfaces:
Alternatively, we might consider targeting a specific water mass and looking for structures within that water mass. I think that would be exciting.
Right now I'm trying $\sigma_0$. We could try neutral density instead.
This is really cool! I think targeting water masses would be very interesting.
I like how you effectively turned it around, trying to use clustering to find the density levels.
I'm likely being slow, but why do we need to discard anything with a NaN?
I haven't done any clustering yet. I'm still experimenting with the right density ranges and numbers of bins to use. :)
As far as I understand, we have to discard NaN values before clustering. Unless there's a fancy new method that I'm not aware of. 🤔
I am also not aware of methods that can deal with NaNs. But if the NaNs are somewhere in the middle of the profile, I interpolate across them, as long as I have some values above and below the NaNs. (Gosh, does that make sense?! :-/ )
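The gap-filling idea in this comment might be sketched like this. It assumes the vertical coordinate (depth or density level) is increasing, and it only fills NaNs that have valid values on both sides, so nothing is extrapolated. `fill_interior_gaps` is a hypothetical name.

```python
import numpy as np

def fill_interior_gaps(values, coord):
    """Linearly fill NaNs that have valid data both above and below.

    Leading and trailing NaNs are left untouched (no extrapolation).
    coord is assumed to be strictly increasing.
    """
    values = np.asarray(values, dtype=float)
    coord = np.asarray(coord, dtype=float)
    good = np.isfinite(values)
    if good.sum() < 2:
        # Nothing to interpolate between
        return values.copy()
    # Points outside the first/last valid sample stay NaN
    return np.interp(coord, coord[good], values[good],
                     left=np.nan, right=np.nan)

levels = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
profile = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])

filled = fill_interior_gaps(profile, levels)
print(filled)
```

This would not rescue profiles where a density surface is missing at the top or bottom of the range, which is the outcropping case.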
What I do with land (NaN equivalent?) is to remove them in the original data, but 'remember' where they were. Do the clustering/exploration, and then put them back afterwards as the original 'gaps'.
Would this work here? I'm probably way off here, not understanding something/being slow!
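A rough sketch of the "remove, remember, put back" approach described above, applied to whole profiles. `cluster_with_gaps` and `toy_cluster` are hypothetical. The toy function is only a stand-in for whatever clustering is actually used, and `-1` as the gap marker is an arbitrary choice.

```python
import numpy as np

def cluster_with_gaps(features, cluster_fn):
    """Cluster only the complete rows, then restore gaps as label -1.

    features: array of shape (n_profiles, n_features), NaN where missing.
    cluster_fn: any function mapping a NaN-free array to integer labels.
    """
    features = np.asarray(features, dtype=float)
    valid = ~np.any(np.isnan(features), axis=1)         # remember the gaps
    labels = np.full(features.shape[0], -1, dtype=int)  # -1 marks a gap
    labels[valid] = cluster_fn(features[valid])
    return labels

def toy_cluster(x):
    # Stand-in for real clustering: split on the mean of the first feature
    return (x[:, 0] > x[:, 0].mean()).astype(int)

features = np.array([
    [1.0, 34.0],
    [np.nan, 34.5],
    [3.0, 34.7],
    [2.8, np.nan],
    [0.9, 34.1],
])

labels = cluster_with_gaps(features, toy_cluster)
print(labels)
```

This keeps the output aligned with the original profile list, but the profiles with gaps still end up unclassified, so it addresses bookkeeping rather than the small number of usable profiles.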
Here are the salinity profiles for the Weddell Sea, in the range 50-300m. It would be impossible to cluster here, since there is basically no range in density shared across all the profiles. I'm just putting it here for reference, to visualise one of the limitations. I'll try a deeper depth range and will hopefully have more luck finding a shared density range.
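One way to check this before interpolating is to compute the density range common to all profiles: the largest per-profile minimum and the smallest per-profile maximum. The sketch below assumes a `(n_profiles, n_depths)` array of `sigma0` with at least one valid value per profile. The function name and numbers are made up.

```python
import numpy as np

def shared_sigma_range(sigma):
    """Return the (low, high) density range present in every profile.

    sigma: array of shape (n_profiles, n_depths), NaN where missing.
    Returns None when the profiles share no range at all.
    """
    sigma = np.asarray(sigma, dtype=float)
    low = np.max(np.nanmin(sigma, axis=1))   # densest of the lightest values
    high = np.min(np.nanmax(sigma, axis=1))  # lightest of the densest values
    return (low, high) if low < high else None

overlapping = np.array([
    [27.0, 27.3, 27.6],
    [27.2, 27.5, 27.8],
    [27.4, 27.6, 27.9],
])
disjoint = np.array([
    [26.8, 27.0],
    [27.5, 27.8],
])

print(shared_sigma_range(overlapping))
print(shared_sigma_range(disjoint))
```

A single unusually light or dense profile can collapse the shared range, so it may be worth computing this on subsets (by region or depth window) as well.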
Okay! The code is now able to project the profiles onto density surfaces and use the values of T&S on those density surfaces as the independent variables (dimensions) for the clustering analysis. This is neat, but there is one big challenge:
It is difficult (impossible?) to find a range of density surfaces on which we have values throughout the entire domain. This is perhaps unsurprising, as we know that isopycnals outcrop in the SO. As a consequence of this limitation, though, we can only classify certain density ranges. I'll post a few below for $27.0-27.2\sigma_0$:
If we target instead $27.5-27.75\sigma_0$:
I'm having trouble getting anything much further south to show up. Between the depth range limitations and the density range limitations, it's challenging to find much. That being said, the above plots are still cool and interesting. I'll keep experimenting with different ranges.
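For reference, using T&S on density surfaces as the clustering dimensions amounts to building one feature row per profile. A minimal sketch, assuming NaN-free `(n_profiles, n_levels)` arrays for each variable; the per-column standardization is my assumption, not something stated in this thread, and `build_features` is a hypothetical name.

```python
import numpy as np

def build_features(theta_on_sigma, salt_on_sigma):
    """Stack T and S on density surfaces into one feature matrix.

    Both inputs have shape (n_profiles, n_levels) and contain no NaNs.
    Each column is standardized (assumption) so T and S are comparable.
    """
    X = np.hstack([np.asarray(theta_on_sigma, dtype=float),
                   np.asarray(salt_on_sigma, dtype=float)])
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

theta = np.array([[3.5, 2.0], [3.0, 1.8], [2.5, 1.2]])
salt = np.array([[34.2, 34.5], [34.3, 34.6], [34.4, 34.7]])

X = build_features(theta, salt)
print(X.shape)
```

The resulting matrix has `2 * n_levels` columns and could then be handed to a GMM implementation, for example scikit-learn's `GaussianMixture` if that is what the project uses.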
The Weddell Sea density plot suggests that we should have some luck between 100-1000m and $27.5-28.0\sigma_0$. It's worth a try...
This is fun, and it kinda works, but it's very difficult to target specific water masses and regions.
We do have the capability built into the code now, so I can consider this issue closed. It might be a useful feature for some applications.
Think about how this might work and what the advantages/disadvantages might be.