so-wise / weddell_gyre_clusters

Unsupervised classification of Weddell Gyre profiles
MIT License

Try 50-300m cut #28

Closed DaniJonesOcean closed 3 years ago

DaniJonesOcean commented 3 years ago

This should get us closer to the shelf.

Also try using the K=10 whole-dataset model as a starting point, and then using more clusters to further split the near-Antarctic class.

DaniJonesOcean commented 3 years ago

image

Here is the 50-300m cut, using 10 classes. As expected, it gets further up onto the shelf!
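
A minimal sketch of why a shallower depth cut can retain more shelf profiles, assuming a profile is kept only when its samples span the whole depth range and that kept profiles are interpolated onto a common grid. The function name `cut_profiles`, the 10 m grid spacing, and the synthetic casts are illustrative assumptions, not the repository's code.

```python
import numpy as np

def cut_profiles(depths, temps, z_top=50.0, z_bottom=300.0, dz=10.0):
    """Keep profiles spanning [z_top, z_bottom]; interpolate them onto a common grid.

    depths, temps: lists of 1-D arrays, one pair per profile (depth increasing, metres).
    Returns (indices of kept profiles, 2-D array of interpolated values).
    """
    grid = np.arange(z_top, z_bottom + dz, dz)
    kept, rows = [], []
    for i, (z, t) in enumerate(zip(depths, temps)):
        # A profile qualifies only if its samples bracket the whole depth range.
        if z.min() <= z_top and z.max() >= z_bottom:
            kept.append(i)
            rows.append(np.interp(grid, z, t))
    return np.array(kept, dtype=int), np.array(rows).reshape(len(kept), grid.size)

# Hypothetical casts: one deep open-ocean profile and one shallow shelf profile.
deep = np.linspace(5.0, 1000.0, 200)
shelf = np.linspace(5.0, 350.0, 70)
depths = [deep, shelf]
temps = [np.cos(deep / 500.0), -1.8 + 0.001 * shelf]

kept_300, X_300 = cut_profiles(depths, temps, 50.0, 300.0)  # both casts qualify
kept_500, X_500 = cut_profiles(depths, temps, 50.0, 500.0)  # shelf cast is dropped
```

With a 50-500 m cut (a hypothetical deeper choice), the 350 m shelf cast is excluded, whereas the 50-300 m cut keeps it.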

DaniJonesOcean commented 3 years ago

There are two near-Antarctic classes:

image

image

It's almost like there's a "gyre" and a "near-coastal" class. Very cool. I'm tempted to try a K=5 model and further classify the near-Antarctic class, as we did before.

DaniJonesOcean commented 3 years ago

image

image

The above two correspond to K=4 (gyre) and K=10 (near-Antarctic). The T and S profiles are shown above.

isazar commented 3 years ago

Brilliant!! I like it a lot!!! An additional idea: go to 10 m (if there are enough profiles), and either use more clusters or classify only within the black region in the figure. I'm curious to see if there are any more regimes inside the gyre, in the cyan cluster.

clusters_weddell

DaniJonesOcean commented 3 years ago

Thanks. :) And good idea! Here's what we get with K=5 and a zoomed-in domain:

image

isazar commented 3 years ago

That's cool! :) Lots going on in the gyre, so distinct from the ACC waters. 1) Could the blue cluster represent more wintertime profiles? 2) It's interesting that the pink dots are found on the southern edge too. 3) The red cluster seems really distinct from the others.

Do you have the T/S diagram color-coded by cluster?

maikejulie commented 3 years ago

Hi, cool figures! I'm a bit behind.. Could you catch me up? How is the clustering happening?

I think we should settle on one statistical model for the clustering algorithm, meaning a single value of K.

DaniJonesOcean commented 3 years ago

@isazar - Not yet. I'll aim to do that soon.

Hi @maikejulie! I'm still just using PCA and GMM so far. We are exploring the effect of changing the depth range. By selecting the 50-300m range, we capture more profiles on and near continental shelves. This requires a new statistical model. Then @isazar made the suggestion that we restrict our attention to the region in the black box in her comment above, to see if we can identify structures in that domain alone. This also requires a new statistical model.
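
For readers catching up, the PCA-then-GMM approach can be sketched roughly as follows with scikit-learn. This is an illustration on synthetic data, not the repository's actual pipeline: the per-level standardization, the three principal components, and K=2 are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for profiles on a common 50-300 m grid (26 levels).
# Two artificial regimes with different mean temperature are generated.
n_levels = 26
cold = -1.5 + 0.1 * rng.standard_normal((200, n_levels))
warm = 0.5 + 0.1 * rng.standard_normal((200, n_levels))
X = np.vstack([cold, warm])

# Standardize each depth level, then reduce dimensionality with PCA.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3, random_state=0)
X_pca = pca.fit_transform(X_scaled)

# Fit a Gaussian mixture model in PCA space; K is the number of classes.
K = 2
gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X_pca)
posteriors = gmm.predict_proba(X_pca)  # shape (n_profiles, K)
```

Changing the depth range or the lat-lon domain changes the rows and columns of `X`, which is why each new cut needs the PCA and GMM to be refitted.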

While we are still exploring different domains in lat-lon-depth, there's still lots of room for varying K. It depends a bit on what we're looking for.

At present, there are at least three statistical models at play:

  1. One to characterize the entire SO-WISE domain
  2. One to further characterize the "near Antarctic" class identified by model (1)
  3. An exploratory Weddell Sea focused one based on @isazar's suggestion.
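
The relationship between models (1) and (2) can be sketched as a two-stage fit: a coarse mixture model over everything, then a separate model fitted only to the members of one class. The synthetic features, the choice of two classes at each stage, `n_init=5`, and the way the target class is picked are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic 2-D features (e.g. leading principal components): one broad group
# plus a second group that itself contains two tighter sub-groups.
broad = rng.normal([0.0, 0.0], 0.5, size=(300, 2))
sub_a = rng.normal([8.0, -1.0], 0.2, size=(150, 2))
sub_b = rng.normal([8.0, 1.0], 0.2, size=(150, 2))
X = np.vstack([broad, sub_a, sub_b])

# Stage 1: coarse model over the full domain.
coarse = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
coarse_labels = coarse.predict(X)

# Pick the class to refine. Here it is the class containing the last sample;
# in practice it would be chosen by inspecting maps and profiles.
target = coarse_labels[-1]
mask = coarse_labels == target

# Stage 2: a separate model fitted only to profiles in the target class.
fine = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X[mask])
fine_labels = np.full(X.shape[0], -1)  # -1 marks profiles outside the target class
fine_labels[mask] = fine.predict(X[mask])
```

Because the second model only sees the subset, its class labels are not comparable with those of the first model and need to be tracked separately.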

I haven't posted any full reports yet - I've just been putting up plots! I hope this comment helps a little.

DaniJonesOcean commented 3 years ago

One more thing: in general, I'm finding that increasing K leads to big increases in the uncertainty (i-metric). Although BIC and AIC might suggest a larger number of classes for some of these models (e.g. K=10), I get better results, in terms of the i-metric, when I stick to fewer classes (e.g. K=5). In some cases, when I use the values suggested by BIC and AIC, I get several classes in which most of the profiles have large uncertainties.
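
The trade-off described here can be explored with a sketch like the one below, which reports BIC, AIC, and a per-profile uncertainty for a range of K. The thread does not define the i-metric, so the definition used here (one minus the gap between the two largest posterior probabilities) is an assumption, as are the synthetic data and the 0.5 threshold.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def i_metric(posteriors):
    """Assumed definition: 1 - (largest posterior - second-largest posterior)."""
    top_two = np.sort(posteriors, axis=1)[:, -2:]
    return 1.0 - (top_two[:, 1] - top_two[:, 0])

rng = np.random.default_rng(2)
# Synthetic features with three well-separated groups.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.6, size=(200, 2)) for c in centers])

for K in range(2, 9):
    gmm = GaussianMixture(n_components=K, random_state=0).fit(X)
    I = i_metric(gmm.predict_proba(X))
    print(f"K={K}  BIC={gmm.bic(X):9.1f}  AIC={gmm.aic(X):9.1f}  "
          f"mean I={I.mean():.3f}  frac(I>0.5)={np.mean(I > 0.5):.3f}")
```

Under this definition a value near 0 means a profile is assigned to its class with high confidence, and a value near 1 means the two most likely classes are almost equally probable.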

DaniJonesOcean commented 3 years ago

I'm closing this as well, because I want to refocus these efforts and maybe produce a mini-report instead of collections of figures scattered across various GitHub issues. :)