marine-predators-group / Aidans_Journal

Aidan's research notebook
https://marine-predators-group.github.io/aidans-nb/
MIT License

Cluster analysis #2

Open camrinbraun opened 4 years ago

camrinbraun commented 4 years ago

@AidanCox12 See below for what I have from Sal's MATLAB code via an email he shared in 2014 (!). I'm digging up some R stuff that I wrote on this now. If only I had a nice doc on GitHub summarizing where I left off...

Here is what you need to do:
Compile your data into a single array with dimensions (# of days by # of time-at-depth (TAD) bins), e.g. (4000 x 12). Days in my case are 24-hr binned data for all sharks. Call it X.
Then compile another array (or individual vectors) with the same number of rows and additional columns, e.g. ID, date, sex, lat, long, etc.
The number of bins and the length of your data are arbitrary.
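
To make the layout concrete, here is a minimal sketch of the two row-aligned objects described above. Everything in it is synthetic: the random values, the variable names (`meta`, `lon`), and the assumption that each row of X holds the fraction of a day spent in each depth bin are mine, not from Sal's email.

```matlab
% Synthetic stand-in for real tag data: 4000 daily records x 12 depth bins.
rng(1);                                   % reproducible random numbers
nDays = 4000;
nBins = 12;
X = rand(nDays, nBins);
X = X ./ sum(X, 2);                       % assumption: each row sums to 1 (fraction of the day per bin)

% Made-up metadata with one row per row of X.
meta = table(randi(20, nDays, 1), ...                                     % hypothetical shark ID
             datetime(2014, 1, 1) + days(randi(365, nDays, 1) - 1), ...   % hypothetical date
             categorical(randi(2, nDays, 1), [1 2], {'F', 'M'}), ...      % hypothetical sex
             30 + 15 * rand(nDays, 1), ...                                % hypothetical latitude
             -75 + 20 * rand(nDays, 1), ...                               % hypothetical longitude
             'VariableNames', {'ID', 'date', 'sex', 'lat', 'lon'});

% X and meta must stay row-aligned so cluster labels can be joined back later.
assert(height(meta) == size(X, 1));
```

Keeping the metadata in a separate table rather than as extra columns of X means the distance calculation only ever sees the depth bins.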

%% Set low-usage bins to 0. I found this really helped define the differences between clusters; these low values could be time the sharks spent moving from one depth bin to another without spending significant time there.
X(X<0.1) = 0;

%% Cluster function parameters: city-block distance (sum of absolute differences) between days, average linkage
Y = pdist(X,'cityblock');
Z = linkage(Y,'average');
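
For anyone unfamiliar with these two calls, the sketch below shows what they compute on a tiny made-up matrix (`Xdemo`, 4 days x 3 bins, is hypothetical). It also adds a cophenetic correlation check via `cophenet`, which is not part of Sal's script but is one common way to see how well the tree preserves the original distances.

```matlab
% Tiny made-up example: 4 "days" x 3 depth bins.
Xdemo = [1.0 0.0 0.0;
         0.8 0.2 0.0;
         0.0 0.0 1.0;
         0.0 0.3 0.7];

Ydemo = pdist(Xdemo, 'cityblock');    % condensed vector of pairwise distances
D = squareform(Ydemo);                % same distances as a 4 x 4 symmetric matrix

% City-block distance between rows 1 and 2, written out by hand.
d12 = sum(abs(Xdemo(1, :) - Xdemo(2, :)));

Zdemo = linkage(Ydemo, 'average');    % average-linkage tree built from those distances

% Cophenetic correlation: closer to 1 means the tree distorts the distances less.
c = cophenet(Zdemo, Ydemo);
```

Here `D(1, 2)` and `d12` are the same number, which is a quick way to confirm what `'cityblock'` means before running it on the full array.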

%%% Then choose the number of clusters. This part is exploratory and you should try lots of different numbers; read up on criteria for deciding how many to go with. I went with the fewest clusters that gave the greatest separation.
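
One possible criterion, sketched here as an assumption rather than as what Sal used, is the mean silhouette value: cut the tree at several candidate cluster counts with `cluster(Z, 'maxclust', k)` and compare. The data below (`Xdemo`, `centers`) are made up, built from three groups of days that each concentrate in a different depth bin.

```matlab
rng(2);                                          % reproducible random numbers

% Made-up data: 3 groups of 30 days, each concentrated in a different depth bin.
centers = [0.8 0.1 0.1 0.0;
           0.1 0.8 0.1 0.0;
           0.0 0.1 0.1 0.8];
Xdemo = repelem(centers, 30, 1) + 0.05 * rand(90, 4);

Zdemo = linkage(pdist(Xdemo, 'cityblock'), 'average');

kValues = 2:8;                                   % candidate numbers of clusters
meanSil = zeros(size(kValues));
for i = 1:numel(kValues)
    labels = cluster(Zdemo, 'maxclust', kValues(i));   % cut the tree into at most k clusters
    s = silhouette(Xdemo, labels, 'cityblock');        % one silhouette value per day
    meanSil(i) = mean(s, 'omitnan');
end

[~, best] = max(meanSil);
bestK = kValues(best);                           % candidate count with the highest mean silhouette
```

Silhouette is only one option, and on real dive data the peak is usually much less clear-cut than on a toy example, so it is worth plotting `meanSil` against `kValues` rather than trusting `bestK` alone.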

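 [H, T, Perm] = dendrogram(Z,9); %% in this case 9 clusters; T = cluster (leaf node) number for each row of X, Perm = left-to-right order of the 9 leaves
  [~, leafPos] = sort(Perm);          %% position of each cluster along the dendrogram axis
  [~, rowOrder] = sort(leafPos(T));   %% row order that groups days by cluster, in dendrogram order
  XS = X(rowOrder,:);

%%% Plot
  figure                              %% new figure so the dendrogram is not overwritten
  imagesc(flipud(XS'));               %% depth bins as rows, days (grouped by cluster) as columns
  set(gca, 'xtick', [])               %% set after imagesc so the tick setting is not reset
  shading flat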

%%% Then explore your data by linking your cluster number assignment to your variables of lat, long, date, sex, etc.
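
As a rough sketch of that last step, the cluster number can be attached to the metadata as a new column and then summarized per cluster. The labels and metadata below are random placeholders, and the names `meta`, `lon`, and `byCluster` are hypothetical; with real data, `T` would come from `dendrogram(Z, 9)` or `cluster(Z, 'maxclust', k)`.

```matlab
rng(3);                                  % reproducible random numbers
nDays = 60;

% Placeholder cluster labels, one per day.
T = randi(3, nDays, 1);

% Made-up metadata, one row per day and in the same row order as X.
meta = table(randi(5, nDays, 1), ...
             categorical(randi(2, nDays, 1), [1 2], {'F', 'M'}), ...
             30 + 15 * rand(nDays, 1), ...
             -75 + 20 * rand(nDays, 1), ...
             'VariableNames', {'ID', 'sex', 'lat', 'lon'});

meta.cluster = T;                        % attach the cluster number to each day

% Number of days and mean position per cluster.
byCluster = groupsummary(meta, 'cluster', 'mean', {'lat', 'lon'});

% Days per cluster broken down by sex (rows = clusters, columns = sexes).
[counts, ~, ~, labels] = crosstab(meta.cluster, meta.sex);
```

This only works if the metadata rows are in the same order as the rows of X that went into `pdist`, so any filtering should be applied to both at once.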

camrinbraun commented 4 years ago

I think we should go ahead and start a new project repo in our MPG org at https://github.com/marine-predators-group/cluster or https://github.com/marine-predators-group/cox_vertical, or something similar. Then you would take the lead on populating that repo for the project. Let me know once it's up and running and I'll drop in the existing R code I have.

The idea would be to follow the logic explained in our (very rough) group resources repo.

camrinbraun commented 4 years ago

Ok, I put up all the cluster code I have here https://github.com/marine-predators-group/cluster-2/commit/8084c6ce37d95e08ab6f8c1651575757cd760fad

You'll find one of these files is an Rmd file showing an example. The data is in our MPG Google Drive.

This is a bad solution but, for now, the metadata associated with our tags lives here. I've invited you to the nip_drake repo that contains this metadata in case you need it. You'll see one of the files I committed to your cluster repo shows how I filtered the master metadata sheet to get to the subset of data that I sent you.