VOC clustering

I've used this document to explore the VOC data and figure out how to cluster the compounds. The best approach seems to be hierarchical clustering on their correlation coefficients to determine the groups.
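For reference, here is a minimal sketch of what correlation-based hierarchical clustering could look like. It is not the code from the analysis document: the function name `cluster_by_correlation`, the `1 - r` distance, the average linkage, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def cluster_by_correlation(data, n_clusters, method="average"):
    """Group the columns of `data` (samples x compounds) by correlation.

    Hypothetical helper: uses 1 - Pearson r as the distance between
    compounds, which is one common choice, not necessarily the one used
    in the original analysis.
    """
    corr = np.corrcoef(data, rowvar=False)      # compounds x compounds
    dist = 1.0 - corr                           # highly correlated -> close
    dist = (dist + dist.T) / 2.0                # remove tiny asymmetries
    np.fill_diagonal(dist, 0.0)                 # exact zeros on the diagonal
    condensed = squareform(dist, checks=False)  # form expected by linkage()
    tree = linkage(condensed, method=method)
    # Cut the tree so that at most `n_clusters` groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic stand-in for the VOC table: two pairs of compounds,
# each pair driven by its own underlying signal.
rng = np.random.default_rng(0)
driver_a = rng.normal(size=200)
driver_b = rng.normal(size=200)
voc = np.column_stack([
    driver_a + 0.1 * rng.normal(size=200),
    driver_a + 0.1 * rng.normal(size=200),
    driver_b + 0.1 * rng.normal(size=200),
    driver_b + 0.1 * rng.normal(size=200),
])

labels = cluster_by_correlation(voc, n_clusters=2)
print(labels)
```

The number of clusters is passed in by hand here; in practice it would come from inspecting the dendrogram.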

The thing is... these clusters differ depending on the type of data used: raw vs. log-transformed. The idea behind the clusters is that we would compute a weighted average for each VOC cluster and use those averages as the new variables. How would this change depending on the data used, and what would be the reasoning for choosing raw data vs. transformed data?
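One way to put a number on how much the two clusterings differ is the Rand index, the fraction of compound pairs that both clusterings treat the same way (together in both, or apart in both). The sketch below is only an illustration: `rand_index` is a hypothetical helper and the two label vectors are made up, not results from the VOC data.

```python
import numpy as np


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    items in the same cluster, or both put them in different clusters.
    Cluster ids themselves do not need to match.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)   # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


# Made-up cluster labels for five compounds (illustration only).
raw_labels = [1, 1, 2, 2, 3]
log_labels = [1, 1, 2, 3, 3]

agreement = rand_index(raw_labels, log_labels)
print(agreement)
```

Some disagreement is expected, because the Pearson correlation is not invariant under a log transform. A rank-based coefficient such as Spearman's would give the same correlations for raw and log-transformed data, as long as all values are positive.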

Steps:

[ ] Cluster the raw data
[ ] Create the new dataset with weighted averages depending on the clusters
[ ] Check the structure of the new variables (histograms?)
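The last two steps could be sketched roughly as follows. The issue does not say how the weights are chosen, so the `weights` argument, the helper names `cluster_weighted_means` and `skewness`, and the toy numbers are all assumptions. Skewness is used here as a quick numeric stand-in for eyeballing a histogram.

```python
import numpy as np


def cluster_weighted_means(data, labels, weights=None):
    """Collapse each cluster of columns into one weighted-average column.

    `data` is samples x compounds, `labels` gives a cluster id per
    compound, and `weights` gives one weight per compound (hypothetical;
    equal weights are used when none are supplied).
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    if weights is None:
        weights = np.ones(data.shape[1])
    weights = np.asarray(weights, dtype=float)

    cluster_ids = np.unique(labels)
    columns = []
    for cid in cluster_ids:
        members = labels == cid
        columns.append(
            np.average(data[:, members], axis=1, weights=weights[members])
        )
    return cluster_ids, np.column_stack(columns)


def skewness(x):
    """Sample skewness (third standardized moment); 0 means symmetric."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)


# Toy table: 3 samples x 3 compounds, with compounds 1 and 2 in one cluster.
data = np.array([[1.0, 3.0, 10.0],
                 [2.0, 4.0, 20.0],
                 [3.0, 5.0, 30.0]])
labels = np.array([1, 1, 2])
weights = np.array([1.0, 3.0, 1.0])

ids, new_vars = cluster_weighted_means(data, labels, weights)
print(ids)
print(new_vars)
print([skewness(new_vars[:, j]) for j in range(new_vars.shape[1])])
```

Running the same function on the raw and on the log-transformed table, then comparing the skewness (or histograms) of the resulting columns, would be one way to see which version gives better-behaved cluster variables.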

javirudolph / mossmat

VOC clustering #4