nolanlab / citrus

Citrus Development Code
GNU General Public License v3.0

What does the scaleColumns argument do in citrus.full()? #81

bc2zb closed 8 years ago

bc2zb commented 8 years ago

I looked through the code but cannot find what the scaleColumns argument does. I ask because I ran my experiment both with and without a list of columns to scale, and the results changed.

rbruggner commented 8 years ago

The scaleColumns argument standardizes the distribution of each listed parameter so that it has a mean of zero and a standard deviation of 1. This scaling happens after any transformation that you may have specified (e.g. the arcsinh transformation).
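As a rough illustration of the order of operations described above, here is a minimal Python sketch. It is not citrus code: the function names `arcsinh_transform` and `scale_column`, the cofactor of 5, and the sample values are all invented for the example. The use of the sample standard deviation is also an assumption, since the thread does not say which estimator citrus uses.

```python
import math
import statistics

def arcsinh_transform(values, cofactor=5.0):
    """Apply an arcsinh transform; the cofactor of 5 is an assumed example value."""
    return [math.asinh(v / cofactor) for v in values]

def scale_column(values):
    """Center to mean 0 and scale to standard deviation 1 (sample SD assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical raw intensities for a single channel.
raw = [0.0, 12.0, 150.0, 900.0, 4000.0]

# Transform first, then scale, matching the order described in the reply.
transformed = arcsinh_transform(raw)
scaled = scale_column(transformed)

print(statistics.fmean(scaled), statistics.stdev(scaled))
```

The printed mean should be zero up to floating-point error and the standard deviation should be 1, whatever the range of the raw values.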

This feature was added to deal with cases where certain parameters/channels have substantially different dynamic ranges, so that the parameters/channels with larger absolute values tend to unduly influence clustering. By scaling all parameters/channels to the same mean and standard deviation, you reduce the chance that clustering is dominated by one channel. However, I've not done any systematic evaluation of the general effect of scaling channels, and there could be some unanticipated or undesirable effects associated with it.
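To make the dynamic-range concern concrete, the sketch below compares how much each channel contributes to the distance between two cells before and after scaling. Everything in it is hypothetical: the two channels, their values, and the helper names `zscore` and `channel_share` are invented, and squared Euclidean distance is assumed as a stand-in for whatever distance the clustering step actually uses.

```python
import statistics

def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def channel_share(cell_a, cell_b):
    """Fraction of the squared Euclidean distance contributed by each channel."""
    squared = [(x - y) ** 2 for x, y in zip(cell_a, cell_b)]
    total = sum(squared)
    return [s / total for s in squared]

# Hypothetical measurements for four cells in two channels.
wide = [1.0, 3.0, 6.0, 8.0]    # channel with a large dynamic range
narrow = [0.4, 0.1, 0.3, 0.2]  # channel with a small dynamic range

cells_raw = list(zip(wide, narrow))
cells_scaled = list(zip(zscore(wide), zscore(narrow)))

# Compare the first and last cell before and after scaling.
share_raw = channel_share(cells_raw[0], cells_raw[3])
share_scaled = channel_share(cells_scaled[0], cells_scaled[3])

print("raw shares (wide, narrow):   ", share_raw)
print("scaled shares (wide, narrow):", share_scaled)
```

With these made-up numbers, the wide channel accounts for nearly all of the raw distance, while after scaling the narrow channel contributes a substantial share. Whether that rebalancing is desirable depends on the data, which is consistent with the caveat above.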