rprops / Phenoflow_package

R package offering functionality for the advanced analysis of microbial flow cytometry data
GNU General Public License v2.0

Rescaling for bandwidth calculations #44

Closed FMKerckhof closed 6 years ago

FMKerckhof commented 6 years ago

In the Phenotypic Diversity Analysis wiki, at one point we select maxval <- max(summary[,9]). Here the column index depends on which parameter happens to sit in that column (FL1-H for the BD Accuri C6, but it may be completely different for e.g. the BD FACSVerse). Why don't we: 1) use vector-based rescaling and rescale each parameter by its largest value, or adapt mytrans to mytrans <- function(x) x/max(x)? 2) make this more generic?
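One way to make the lookup less instrument-dependent is to select the rescaling channel by name rather than by column position. The following is a minimal sketch on toy data, not the package's implementation: the matrix `ranges`, its values, and the variable `param` are hypothetical stand-ins for the summary table and the chosen fluorescence channel.

```r
# Toy stand-in for a per-parameter summary table (hypothetical values).
# Rows hold the minimum and maximum of each channel.
ranges <- matrix(c(10, 200, 5, 4000), nrow = 2,
                 dimnames = list(c("min", "max"), c("FSC-H", "FL1-H")))

# Hypothetical: the channel whose maximum is used for rescaling.
param <- "FL1-H"

# Look the column up by name instead of by a hard-coded index such as 9.
maxval <- max(ranges[, param])

# Rescale every value by that single, shared maximum.
mytrans <- function(x) x / maxval
```

With this approach only the channel name has to change between instruments, for example between the BD Accuri C6 and the BD FACSVerse.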

FMKerckhof commented 6 years ago

I see now that dividing by each parameter's maximum would change the ratios between parameters for each cell.
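A small illustration of this point, using made-up values for two cells and two channels (the matrix `cells` is hypothetical): dividing everything by one shared maximum keeps the per-cell ratio between channels, while dividing each channel by its own maximum does not.

```r
# Two hypothetical cells measured on two channels.
cells <- matrix(c(100, 400, 50, 100), nrow = 2,
                dimnames = list(NULL, c("FL1-H", "FL3-H")))

# Per-cell ratio between the two channels.
channel_ratio <- function(x) x[, "FL1-H"] / x[, "FL3-H"]

# Rescale by one shared maximum: ratios are preserved.
scaled_global <- cells / max(cells)

# Rescale each channel by its own maximum: ratios change.
scaled_percol <- sweep(cells, 2, apply(cells, 2, max), "/")

ratio_raw    <- channel_ratio(cells)
ratio_global <- channel_ratio(scaled_global)
ratio_percol <- channel_ratio(scaled_percol)
```

Here `ratio_raw` and `ratio_global` are identical, whereas `ratio_percol` differs from both.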

FMKerckhof commented 6 years ago

As now implemented in 8882482bd1f504d5d3394da0a68c1b5028ec1f68, it looks okay. Maybe add a verbose=FALSE option?

prubbens commented 6 years ago

Although the ratios will indeed change, this shouldn't have an impact. In fact, it is often preferred for clustering or for supervised approaches that include regularization, which require variables to be on comparable scales. Moreover, it tends to speed up convergence, even for methods that don't require standardization/normalization. Min-max normalization can be implemented as: mytrans <- function(x) (x - min(x)) / (max(x) - min(x))
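As a rough sketch of how this could be applied per parameter, the function can be run over the columns of a matrix with `apply`. The matrix `cells` and its values are hypothetical toy data, not output from the package.

```r
# Min-max normalization as given above: maps a vector onto [0, 1].
mytrans <- function(x) (x - min(x)) / (max(x) - min(x))

# Three hypothetical cells measured on two channels with different scales.
cells <- cbind("FL1-H" = c(2, 4, 6), "FL3-H" = c(10, 20, 40))

# Normalize each channel (column) independently.
scaled <- apply(cells, 2, mytrans)
```

After this step every column of `scaled` runs from 0 to 1. Note that a channel with identical values in all cells would yield NaN, because max(x) - min(x) is zero, so such channels would need separate handling.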