Closed (aiyanzidexiaotao closed 4 years ago)
The standardization formula assumes that you subtract the mean and divide by the standard deviation. But if you subtract the mean from sparse vectors, they are converted to dense vectors, which may require considerably more memory.
In chapter 6, you wrote: "Note that data is sparse, so it is reasonable to not substract mean for avoiding violating sparsity." But the standardization code is "def standardizer(column): return ((col(column) - avg_dict[column])/std_dict[column]).alias(column)", which does subtract the mean. Should I subtract the mean or not?
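To make the trade-off above concrete, here is a minimal sketch in plain Python (not the PySpark code from the chapter). It assumes a sparse column is stored as a dict of row index to nonzero value, and it uses the population standard deviation; the function name `scale_sparse` and the `with_mean` flag are hypothetical, chosen only for this illustration.

```python
import math

def scale_sparse(column, n_rows, with_mean=False):
    """Scale a sparse column stored as {row_index: nonzero_value}.

    Rows missing from the dict are implicit zeros.
    Hypothetical helper for illustration only.
    """
    # Mean and variance can be computed from the nonzero entries alone.
    mu = sum(column.values()) / n_rows
    var = sum(v * v for v in column.values()) / n_rows - mu * mu
    sigma = math.sqrt(var)
    if sigma == 0:
        raise ValueError("constant column cannot be scaled")

    if with_mean:
        # Centering turns every implicit zero into -mu / sigma,
        # so all n_rows entries must now be stored explicitly.
        return {i: (column.get(i, 0.0) - mu) / sigma for i in range(n_rows)}

    # Dividing only: zeros stay zero, so the sparsity pattern is unchanged.
    return {i: v / sigma for i, v in column.items()}

column = {2: 4.0, 7: 8.0}   # 2 nonzero values out of 10 rows
n_rows = 10

scaled = scale_sparse(column, n_rows)                     # divide by std only
centered = scale_sparse(column, n_rows, with_mean=True)   # subtract mean too

print(len(scaled), len(centered))
```

Dividing by the standard deviation alone keeps the two stored entries, while subtracting the mean forces all ten rows to be stored. For comparison, Spark's built-in `StandardScaler` has `withMean` set to `False` by default, which follows the same reasoning.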