dlee138 opened 9 years ago
Well, I suppose we went over these somewhat already, but here's an attempt at answering the 2nd part:
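Loss is usually defined as the cost (determined by some parameters) of taking a particular action. In the case of k-means, it is the cost of assigning a sample x to cluster j, namely the squared Euclidean distance ||x - μ_j||^2 between x and that cluster's center μ_j; averaged over all samples, this gives the mean squared error. This loss is parameterized by the number of clusters and the cluster centers/assignments.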
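To make the k-means loss concrete, here is a minimal NumPy sketch. It assumes the data is a (n, d) array and the centers a (k, d) array; the helper names `kmeans_loss` and `assign` are hypothetical, chosen only for illustration.

```python
import numpy as np

def kmeans_loss(X, centers, labels):
    """Mean squared error of assigning each row X[i] to centers[labels[i]]."""
    # Squared Euclidean distance from each sample to its assigned center
    sq_dist = np.sum((X - centers[labels]) ** 2, axis=1)
    return sq_dist.mean()

def assign(X, centers):
    """Assign each sample to its nearest center (the loss-minimizing choice)."""
    # d2[i, j] = ||X[i] - centers[j]||^2, computed via broadcasting
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return d2.argmin(axis=1)

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])
labels = assign(X, centers)
print(labels)                           # [0 0 1 1]
print(kmeans_loss(X, centers, labels))  # 1.0
```

Assigning every point to the far cluster instead (labels `[1, 1, 0, 0]`) raises the loss to 101.0, which is why the assignment step always picks the nearest center.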
Risk is the expected loss under the data distribution, so here it would be the expected squared error over all possible samples (and their resulting assignments). Since k-means depends on how it is initialized, with random initialization we would also have to take the expectation over the different initial cluster centers.
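Since the true distribution is unknown in practice, the sketch below assumes a hypothetical one (an equal mixture of two unit-variance 2-D Gaussians) and approximates the risk by Monte Carlo: it repeatedly draws a training sample, runs plain Lloyd's k-means from a random initialization, and averages the squared error on fresh draws. All function names and parameter values are illustrative assumptions, not a standard API.

```python
import numpy as np

def sample_data(rng, n):
    """Hypothetical data distribution: equal mixture of two unit-variance 2-D Gaussians."""
    means = np.array([[0.0, 0.0], [5.0, 5.0]])
    comp = rng.integers(0, 2, size=n)          # pick a mixture component per sample
    return means[comp] + rng.normal(size=(n, 2))

def nearest(X, centers):
    """Return each sample's nearest-center index and its squared distance."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)

def lloyd(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means, initialized with k distinct random samples."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels, _ = nearest(X, centers)
        for j in range(k):
            if np.any(labels == j):            # leave an empty cluster's center in place
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def estimate_risk(k, n_train=200, n_test=5000, n_runs=20, seed=0):
    """Monte Carlo estimate of the expected squared-error loss of k-means.

    Averages over training samples, random initializations, and fresh test
    points; the spread across runs shows the loss is itself a random variable.
    """
    rng = np.random.default_rng(seed)
    run_losses = []
    for _ in range(n_runs):
        X_train = sample_data(rng, n_train)    # new training sample
        centers = lloyd(X_train, k, rng)       # new random initialization
        X_test = sample_data(rng, n_test)      # fresh draws approximate the expectation over x
        _, d2 = nearest(X_test, centers)
        run_losses.append(d2.mean())
    return float(np.mean(run_losses)), float(np.std(run_losses))

for k in (1, 2):
    mean, spread = estimate_risk(k)
    print(f"k={k}: estimated risk {mean:.2f} (std across runs {spread:.2f})")
```

Under this assumed mixture, the risk for k=2 should land near 2 (the expected squared norm of 2-D unit-variance noise), while k=1 is much larger because a single center has to sit between the two groups.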
From an optimization perspective, the loss is the objective function. We need a loss function; otherwise we are just wandering aimlessly in math space. But the loss is also a random variable: for different samples we will see a distribution of losses, so we need the risk to give us the expected value of that random variable.
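To see the loss acting as the objective, this sketch records the mean squared error at each iteration of plain Lloyd's k-means. Each assignment step and each center-update step can only lower (or keep) the loss, so the recorded sequence is non-increasing. The function name `lloyd_with_history` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def lloyd_with_history(X, k, seed=0, n_iter=10):
    """Lloyd's k-means that records the objective (MSE) at every iteration."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    history = []
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center
        d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = d2.argmin(axis=1)
        # Record the objective for the current centers and assignment
        history.append(float(d2[np.arange(len(X)), labels].mean()))
        # Update step: move each non-empty cluster's center to its members' mean
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, history

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
centers, history = lloyd_with_history(X, k=2)
print(np.round(history, 3))  # a non-increasing sequence of losses
```

Note that this only shows the training loss decreasing for one fixed sample; the risk is what tells us how that loss behaves across different samples and initializations.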
Based on the syllabus, it looks like we were supposed to cover loss and risk functionals last class, but we didn't get a chance to. Can anyone give a brief summary of what they are and how they apply to the k-means algorithm?