Updates plot.py to plot "node" rather than "inc_edge"; as a consequence, the parent label is no longer printed.
Normalization wasn't working before: the values have extremely small magnitudes, and the L2 norm (and sklearn's normalizers in general) suffered from numerical instability at that scale. I switched to the L1 norm, which scales each row to unit length and accounts for the variance in each dimension, and implemented a manual self._normalise function. In short, L1 worked where L2 did not. I verified through manual tests that the dataframes are normalised correctly, including for latency-focused clustering.
Preprocessing was buggy because it modified the dataframe in-place. When we preprocess the df, we want to plot the raw (unnormalized) data rather than the processed data, which is why we need to copy it first. Recovering the original rows from the processed ones is trivial since the data is not shuffled after processing, so the indexes stay aligned.
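The fix amounts to copying before transforming, so the caller's dataframe survives untouched (a hedged sketch; the function name and the transform itself are illustrative, not the branch's actual code):

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a processed copy; the caller's dataframe is left untouched."""
    processed = df.copy()          # copy first: no in-place mutation
    processed -= processed.mean()  # illustrative stand-in transform
    return processed               # same index as the input, so rows align
```

Because the index is preserved and rows are never shuffled, `df.loc[i]` and `preprocess(df).loc[i]` always refer to the same underlying row.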
Muted the GMM convergence warning.
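Muting can be scoped to just the fit call so other warnings still surface. In the real code the category would be sklearn's `ConvergenceWarning`; a stdlib `UserWarning` stands in below so the sketch has no sklearn dependency:

```python
import warnings

def fit_quietly(fit_fn):
    """Run a fitting callable with convergence-style warnings muted.

    The filter is active only inside this context, so warnings raised
    elsewhere in the pipeline are unaffected.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return fit_fn()
```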
In GMM, if the data is too small (a single row, or fewer rows than n_clusters), we assign a default label of 0 instead of fitting.
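The guard could look like the sketch below (function name and structure are assumptions; only the fallback-to-0 behaviour comes from this branch). The sklearn import is deferred so the guard itself has no heavy dependency:

```python
import numpy as np

def gmm_labels(data: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster rows with a GMM, defaulting to label 0 when the data
    is too small to fit a mixture (one row, or fewer rows than components)."""
    if len(data) < max(2, n_clusters):
        # Too few rows to fit: every row gets the default cluster 0.
        return np.zeros(len(data), dtype=int)
    from sklearn.mixture import GaussianMixture  # normal fit path
    return GaussianMixture(n_components=n_clusters).fit_predict(data)
```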
There was a bug where, if clustering failed in a transform (e.g., GMM with only one row of data), self.labels was not updated. As a result, subsequent transforms silently used stale labels from the previous run. Fixed this by resetting self.labels = [] on each fitting.
This branch currently has: