Closed: smarbal closed this issue 1 year ago.
Hello, since commit 95b94c4, the metrics are no longer printed at the end of training.
$ dataset make upx-PE -p upx -f PE
$ model train upx-PE -a kmeans --ignore-labels
00:00:03.692 [INFO] Selected algorithm: K-Means clustering
00:00:03.693 [INFO] Reference dataset: upx-PE(PE32,PE64)
00:00:03.694 [INFO] Computing features...
00:00:37.711 [INFO] Making pipeline...
00:00:37.714 [INFO] Training model...
00:00:37.714 [INFO] (step 1/2) Processing standardize (StandardScaler)
00:00:37.718 [INFO] (step 2/2) Processing kmeans
Name: upx-PE_pe32-pe64_100_kmeans_f109
00:00:38.224 [INFO] Parameters:
- n_clusters = 8
- n_init = 10
- max_iter = 300
- tol = 0.0001
- algorithm = lloyd
The clustering metrics are defined in the algorithms configuration file, and before that commit the command produced the following output:
$ model train upx-PE -a kmeans --ignore-labels
00:00:03.502 [INFO] Selected algorithm: K-Means clustering
00:00:03.503 [INFO] Reference dataset: upx-PE(PE32,PE64)
00:00:03.505 [INFO] Computing features...
00:00:39.249 [INFO] Making pipeline...
00:00:39.252 [INFO] Training model...
00:00:39.252 [INFO] (step 1/2) Processing standardize (StandardScaler)
00:00:39.256 [INFO] (step 2/2) Processing kmeans
Name: upx-PE_pe32-pe64_100_kmeans_f109
───── ──────────────── ─────────────────────── ────────────────────
. Silhouette Score Calinski Harabasz Score Davies Bouldin Score
Train -0.216 8.474 10.921
───── ──────────────── ─────────────────────── ────────────────────
...
@smarbal You can see why by running the command in verbose mode (model -v train ...). I guess the predictions are all -1, meaning no label. I think something is failing in the pipeline, but I have not figured out what yet.
@dhondta Increasing verbosity did not give any more information.
I tried changing parts of the code, and by reverting _convert_output to its previous state, I got the metrics back.
My guess is that the problem comes from this condition:
if all(x == LABELS_BACK_CONV[NOT_LABELLED] for x in yp)
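To illustrate how that condition behaves on its own, here is a minimal Python sketch. NOT_LABELLED and LABELS_BACK_CONV are the project's names, but the values below are assumptions (a "not labelled" marker that converts back to -1), and all_unlabelled is a hypothetical helper, not the project's code.

```python
# Assumed values: in the project, NOT_LABELLED and LABELS_BACK_CONV are defined
# elsewhere and may differ. Here "not labelled" is assumed to convert back to -1.
NOT_LABELLED = "not-labelled"
LABELS_BACK_CONV = {NOT_LABELLED: -1}


def all_unlabelled(yp) -> bool:
    """Hypothetical helper reproducing the quoted condition:
    True when every predicted value equals the 'not labelled' value."""
    return all(x == LABELS_BACK_CONV[NOT_LABELLED] for x in yp)


if __name__ == "__main__":
    print(all_unlabelled([-1, -1, -1]))  # True: every prediction is "no label"
    print(all_unlabelled([0, 3, 7, 1]))  # False: typical K-Means cluster indices
    print(all_unlabelled([]))            # True: all() of an empty iterable
    gen = (x for x in [0, 1])
    list(gen)                            # exhaust the generator
    print(all_unlabelled(gen))           # True: nothing left to check
```

K-Means assigns cluster indices from 0 to n_clusters - 1, so this check should not fire on normal clustering output; if it does, the predictions reaching it are probably already all -1. Note also that all() returns True for an empty sequence, so an empty yp (or an already-consumed generator) would take the same branch.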
@smarbal Please test.
Works as intended. Thank you.
Improvement suggestion
Because some clustering metrics need true labels while others (such as the silhouette, Calinski-Harabasz and Davies-Bouldin scores) do not, it could be useful to add a parameter to the train method that keeps only the metrics that do not use labels.
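The suggestion could look roughly like the following Python sketch. The parameter name label_free_only, the MetricSpec registry and select_metrics are hypothetical and not part of the project. The metric names mirror functions in scikit-learn's sklearn.metrics module, where the silhouette, Calinski-Harabasz and Davies-Bouldin scores need only the data and the predicted clusters, while scores such as the adjusted Rand index compare the predictions against ground-truth labels.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical metric descriptor; names mirror sklearn.metrics functions."""
    name: str
    needs_true_labels: bool  # True for external metrics (compared to ground truth)


# Hypothetical registry of clustering metrics.
CLUSTERING_METRICS: List[MetricSpec] = [
    MetricSpec("silhouette_score", needs_true_labels=False),
    MetricSpec("calinski_harabasz_score", needs_true_labels=False),
    MetricSpec("davies_bouldin_score", needs_true_labels=False),
    MetricSpec("adjusted_rand_score", needs_true_labels=True),
    MetricSpec("homogeneity_score", needs_true_labels=True),
    MetricSpec("completeness_score", needs_true_labels=True),
]


def select_metrics(specs: Iterable[MetricSpec], label_free_only: bool = False) -> List[str]:
    """Return the names of the metrics to compute.

    With label_free_only=True (hypothetical train() parameter), metrics that
    require ground-truth labels are dropped, e.g. when training with --ignore-labels.
    """
    return [s.name for s in specs if not (label_free_only and s.needs_true_labels)]


if __name__ == "__main__":
    print(select_metrics(CLUSTERING_METRICS, label_free_only=True))
    # ['silhouette_score', 'calinski_harabasz_score', 'davies_bouldin_score']
```

Tagging each metric with whether it needs true labels keeps the filter to a single flag check, so supporting a new metric only means adding one registry entry.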