stevenlujpl closed this issue 3 years ago.
@urebbapr @emhuff I've added SOM (self-organizing map) as a results organization method. It can be enabled by adding a som option to the results section of the config file. For example:
# Results organization module
results: {
    som: {
        n_clusters: 5
    }
}
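For readers unfamiliar with SOM-based grouping, here is a minimal NumPy sketch of how a self-organizing map can turn n_clusters into cluster labels: train a 1 x n_clusters map and label each sample with the index of its best-matching node. This is an illustrative reimplementation under stated assumptions (a 1-D grid with one node per cluster, linearly decaying learning rate and neighborhood width, a hypothetical helper name som_cluster). It is not the code the pipeline runs; per the traceback further down, the pipeline uses the SOM class from sklearn_som.som, and how it maps n_clusters onto the grid is not stated here.

```python
import numpy as np


def som_cluster(X, n_clusters, n_iter=1000, lr0=0.5, sigma0=None, seed=0):
    """Hypothetical helper: train a 1-D SOM and return (labels, weights).

    Assumption: n_clusters maps to a 1 x n_clusters grid of nodes, and a
    sample's cluster label is the index of its best-matching node.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise node weights from randomly chosen samples.
    idx = rng.choice(len(X), size=n_clusters, replace=len(X) < n_clusters)
    weights = X[idx].copy()
    positions = np.arange(n_clusters)  # node coordinates on the 1-D grid
    if sigma0 is None:
        sigma0 = max(n_clusters / 2.0, 1.0)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood width shrinks
        x = X[rng.integers(len(X))]              # one random training sample
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighborhood on the grid, centred on the best-matching unit.
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    # Label every sample with its nearest node after training.
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    return labels, weights
```

Unlike kmeans, the neighborhood term pulls adjacent nodes toward each other, so neighboring labels tend to correspond to similar groups of samples.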
The output is similar to that of kmeans clustering: the last column contains the cluster label. See the example below:
0, 11, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-125.png, 1
1, 14, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-172.png, 0
2, 4, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-146.png, 2
3, 5, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-25.png, 0
4, 9, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-4.png, 0
5, 10, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-132.png, 2
6, 7, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-68.png, 2
7, 13, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-52.png, 0
8, 6, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-1.png, 0
9, 3, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-32.png, 1
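Assuming the SOM output uses the same four columns described for kmeans (row id, selection id, data id, cluster label), a short sketch with Python's csv module can group data ids by cluster. The helper name group_by_cluster is hypothetical, and the sample rows are copied from the output above.

```python
import csv
import io
from collections import defaultdict


def group_by_cluster(lines):
    """Hypothetical helper: map cluster label -> list of data ids.

    Assumes each row is: row id, selection id, data id, cluster label.
    """
    groups = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the file.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        _row_id, _selection_id, data_id, label = row
        groups[int(label)].append(data_id)
    return dict(groups)


sample = """0, 11, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-125.png, 1
1, 14, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-172.png, 0
2, 4, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-146.png, 2
"""
print(group_by_cluster(io.StringIO(sample)))
```

For a real results file, pass an open file handle instead, e.g. `with open(path, newline="") as f: group_by_cluster(f)`.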
@stevenlujpl Can you please install sklearn_som in the dora venv on our machines? I am getting:
Traceback (most recent call last):
  File "dora_exp_pipeline/dora_exp.py", line 17, in <module>
    from dora_exp_pipeline.outlier_detection import register_od_alg
  File "/home/wkiri/Research/DORA/git/dora_exp_pipeline/outlier_detection.py", line 12, in <module>
    from dora_exp_pipeline.dora_results_organization import get_res_org_method
  File "/home/wkiri/Research/DORA/git/dora_exp_pipeline/dora_results_organization.py", line 13, in <module>
    from sklearn_som.som import SOM
ModuleNotFoundError: No module named 'sklearn_som'
Thanks!
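Because the pipeline imports sklearn_som at module level, the whole run fails even when SOM is not requested. A quick way to check which packages a venv is missing before launching a run is importlib.util.find_spec, which locates a top-level module without importing it. The module list and the helper name missing_modules below are illustrative; the names are the ones that appear in the import errors in this thread.

```python
import importlib.util
import sys

# Top-level packages the pipeline imports (taken from the tracebacks in this thread).
REQUIRED = ["sklearn_som", "tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print(f"Python at {sys.prefix}; missing: {', '.join(missing) or 'none'}")
```

Run it with the venv's own python so that sys.prefix points at the environment being checked.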
@stevenlujpl It seems we also need to install tensorflow in the venv:
Traceback (most recent call last):
  File "dora_exp_pipeline/dora_exp.py", line 26, in <module>
    from dora_exp_pipeline.pae_outlier_detection import PAEOutlierDetection
  File "/home/wkiri/Research/DORA/git/dora_exp_pipeline/pae_outlier_detection.py", line 21, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
Done. However, tensorflow v2.5.1 does not seem to be compatible with the CUDA drivers currently installed on the analysis/paralysis machines. If you need to run PAE, please run it on the CPU.
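One common way to force TensorFlow onto the CPU is to hide all CUDA devices by setting CUDA_VISIBLE_DEVICES to -1 before TensorFlow is imported. The sketch below assumes the variable is set at the top of the entry script; the function name is hypothetical, and whether the DORA pipeline has its own CPU switch is not stated here.

```python
import os


def force_cpu_for_tensorflow():
    """Hypothetical helper: hide every CUDA device from this process.

    Must run before `import tensorflow`, because the CUDA runtime reads
    CUDA_VISIBLE_DEVICES when TensorFlow initialises GPU support.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_for_tensorflow()
# import tensorflow as tf  # import only after the variable has been set
```

The same effect can be had from the shell by prefixing the run command with CUDA_VISIBLE_DEVICES=-1, or inside TensorFlow 2.x with tf.config.set_visible_devices([], "GPU") before any GPU operation runs.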
@stevenlujpl Can we close this issue now?
@urebbapr I added kmeans clustering as a results organization method. It is registered under the name kmeans and can be invoked from the config file; please see the following example. Currently it only accepts n_clusters as a parameter, but we can add more if needed: I am using Sklearn's KMeans implementation, so any parameter that Sklearn's KMeans accepts can be exposed here.

I am not sure how you want to use the clustering results, so for now I am simply saving them to a CSV file. Please see below for an example. The first column is the row id, the second is the selection id, the third is the data id, and the fourth is the cluster label.
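A name-based registry is one way to get both behaviors described above: methods are looked up by the key used in the config file, and extra config keys are forwarded unchanged to scikit-learn's KMeans. This is a sketch under assumptions. The real module exposes get_res_org_method (it appears in the traceback above), but its signature is not shown in this thread, so every name below (RES_ORG_METHODS, register_res_org_method, get_method, kmeans_organize, run_results_organization) is hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: config key (e.g. "kmeans") -> organization function.
RES_ORG_METHODS: Dict[str, Callable] = {}


def register_res_org_method(name):
    """Decorator that registers a results organization function under `name`."""
    def decorator(func):
        RES_ORG_METHODS[name] = func
        return func
    return decorator


def get_method(name):
    """Look up a registered method, failing loudly on unknown config keys."""
    if name not in RES_ORG_METHODS:
        raise ValueError(f"Unknown results organization method: {name!r}")
    return RES_ORG_METHODS[name]


@register_res_org_method("kmeans")
def kmeans_organize(features, n_clusters, **kwargs):
    """Cluster features with scikit-learn's KMeans and return one label per row.

    Extra config keys go straight to sklearn.cluster.KMeans, so any of its
    constructor parameters (e.g. random_state, n_init) can be set in the config.
    """
    from sklearn.cluster import KMeans  # imported lazily: only needed when used
    model = KMeans(n_clusters=n_clusters, **kwargs)
    return model.fit_predict(features)


def run_results_organization(results_config, features):
    """Dispatch each entry of the `results` config section to its method."""
    return {name: get_method(name)(features, **(params or {}))
            for name, params in results_config.items()}
```

With a parsed config such as {"kmeans": {"n_clusters": 5, "random_state": 0}}, run_results_organization returns {"kmeans": labels}, and the labels can then be written out as the fourth CSV column. Importing sklearn inside the function also keeps a missing optional package from breaking unrelated methods, which is the failure mode seen in the sklearn_som traceback.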