Create clustering sub module

stevenlujpl commented 3 years ago

@urebbapr I added k-means clustering as a results organization method. It is registered under the name kmeans and can be invoked from the config file; please see the following example. Currently it accepts only n_clusters as a parameter, but we can add more if needed. I am using scikit-learn's KMeans, so any parameter it supports can be made available here.

# Example configuration
# Results organization module
results: {
    kmeans: {
        n_clusters: 3
    }
}
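
For readers unfamiliar with the registration pattern, here is a minimal sketch of how a method registered under a name could be looked up from the `results` block. The names `RES_ORG_METHODS`, `register_res_org_method`, `kmeans_org`, and `resolve_methods` are hypothetical and are not DORA's actual API; only `sklearn.cluster.KMeans` and its `n_clusters` parameter are real.

```python
# Hypothetical registry; these names are illustrative, not DORA's actual API.
RES_ORG_METHODS = {}


def register_res_org_method(name):
    """Decorator that stores a results organization function under `name`."""
    def decorator(func):
        RES_ORG_METHODS[name] = func
        return func
    return decorator


@register_res_org_method("kmeans")
def kmeans_org(features, n_clusters=3, **extra):
    """Cluster feature rows with scikit-learn and return one label per row."""
    from sklearn.cluster import KMeans  # lazy import; requires scikit-learn
    model = KMeans(n_clusters=n_clusters, **extra)
    return model.fit_predict(features)


def resolve_methods(results_config):
    """Pair each configured method name with its function and parameters."""
    resolved = []
    for name, params in results_config.items():
        if name not in RES_ORG_METHODS:
            raise KeyError(f"unknown results organization method: {name}")
        resolved.append((RES_ORG_METHODS[name], dict(params or {})))
    return resolved


# Mirrors the `results` block of the example configuration.
config = {"kmeans": {"n_clusters": 3}}
for func, params in resolve_methods(config):
    print(func.__name__, params)
```

Forwarding extra keyword arguments to `KMeans` is one possible way to expose more scikit-learn parameters later without changing the registry itself.
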

I am not sure how you want to use the clustering results, so for now I am simply saving them to a CSV file. Please see below for an example. The columns are, in order: the row id, the selection id, the data id, and the cluster group.

0, 10, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-132.png, 0
1, 8, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-40.png, 2
2, 6, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-1.png, 2
3, 4, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-146.png, 1
4, 3, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-32.png, 1
5, 5, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-25.png, 1
6, 11, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-125.png, 1
7, 9, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-4.png, 1
8, 0, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-222.png, 1
9, 1, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-197.png, 2
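
One possible way to consume this output is to group the data ids by cluster label. The sketch below assumes the file has no header row and uses the four-column layout described above; `group_by_cluster` is a hypothetical helper, and the sample embeds the first three rows of the example rather than reading a real file.

```python
import csv
import io
from collections import defaultdict

# First three rows of the example output; a real run would open the CSV file.
SAMPLE = """\
0, 10, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-132.png, 0
1, 8, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-40.png, 2
2, 6, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-1.png, 2
"""


def group_by_cluster(handle):
    """Map each cluster label to the list of data ids assigned to it."""
    groups = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:
            continue  # ignore empty lines
        _row_id, _selection_id, data_id, cluster = row
        groups[int(cluster)].append(data_id)
    return dict(groups)


groups = group_by_cluster(io.StringIO(SAMPLE))
for cluster, data_ids in sorted(groups.items()):
    print(cluster, len(data_ids), data_ids)
```
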

stevenlujpl commented 3 years ago

@urebbapr @emhuff I've added SOM (self-organizing map) as a results organization method. It can be used by adding the som option to the config file. For example:

# Results organization module
results: {
    som: {
        n_clusters: 5
    }
}

The output is similar to that of kmeans clustering: the last column holds the cluster labels. Please see the example below:

0, 11, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-125.png, 1
1, 14, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-172.png, 0
2, 4, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-146.png, 2
3, 5, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-25.png, 0
4, 9, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-4.png, 0
5, 10, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-132.png, 2
6, 7, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-68.png, 2
7, 13, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-52.png, 0
8, 6, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-1.png, 0
9, 3, SOL01731NLB_551166039EDR_F0640678CCAM15903M1-32.png, 1

wkiri commented 3 years ago

@stevenlujpl Can you please install sklearn_som in the dora venv on our machines? I am getting:

Traceback (most recent call last):
  File "dora_exp_pipeline/dora_exp.py", line 17, in <module>
    from dora_exp_pipeline.outlier_detection import register_od_alg
  File "/home/wkiri/Research/DORA/git/dora_exp_pipeline/outlier_detection.py", line 12, in <module>
    from dora_exp_pipeline.dora_results_organization import get_res_org_method
  File "/home/wkiri/Research/DORA/git/dora_exp_pipeline/dora_results_organization.py", line 13, in <module>
    from sklearn_som.som import SOM
ModuleNotFoundError: No module named 'sklearn_som'

Thanks!

wkiri commented 3 years ago

@stevenlujpl It seems we also need to install tensorflow in the venv:

Traceback (most recent call last):
  File "dora_exp_pipeline/dora_exp.py", line 26, in <module>
    from dora_exp_pipeline.pae_outlier_detection import PAEOutlierDetection
  File "/home/wkiri/Research/DORA/git/dora_exp_pipeline/pae_outlier_detection.py", line 21, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
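
A quick way to check a venv for both missing packages before running the pipeline might look like the sketch below. The module list is taken from the two tracebacks in this thread; `missing_modules` is a hypothetical helper, not part of DORA.

```python
import importlib.util

# Top-level modules named in the two tracebacks in this thread.
REQUIRED = ["sklearn_som", "tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All required modules are importable.")
```
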

stevenlujpl commented 3 years ago

Done. However, it seems tensorflow v2.5.1 isn't compatible with the CUDA drivers we currently have on the analysis/paralysis machines. If you need to run PAE, please run it on the CPU.
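
One common way to force a CPU run is to hide the CUDA devices before TensorFlow is imported. This is a minimal sketch, not something the pipeline is known to do itself; `hide_gpus` is a hypothetical helper, while `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable.

```python
import os


def hide_gpus():
    """Hide all CUDA devices so TensorFlow falls back to the CPU."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus()
# TensorFlow must be imported only after the variable is set, e.g.:
# import tensorflow as tf
```

The same effect can be had by exporting `CUDA_VISIBLE_DEVICES=-1` in the shell before launching the script.
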

hannah-rae commented 3 years ago

@stevenlujpl Can we close this issue now?

nasaharvest / dora

Create clustering sub module #38