qiyuangong / Clustering_based_K_Anon

cluster based generalization for k-anonymity
MIT License

ran Anonymizer.py file but no output #1

Open aryansoni1108 opened 5 years ago

aryansoni1108 commented 5 years ago

I just ran the Anonymizer.py file, but it seems to get stuck while processing. I am pretty new to this type of project, so please help me. I get the following output:

Adult data
['c:/Users/Aryan Soni/Downloads/Clustering_based_K_Anon-master/anonymizer.py']
K=10
Begin to K-Member Cluster based on NCP

After this, no further output is shown and cmd appears to be stuck. Please help me.

qiyuangong commented 5 years ago

Hi @aryansoni1108, it is not stuck! You didn't get any output because this clustering-based algorithm is very slow (single core, single thread). It takes nearly 3 hours on my laptop (2017 15-inch MacBook Pro). You can get better performance with an optimized clustering algorithm. Alternatively, you can get a result in less time by using less data (for example, 1000 records of the adult data) or a larger k (20 or 50).

Adult data
['anonymizer.py']
K=10
Begin to K-Member Cluster based on NCP
NCP 11.20%
Running time 10744.34seconds
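
The "less data" suggestion above can be tried by cutting the input down to a random subset before running the anonymizer. The following is only a minimal sketch, not code from this repository: the helper name `sample_records` and the output file name are hypothetical, and it assumes the data file holds one record per line. Whether anonymizer.py can be pointed at a different file without editing its data-reading code is not confirmed in this thread.

```python
import random


def sample_records(src_path, dst_path, n=1000, seed=0):
    """Copy a random sample of at most n non-empty lines from src_path to dst_path.

    Hypothetical helper: assumes one record per line.
    Returns the number of records written.
    """
    with open(src_path) as src:
        # Drop blank lines and normalize line endings so records never run together.
        lines = [line.rstrip("\n") + "\n" for line in src if line.strip()]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = rng.sample(lines, min(n, len(lines)))
    with open(dst_path, "w") as dst:
        dst.writelines(sample)
    return len(sample)


# Example call (the output name is an assumption; back up the original file first):
# sample_records("data/adult.data", "data/adult_1000.data", n=1000)
```

A fixed seed is used so that repeated runs with different k values are compared on the same subset.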

dataExperimenter2019 commented 5 years ago

Hi, a question: which adult and Informs datasets are actually used by anonymizer.py? I want to try to cut down the processing time.

I've been running the algorithm (on Informs) for the past 4 hours with the data as is (from the GitHub download) and it still hasn't finished :(

Also, where can I find the optimized clustering algorithm on GitHub?

Thank you!

qiyuangong commented 5 years ago

Hi, the datasets are in the data directory. adult.data is the adult dataset, while conditions.csv and demographics.csv make up the Informs dataset.

As for an optimized clustering algorithm, I think you can start from optimized k-means clustering. Try searching for these keywords with a search engine such as Google.

FarihaHossain commented 5 years ago

Hi @qiyuangong, this is a great initiative, much appreciated. While running anonymizer.py, this problem occurs. Can you please help with this?

qiyuangong commented 5 years ago

Hi @FarihaHossain, I think this error is caused by a mismatch between IS_CAT and ATT_TREES. It seems that a given attribute is expected to be categorical, but it is actually a NumRange.

Can you give me the exact command you ran?
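
A mismatch of this kind could be checked for before the clustering starts. The sketch below is an illustration under assumptions, not the repository's code: `NumRange` here is a stand-in class, `find_mismatches` is a hypothetical helper, and it assumes IS_CAT is a list of booleans aligned index by index with ATT_TREES, where numeric attributes are represented by a NumRange object and categorical ones by something else (such as a generalization tree).

```python
class NumRange(object):
    """Stand-in for the repository's numeric range type (assumed, not the real class)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high


def find_mismatches(is_cat, att_trees):
    """Return the attribute indexes whose categorical flag disagrees with the tree type.

    An attribute is inconsistent when it is flagged categorical but backed by a
    NumRange, or flagged numeric but not backed by a NumRange.
    """
    bad = []
    for index, (flag, tree) in enumerate(zip(is_cat, att_trees)):
        is_numeric = isinstance(tree, NumRange)
        if flag == is_numeric:
            bad.append(index)
    return bad
```

Printing the returned indexes before clustering would show which attribute needs its flag or its tree corrected.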

shivjais13 commented 5 years ago

Hi, the algorithm works fine with the adult data and produced a result in 3 hours, but it has been running for a day without producing any output for the Informs dataset with k = 20. The command I used in the terminal was [python2 anonymizer.py i kmember 20], and it has been stuck on "Begin to K-Member Cluster based on NCP" for the past 20 hours. Can you suggest an update or anything I can do to get a result?

FarihaHossain commented 5 years ago

> Hi @FarihaHossain, I think this error is caused by a mismatch between IS_CAT and ATT_TREES. It seems that a given attribute is expected to be categorical, but it is actually a NumRange.
>
> Can you give me the exact command you ran?

Hi, thanks for the reply. I just cloned this repository and ran it. Nothing was changed, and this problem came up.

qiyuangong commented 5 years ago

> > Hi @FarihaHossain, I think this error is caused by a mismatch between IS_CAT and ATT_TREES. It seems that a given attribute is expected to be categorical, but it is actually a NumRange. Can you give me the exact command you ran?
>
> Hi, thanks for the reply. I just cloned this repository and ran it. Nothing was changed, and this problem came up.

Hi, I ran it in my environment. Everything works fine except for a saving error related to the Informs dataset.

Can you give me more details about your environment and the command you ran?

qiyuangong commented 5 years ago

> Hi, the algorithm works fine with the adult data and produced a result in 3 hours, but it has been running for a day without producing any output for the Informs dataset with k = 20. The command I used in the terminal was [python2 anonymizer.py i kmember 20], and it has been stuck on "Begin to K-Member Cluster based on NCP" for the past 20 hours. Can you suggest an update or anything I can do to get a result?

Well, I will add some output (maybe a progress bar) for that.
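
One way such progress output might look is a small wrapper around the main loop that prints a status line every so often. This is only a sketch of the idea, not what was later added to the repository: `report_progress` is a hypothetical name, and it assumes the clustering code iterates over a sequence of records whose total count is known in advance.

```python
import sys
import time


def report_progress(items, total, every=1000, out=sys.stderr):
    """Yield items unchanged, writing a status line after every `every` items.

    Hypothetical helper: `total` is the expected number of items and is used
    only for the percentage shown in the status line.
    """
    start = time.time()
    for done, item in enumerate(items, 1):
        yield item
        # Runs once the caller has finished with the item and asks for the next one.
        if done % every == 0 or done == total:
            elapsed = time.time() - start
            out.write("processed %d/%d records (%.1f%%), %.0fs elapsed\n"
                      % (done, total, 100.0 * done / total, elapsed))
            out.flush()


# Example use inside a long-running loop (records is whatever the loop iterates over):
# for record in report_progress(records, len(records)):
#     ...
```

Writing to stderr keeps the status lines separate from the final NCP and running-time output.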