huawei-lin / GBDT_unlearning

The implementation for the paper "Machine Unlearning in Gradient Boosting Decision Trees" (accepted at KDD 2023), supporting training and unlearning.
Apache License 2.0

Supplemental request for Data #1

Open zshenobody opened 1 month ago

zshenobody commented 1 month ago

When I tried to replicate the experiments in your paper, I found that the 'data' folder contains only two preprocessed datasets, "Optdigits" and "Pendigits". Could you share how you preprocessed your data and where the other datasets, "HIGGS" and "Letter", come from? It would be even more appreciated if you could provide the code used to preprocess the data.❤️

huawei-lin commented 1 month ago

Hi @zshenobody. The HIGGS dataset originally comes from the UCI dataset repository (HIGGS). If your copy is in LIBSVM format, you can find a lot of resources on converting LIBSVM to CSV. We did not do any preprocessing on this dataset. The Letter dataset should be from the UCI dataset repository (Letter Recognition); all we did was replace the letters A-Z with 0-25.
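For anyone reproducing this, the A-Z to 0-25 replacement could be sketched roughly as below. This is not the script used for the paper, only a minimal illustration. It assumes each row is comma-separated with the letter label in the first column (as in the UCI `letter-recognition.data` file); the function name `encode_letter_rows` is hypothetical, and whether the repository expects the label to stay in the first column is also an assumption.

```python
import csv
import io


def encode_letter_rows(lines):
    """Yield CSV rows with the leading letter label replaced by 0-25.

    Hypothetical helper: assumes the label is the first field of each row.
    """
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        letter, features = record[0].strip(), record[1:]
        if len(letter) != 1 or not "A" <= letter <= "Z":
            raise ValueError(f"unexpected label: {record[0]!r}")
        # 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25
        yield [str(ord(letter) - ord("A")), *features]


# Two made-up rows in the assumed "label, then features" layout.
sample = io.StringIO("A,2,8,3\nZ,5,12,3\n")
print(list(encode_letter_rows(sample)))
```

To convert a whole file, the rows can be passed to `csv.writer(...).writerows(...)` with the input opened as a text file.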

Please let me know if you have any questions.

Edit: All of our datasets are from UCI datasets or LIBSVM Data: Classification, Regression, and Multi-label.
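If a dataset is downloaded in LIBSVM format, a conversion to dense CSV rows could look roughly like the sketch below. It relies only on the standard LIBSVM text layout (`label index:value ...` with 1-based feature indices, omitted features being zero). The function name `libsvm_line_to_row`, the label-first column order, and the need to pass `n_features` explicitly are assumptions for illustration, not the format this repository is confirmed to expect.

```python
def libsvm_line_to_row(line, n_features):
    """Convert one LIBSVM-format line to a dense row: [label, f1, ..., fn].

    Hypothetical helper: features missing from the line are written as "0".
    """
    tokens = line.split("#", 1)[0].split()  # drop any trailing comment
    if not tokens:
        raise ValueError("empty line")
    label, pairs = tokens[0], tokens[1:]
    row = ["0"] * n_features
    for pair in pairs:
        index, value = pair.split(":", 1)
        position = int(index)
        if not 1 <= position <= n_features:
            raise ValueError(f"feature index out of range: {index}")
        row[position - 1] = value  # LIBSVM indices start at 1
    return [label, *row]


# A made-up line with 4 features, two of them omitted (implicitly zero).
print(",".join(libsvm_line_to_row("1 1:0.5 3:2", n_features=4)))
```

Values are kept as strings so they are written back exactly as they appear in the source file.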

zshenobody commented 1 month ago

Very timely reply! Thank you very much!