tamiratGit / FedELM


4. Label noise in Y #4

Open akusok opened 10 months ago

akusok commented 10 months ago

Another way to hide the data from "bad guys" is to change the data labels (outputs). Then, even if the bad guys get the data, they cannot tell whether a label Y is the true one or one we replaced with a random value.

Our labels are house prices in California. With a probability of X%, we replace a house price with the price from another, randomly chosen row. All other labels stay as they are in the dataset.
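A minimal sketch of the swapping step, assuming the labels are held in a NumPy array. `swap_labels` is a hypothetical helper, not code from this repository. Each row is selected independently with probability `p` (X% written as a fraction). A selected row then gets the label of a uniformly chosen *different* row, read from the original labels and not from the already-swapped ones. The returned mask records which rows were swapped, which is useful for checking the realized swap rate.

```python
import numpy as np

def swap_labels(y, p, rng=None):
    """Hypothetical helper: with probability p, replace each label by the
    label of another, uniformly chosen row. Returns (noisy labels, mask)."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    if n < 2:
        raise ValueError("need at least two rows to swap labels")
    y_noisy = y.copy()
    mask = rng.random(n) < p             # rows selected for swapping
    idx = np.flatnonzero(mask)
    # Draw a source row from the n-1 rows other than i:
    # sample from [0, n-2] and shift values >= i up by one.
    j = rng.integers(0, n - 1, size=idx.size)
    j = j + (j >= idx)
    y_noisy[idx] = y[j]                  # copy from the original labels
    return y_noisy, mask
```

For example, `swap_labels(prices, 0.2, rng=42)` would swap about 20% of the prices. Because the replacement values are real prices taken from other rows, they have the same distribution as the true labels, so a swapped label cannot be spotted as an outlier.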

Again, repeat the graph from #2 of model performance vs. data size, but add one line for each value of X% (the label-swapping rate). Find the highest X% that does not degrade model performance too much.
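The experiment could be organized roughly as follows. This is a sketch under several assumptions that do not come from the issue. The model is a minimal Extreme Learning Machine (random `tanh` hidden layer with least-squares output weights), used as a stand-in for whatever model #2 uses. Performance is measured as test RMSE. Synthetic linear data (`make_synthetic`) replaces the California housing set so the block runs offline. `swap_labels`, `elm_fit`, `elm_predict`, `sweep` and `make_synthetic` are all hypothetical names. Label swapping is applied only to the training labels. The test labels stay clean, so the numbers show how much the swapped labels hurt real predictive quality.

```python
import numpy as np

def swap_labels(y, p, rng):
    """Hypothetical helper: with probability p, replace each label by the
    label of another, uniformly chosen row."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two rows to swap labels")
    idx = np.flatnonzero(rng.random(n) < p)
    j = rng.integers(0, n - 1, size=idx.size)
    j = j + (j >= idx)                    # skip the row itself
    y_noisy = y.copy()
    y_noisy[idx] = y[j]
    return y_noisy

def elm_fit(X, y, n_hidden, rng):
    """Assumed model: random tanh hidden layer + least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
    b = rng.normal(size=n_hidden)
    beta, *_ = np.linalg.lstsq(np.tanh(X @ W + b), y, rcond=None)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def sweep(X_train, y_train, X_test, y_test, sizes, swap_rates,
          n_hidden=100, seed=0):
    """Test RMSE for every (swap rate, training size) pair.
    Labels are swapped in the training subset only; test labels stay clean."""
    rng = np.random.default_rng(seed)
    results = {}
    for p in swap_rates:
        for n in sizes:
            y_noisy = swap_labels(y_train[:n], p, rng)
            model = elm_fit(X_train[:n], y_noisy, n_hidden, rng)
            err = elm_predict(model, X_test) - y_test
            results[(p, n)] = float(np.sqrt(np.mean(err ** 2)))
    return results

def make_synthetic(seed=0, n=3000, d=8):
    """Synthetic regression data standing in for California housing."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    return X[:2000], y[:2000], X[2000:], y[2000:]

if __name__ == "__main__":
    X_tr, y_tr, X_te, y_te = make_synthetic()
    sizes = [200, 500, 1000, 2000]
    rates = [0.0, 0.1, 0.3, 0.5]
    res = sweep(X_tr, y_tr, X_te, y_te, sizes, rates)
    for p in rates:  # one printed row per line of the requested graph
        print(f"X={p:.0%}: " + "  ".join(f"n={n}: {res[(p, n)]:.3f}" for n in sizes))
```

Each printed row corresponds to one line of the graph (RMSE vs. data size for a fixed X%). Plotting the rows on shared axes makes it easy to read off the highest X% whose curve stays close to the `X=0%` baseline.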

Steps: