Another way to hide the data from "bad guys" is to change the data labels / outputs. Even if the bad guys get the data, they cannot tell whether a label Y is the true one or one we replaced with a random value.
Our labels are house prices in California. With probability X%, we replace a house price with the value from another randomly chosen row; all other labels stay as they are in the dataset.
Again, repeat the graph from #2 of model performance vs. data size, but add extra lines for different values of X% label swapping. Find the highest X% that does not hurt model performance too much.
Steps:
[ ] (@tamiratGit) Create ELM that replaces X% of outputs Y with a value from another random row of Y. We could replace them with arbitrary random numbers, but sampling from the existing Y values keeps the label distribution unchanged.
[ ] (@tamiratGit) Run experiments and draw several lines on the graph of model performance vs. data size, one for each value of X% random replacement of Y values.
[ ] (@tamiratGit) Find the highest value of X that keeps model performance close to the original. We will use it in the next experiments and in the paper.
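The replacement step above could be sketched like this (a minimal NumPy sketch; the function name `swap_labels` and its signature are assumptions, not part of the task):

```python
import numpy as np

def swap_labels(y, swap_frac, seed=None):
    """Replace roughly swap_frac of the labels in y with values drawn
    from other rows of y.

    Sampling replacements from y itself (instead of arbitrary random
    numbers) keeps the marginal label distribution unchanged, as noted
    in the task. A donor row may coincide with the target row, so the
    number of labels that actually change can be slightly lower.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    n = len(y)
    n_swap = int(round(swap_frac * n))
    # Rows whose labels will be overwritten (chosen without replacement).
    targets = rng.choice(n, size=n_swap, replace=False)
    # Donor rows drawn uniformly from the whole label column (may repeat).
    donors = rng.integers(0, n, size=n_swap)
    y[targets] = y[donors]
    return y
```

For the experiments, this would be called once per value of X% (e.g. `swap_labels(y_train, 0.1)`, `swap_labels(y_train, 0.2)`, ...) before training, producing one line per X% on the performance-vs-data-size graph.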