Another way to hide the data from "bad guys" is to change the data labels / outputs. Even if the bad guys get the data, they cannot tell whether a label Y is the true one or one we replaced with a random value.
Our labels are house prices in California. With probability X%, we replace a house price with the value from another randomly chosen row; all other labels stay as they are in the dataset.
Again, repeat the graph from #2 of model performance vs. data size, but add extra lines for different values of X% label swapping. Find the highest X% that does not hurt model performance too much.
Steps:
[ ] (@tamiratGit) Create ELM that replaces X% of outputs Y with a value from another random row of Y. We could replace them with arbitrary random numbers, but sampling from the existing Y values keeps the label distribution unchanged.
[ ] (@tamiratGit) Run experiments and draw several lines on the graph of model performance vs. data size, one for each value of X% random replacement of Y values.
[ ] (@tamiratGit) Find the highest value of X that keeps model performance close to the original. We will use it in the next experiments and in the paper.
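The replacement step above could be sketched like this (a minimal NumPy sketch; the function name `swap_labels` and its signature are assumptions, not part of the task):

```python
import numpy as np

def swap_labels(y, swap_frac, seed=None):
    """Replace roughly swap_frac of the labels in y with values drawn
    from other rows of y.

    Sampling replacements from y itself (instead of arbitrary random
    numbers) keeps the marginal label distribution unchanged, as noted
    in the task. A donor row may coincide with the target row, so the
    number of labels that actually change can be slightly lower.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    n = len(y)
    n_swap = int(round(swap_frac * n))
    # Rows whose labels will be overwritten (chosen without replacement).
    targets = rng.choice(n, size=n_swap, replace=False)
    # Donor rows drawn uniformly from the whole label column (may repeat).
    donors = rng.integers(0, n, size=n_swap)
    y[targets] = y[donors]
    return y
```

For the experiments, this would be called once per value of X% (e.g. `swap_labels(y_train, 0.1)`, `swap_labels(y_train, 0.2)`, ...) before training, producing one line per X% on the performance-vs-data-size graph.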