mlcommons / medperf

An open benchmarking platform for medical artificial intelligence using Federated Evaluation.
https://www.medperf.org
Apache License 2.0
143 stars 29 forks source link

[BUG] MLCube input parameters should always be read-only #236

Closed aristizabal95 closed 1 year ago

aristizabal95 commented 2 years ago

Describe the bug Currently, MLCubes can freely access and write to the input data folders. This should not be allowed, as it could cause unexpected behavior if one of those inputs is shared across multiple executions (e.g. same input data folder passed to multiple model mlcubes). This kind of scenarios has already occured with the FeTS Challenge, where one of the models modified the input dataset, causing subsequent models to fail when trying to get predictions on unrelated files. This can also be exploited in challenge scenarios, where competitors can maliciously modify the input data so the next models down the line can't make predictions.

To Reproduce Steps to reproduce the behavior:

  1. Write a model MLCube that accesses the input data folder and modifies one of the input files.
  2. Create results using that MLCube
  3. Create results for the same dataset using another model MLCube
  4. The second model's performance will be affected by the first model's modification

Expected behavior The input data parameters should always be ro, and should fail if an user tries to write/modify anything to those paths.

Additional context The possibility of making input parameters ro has recently been added to MLCube (see related PR) for this specific reason. We can enforce read-only parameters by explicitly adding this option to the Cube.run method: mlcube run --mlcube ... --task=infer -Ptasks.infer.parameters.input.data_path.opts=ro This is something that should be done for all input parameters for all mlcubes to ensure reproducibility. We should talk to the MLCube team to see if there's a way to get all input parameters that doesn't require parsing the mlcube.yaml file, which is prone to change in the future.

If an MLCube needs to store temporary files, this should be done through an output path. We can consider providing MLCubes a tmp output path for the purpose of storing such files.

aristizabal95 commented 2 years ago

Related to #222

hasan7n commented 1 year ago

completed in #284