Closed iparask closed 6 years ago
Thanks, Giannis. I notice that what is under validation_suite should be under src/training. The R scripts in the home dir also belong in src/training (if Bento agrees). I understood validation_suite to be scripts that check that test outputs are as expected.
I was not sure what the scripts that @bentocg had there were doing, so I placed them somewhere based on my understanding.
I will move them under src/training.
Hello Brad. I agree that the validation suite should be scripts and expected data output not for every part of the development. In my opinion that includes the pipeline script, the kernels and any script that is developed to support this use case.
Do you agree?
@iparask This is a case where validation has two meanings. First, it is a formal ML term used in model training:
Training Dataset: The sample of data used to fit the model.
Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
Second is our validation_suite (maybe it should be verification_suite) on the predict pipeline. Here we want to compare non-EnTK output with the EnTK pipeline output using matching input test datasets.
Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
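As a rough illustration of the output comparison described above, here is a minimal sketch. It assumes both runs write their results as files into a directory tree and that byte-identical files are the right pass criterion; the function names and the directory arguments are hypothetical and not taken from the repository. Outputs with floating-point noise would need a tolerance-based comparison instead.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def list_files(root):
    """Map each file's path relative to root (POSIX style) to its full path."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): p
            for p in root.rglob("*") if p.is_file()}


def compare_outputs(reference_dir, pipeline_dir):
    """Compare two output trees produced from the same input test dataset.

    Returns three sorted lists of relative paths:
    files missing from pipeline_dir, files only in pipeline_dir,
    and files present in both whose contents differ.
    """
    ref = list_files(reference_dir)   # hypothetical: non-EnTK run output
    out = list_files(pipeline_dir)    # hypothetical: EnTK pipeline output
    missing = sorted(set(ref) - set(out))
    extra = sorted(set(out) - set(ref))
    different = sorted(name for name in set(ref) & set(out)
                       if file_digest(ref[name]) != file_digest(out[name]))
    return missing, extra, different
```

A check in the suite would pass when all three lists come back empty.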
The validation/verification_suite can also contain unit tests for both the pipeline and the kernels, such as: 1) Are all resources available? 2) Is the directory structure correct?
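The two checks above could look something like the sketch below. It is only an illustration: it reads "resources" as executables found on PATH, which is an assumption, and the helper names and any directory or tool names passed in are hypothetical.

```python
import shutil
from pathlib import Path


def missing_directories(root, required):
    """Return the entries of required that are not directories under root."""
    root = Path(root)
    return [d for d in required if not (root / d).is_dir()]


def missing_executables(names):
    """Return the entries of names that cannot be found on PATH."""
    return [n for n in names if shutil.which(n) is None]
```

For example, a test could assert that `missing_directories(repo_root, ["src/training"])` returns an empty list, where `repo_root` is wherever the repository is checked out.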
Does that clarify? It seems to be consistent with issue #17, "Create a validation suite".
This PR closes #25. I was not sure what to do with the code that still exists in the home directory. Please either suggest where to move it or feel free to do it.
Also, I would suggest that you check it out and see if I removed something that I shouldn't have removed.