something like this?

### Common dataset workflow

### Task-specific benchmarking workflow
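
The two workflow diagrams from the original comment are not reproduced here. As a rough sketch of the idea (a shared workflow that processes raw datasets into one standard format, plus per-task workflows that benchmark methods on those shared datasets), something like the Python below could work. All class, function, and dataset names are hypothetical and not the actual OpenProblems API:

```python
# Hypothetical sketch, not the actual OpenProblems API: a common workflow
# produces datasets in one standard format, and each task-specific workflow
# benchmarks its methods and metrics on those shared datasets.
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    data: dict = field(default_factory=dict)


def common_dataset_workflow(sources: list[str]) -> list[Dataset]:
    """Download and normalise raw sources into the standard format (stubbed)."""
    return [Dataset(name=s, data={"X": f"normalised({s})"}) for s in sources]


def task_benchmark_workflow(datasets, methods, metrics):
    """Run every method on every dataset and score the output with every metric."""
    results = []
    for ds in datasets:
        for method in methods:
            output = method(ds)
            for metric in metrics:
                results.append({
                    "dataset": ds.name,
                    "method": method.__name__,
                    "metric": metric.__name__,
                    "score": metric(ds, output),
                })
    return results


# Toy usage with one stub method and one stub metric.
def identity_method(ds: Dataset):
    return ds.data["X"]


def dummy_metric(ds: Dataset, output) -> float:
    return float(len(str(output)))


datasets = common_dataset_workflow(["pancreas", "pbmc"])
print(task_benchmark_workflow(datasets, [identity_method], [dummy_metric]))
```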

### Discussion

@slobentanzer @scottgigante-immunai Regardless of how we decide to resolve this issue, I'm sure there are already many items we can define.

However, this workflow might not be applicable to all tasks. For instance:

- Multimodal datasets will have to be processed differently from regular unimodal datasets.
- Some tasks don't really have a ground truth and instead rely on internal scores. IMO these "benchmarks" should not be part of OpenProblems, since they don't really count as benchmarks.

Originally posted by @rcannood in https://github.com/openproblems-bio/website/issues/247#issuecomment-1538772548
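
To make the ground-truth point concrete: a metric such as the adjusted Rand index scores a method's output against known labels, whereas an internal score such as the silhouette coefficient is computed from the data and the output alone, so only the former gives a benchmark in the usual sense. A small illustration using scikit-learn (the clustering setup is just an example, not an OpenProblems task):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Toy data with known labels; a real benchmark would use task datasets.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# The adjusted Rand index compares predictions to ground-truth labels:
# the kind of metric a benchmark can be built on.
print("ARI:", adjusted_rand_score(y_true, y_pred))

# The silhouette coefficient is an internal score: it uses only the data
# and the predicted labels, so it measures cluster separation rather than
# agreement with any ground truth.
print("silhouette:", silhouette_score(X, y_pred))
```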