something like this?

### Common dataset workflow

### Task-specific benchmarking workflow
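
The two workflow diagrams from the original comment are not reproduced here. As a rough sketch of the idea (a shared workflow that processes raw datasets into one standard format, plus per-task workflows that benchmark methods on those shared datasets), something like the Python below could work. All class, function, and dataset names are hypothetical and not the actual OpenProblems API:

```python
# Hypothetical sketch, not the actual OpenProblems API: a common workflow
# produces datasets in one standard format, and each task-specific workflow
# benchmarks its methods and metrics on those shared datasets.
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    data: dict = field(default_factory=dict)


def common_dataset_workflow(sources: list[str]) -> list[Dataset]:
    """Download and normalise raw sources into the standard format (stubbed)."""
    return [Dataset(name=s, data={"X": f"normalised({s})"}) for s in sources]


def task_benchmark_workflow(datasets, methods, metrics):
    """Run every method on every dataset and score the output with every metric."""
    results = []
    for ds in datasets:
        for method in methods:
            output = method(ds)
            for metric in metrics:
                results.append({
                    "dataset": ds.name,
                    "method": method.__name__,
                    "metric": metric.__name__,
                    "score": metric(ds, output),
                })
    return results


# Toy usage with one stub method and one stub metric.
def identity_method(ds: Dataset):
    return ds.data["X"]


def dummy_metric(ds: Dataset, output) -> float:
    return float(len(str(output)))


datasets = common_dataset_workflow(["pancreas", "pbmc"])
print(task_benchmark_workflow(datasets, [identity_method], [dummy_metric]))
```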

### Discussion

@slobentanzer @scottgigante-immunai Regardless of how we decide to resolve this issue, I'm sure there are already many items we can define.

However, this workflow might not be applicable to all tasks. For instance:

- Multimodal datasets will have to be processed differently from regular unimodal datasets.
- Some tasks don't really have a ground truth and instead rely on internal scores. IMO these "benchmarks" should not be part of OpenProblems, since they don't really count as benchmarks.

Originally posted by @rcannood in https://github.com/openproblems-bio/website/issues/247#issuecomment-1538772548
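
To make the ground-truth point concrete: a metric such as the adjusted Rand index scores a method's output against known labels, whereas an internal score such as the silhouette coefficient is computed from the data and the output alone, so only the former gives a benchmark in the usual sense. A small illustration using scikit-learn (the clustering setup is just an example, not an OpenProblems task):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Toy data with known labels; a real benchmark would use task datasets.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# The adjusted Rand index compares predictions to ground-truth labels:
# the kind of metric a benchmark can be built on.
print("ARI:", adjusted_rand_score(y_true, y_pred))

# The silhouette coefficient is an internal score: it uses only the data
# and the predicted labels, so it measures cluster separation rather than
# agreement with any ground truth.
print("silhouette:", silhouette_score(X, y_pred))
```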