monarch-initiative / pheval

A framework for empirical evaluation of phenotype matching and prioritisation
https://monarch-initiative.github.io/pheval/
Apache License 2.0
12 stars 1 forks source link

346 duckdb for benchmarking #348

Closed yaseminbridges closed 3 days ago

yaseminbridges commented 1 month ago

Unfortunately a very large PR, however, it is for a much-needed and positive upgrade to the benchmarking code.

Before:

Now:

This change was necessary, the overall benchmarking process is now more efficient in terms of speed, storing of data & code readability (also managed to cut down the codebase).

Finally, I have added some much-needed documentation for running a benchmark.

A general idea of how things work now:

  1. Run configuration YAML file is consumed and parsed - providing all information required for benchmarking run(s).
  2. Unique corpus of phenopackets is parsed and relevant gene/variant/disease information (if specified) is stored in a duckdb table (TABLE A).
  3. Depending on whether gene/variant/disease benchmarking is specified for that run configuration, the standardised TSV result is read in as a duckdb table (TABLE B) and used to compare against what is known from the phenopacket (retrieved from TABLE A) and the rank is retrieved and added to TABLE A.
  4. Summary stats, plots, and comparisons between runs are generated from TABLE A
yaseminbridges commented 1 week ago

This PR fails due to the MakeFile pipeline test that executes the template runner and the benchmark command. The MakeFile could be separated out of the main PhEval repo and moved into monarch-pheval - @souzadevinicius agreed on this statement in another PR. Ultimately this PR is blocked until the test is refactored in the MakeFile pipeline OR this is moved to another repo to test the pipeline.

souzadevinicius commented 1 week ago

This PR fails due to the MakeFile pipeline test that executes the template runner and the benchmark command. The MakeFile could be separated out of the main PhEval repo and moved into monarch-pheval - @souzadevinicius agreed on this statement in another PR. Ultimately this PR is blocked until the test is refactored in the MakeFile pipeline OR this is moved to another repo to test the pipeline.

The pipeline test needed to be modified due to the pheval-utils methods refactor. Sorry for this PR blocking. @yaseminbridges, @julesjacobsen and @matentzn, wherever you decide to put the Makefile code, I agree 😅.