monarch-initiative / pheval

A framework for empirical evaluation of phenotype matching and prioritisation
https://monarch-initiative.github.io/pheval/
Apache License 2.0

311 implement a prepare corpus command #312

Closed yaseminbridges closed 4 months ago

yaseminbridges commented 4 months ago

Implemented a prepare-corpus command.

A user can specify whether they intend to carry out variant, gene, and/or disease analysis with the prepared corpus. The command checks the corpus and ensures that all required fields in each phenopacket are filled (these are needed to match results to known entities during benchmarking). When a phenopacket is removed from the final corpus, a warning is logged identifying the phenopacket that was removed.

There is also the option of spiking a VCF with variant information from a phenopacket and of updating the gene identifiers.

For example, to prepare the phenopacket-store corpus for our use (variant analysis only) we would use the command:

pheval-utils prepare-corpus --phenopacket-dir /path/to/phenopacket-store --gene-identifier ensembl_id --variant-analysis --template-vcf /path/to/template.vcf --output-dir /path/to/output_dir

This will:

  1. Check that all VCF fields in each phenopacket are filled, and remove phenopackets with missing values from the final corpus.
  2. Update all gene identifiers in the remaining phenopackets to Ensembl IDs.
  3. Spike the variants into the template VCF.
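
As a rough illustration of step 1, the sketch below shows one way such a completeness check could be written. It is not the code in this PR: the function names, the list of required fields, and the assumption that phenopackets are JSON files following the Phenopacket Schema v2 layout (`interpretations` → `diagnosis` → `genomicInterpretations` → `variantInterpretation` → `variationDescriptor` → `vcfRecord`) are all hypothetical choices made for the example.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Assumption: these are the VCF record fields that must be present.
REQUIRED_VCF_FIELDS = ("chrom", "pos", "ref", "alt")


def vcf_records(phenopacket: dict) -> list[dict]:
    """Collect the vcfRecord of every genomic interpretation (assumed v2 layout)."""
    records = []
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for genomic in diagnosis.get("genomicInterpretations", []):
            descriptor = genomic.get("variantInterpretation", {}).get(
                "variationDescriptor", {}
            )
            # A missing vcfRecord becomes {} and fails the completeness check.
            records.append(descriptor.get("vcfRecord", {}))
    return records


def has_complete_variants(phenopacket: dict) -> bool:
    """True if there is at least one variant and every required field is filled."""
    records = vcf_records(phenopacket)
    return bool(records) and all(
        record.get(field) not in (None, "")
        for record in records
        for field in REQUIRED_VCF_FIELDS
    )


def filter_corpus(phenopacket_dir: Path) -> list[Path]:
    """Return the phenopackets to keep, logging a warning for each one dropped."""
    kept = []
    for path in sorted(phenopacket_dir.glob("*.json")):
        phenopacket = json.loads(path.read_text())
        if has_complete_variants(phenopacket):
            kept.append(path)
        else:
            logger.warning("Removing %s from the corpus: missing VCF fields", path.name)
    return kept
```

Calling `filter_corpus(Path("/path/to/phenopacket-store"))` would return only the files that pass the check. Gene and disease checks could follow the same pattern with a different set of required fields.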

yaseminbridges commented 4 months ago

Hold off on merging this until #314 is checked and merged

julesjacobsen commented 4 months ago

Hi @yaseminbridges, I pressed the 'Convert to draft' link under the reviewers' names in the 'Reviewers' panel at the top right of this page. When you're ready for review, press the 'Ready for review' button in the merge checks box below this comment.

yaseminbridges commented 4 months ago

Hi @julesjacobsen, this is ready