monarch-initiative / setsim

A proof of concept of the summing similarity measure.

Add CLI entry point to sumsim #9

Open ielis opened 1 year ago

ielis commented 1 year ago

We need a CLI entry point in the sumsim library.

Setting up the entry point is described in the setuptools documentation.

We can use

The entry point should look something like this:

sumsim bench --hpo path/to/hp.json --phenopackets path/to/phenopacket/dir --output /path/to/output.csv

--hpo takes a path to the HPO JSON file.
--phenopackets takes a path to a folder with Phenopacket JSON files. The code can expect that phenopackets are the only files in the folder.
--output takes the path where the table with disease ranks should be written.

The CLI can take other options as necessary (e.g. a table with precomputed term IC values?)
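A minimal sketch of how the `bench` subcommand could be parsed with `argparse` from the standard library. The function names `build_parser` and `main` are hypothetical, and the body of `bench` is only a placeholder; the setuptools entry point would point at whichever function ends up playing the role of `main`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `sumsim bench` (hypothetical layout, not the final API)."""
    parser = argparse.ArgumentParser(prog="sumsim")
    subparsers = parser.add_subparsers(dest="command", required=True)

    bench = subparsers.add_parser("bench", help="Rank diseases for each phenopacket.")
    bench.add_argument("--hpo", type=Path, required=True,
                       help="Path to HPO JSON.")
    bench.add_argument("--phenopackets", type=Path, required=True,
                       help="Path to a folder with Phenopacket JSON files.")
    bench.add_argument("--output", type=Path, required=True,
                       help="Where to write the table with disease ranks.")
    return parser


def main(argv=None) -> int:
    """Parse the arguments and dispatch to the subcommand."""
    args = build_parser().parse_args(argv)
    if args.command == "bench":
        # Placeholder: the actual benchmark would be called from here.
        print(f"hpo={args.hpo} phenopackets={args.phenopackets} output={args.output}")
    return 0
```

Additional options (e.g. a table with precomputed IC values) could be added to the `bench` parser in the same way.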

I think we can make our life simpler if we make sumsim bench create a single table with the ranks for all phenopackets:

subject_id  disease_id   p_val   score  whatever
patient_a   OMIM:256000  0.0001  12.3   blabla
patient_a   OMIM:123456  0.001   10.3   other
patient_a   OMIM:234567  0.212   12.3   blabla
...
patient_b   OMIM:256000  0.0001  12.3   blabla
patient_b   OMIM:111111  0.3254  12.3   blabla

Diseases for a patient are sorted such that the most likely disease is at the top. The table is a stack of per-patient sub-tables, one row per disease.

I think CSV is the best format since pandas supports it out of the box.
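A rough sketch of how the stacked table could be written as CSV with the standard library. The function name `write_ranks` and the column list are hypothetical. Sorting by ascending `p_val` within each subject is an assumption about what "most likely" means here, and the example rows reuse the placeholder values from the table in this issue.

```python
import csv
import io

# Hypothetical column set; further columns can be appended as needed.
COLUMNS = ["subject_id", "disease_id", "p_val", "score"]


def write_ranks(rows, output) -> None:
    """Write one row per (subject, disease) to an open text handle.

    Rows are grouped by subject and, within a subject, sorted by ascending
    p-value (an assumed ranking criterion).
    """
    ordered = sorted(rows, key=lambda row: (row["subject_id"], row["p_val"]))
    writer = csv.DictWriter(output, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(ordered)


# Placeholder rows, deliberately out of order.
example_rows = [
    {"subject_id": "patient_b", "disease_id": "OMIM:111111", "p_val": 0.3254, "score": 12.3},
    {"subject_id": "patient_a", "disease_id": "OMIM:234567", "p_val": 0.212, "score": 12.3},
    {"subject_id": "patient_a", "disease_id": "OMIM:256000", "p_val": 0.0001, "score": 12.3},
    {"subject_id": "patient_b", "disease_id": "OMIM:256000", "p_val": 0.0001, "score": 12.3},
    {"subject_id": "patient_a", "disease_id": "OMIM:123456", "p_val": 0.001, "score": 10.3},
]

buffer = io.StringIO()
write_ranks(example_rows, buffer)
print(buffer.getvalue())
```

For the real CLI, the handle would come from opening the `--output` path with `newline=""`, as the `csv` module documentation recommends. The resulting file can then be loaded with `pandas.read_csv`.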