monarch-initiative / genophenocorr

Genotype Phenotype Correlation
https://monarch-initiative.github.io/genophenocorr/stable
MIT License

Use *SUOX* cohort as real-life test data, add JSON (de)serialization #157

Closed ielis closed 1 month ago

ielis commented 1 month ago

Let's use the SUOX cohort as real-life test data to simplify development of the analysis functions.

The PR adds the SUOX cohort to tests/test_data/SUOX.json. The JSON file is created using Python's built-in json module, with help from the encoders and decoders in the new genophenocorr.io package. Thanks to the JSON file, we do not need to ping any REST API to create a Cohort that is ready for analysis.
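For readers unfamiliar with the pattern, here is a minimal sketch of how custom encoders and decoders plug into the built-in json module. `ToyPatient`, `ToyEncoder`, and `toy_object_hook` are hypothetical stand-ins made up for illustration; they are not the classes or functions in genophenocorr.io, whose actual API may look different.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ToyPatient:
    # Hypothetical stand-in for a domain object; not the genophenocorr Patient.
    label: str
    phenotypes: List[str]


class ToyEncoder(json.JSONEncoder):
    """Teach json.dumps how to turn a ToyPatient into a plain dict."""

    def default(self, o: Any) -> Any:
        if isinstance(o, ToyPatient):
            # The "__type__" tag is an assumption of this sketch; it lets the
            # decoder recognize which dicts to turn back into objects.
            return {
                "__type__": "ToyPatient",
                "label": o.label,
                "phenotypes": o.phenotypes,
            }
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def toy_object_hook(d: Dict[str, Any]) -> Any:
    """Rebuild a ToyPatient from a tagged dict; pass other dicts through."""
    if d.get("__type__") == "ToyPatient":
        return ToyPatient(label=d["label"], phenotypes=d["phenotypes"])
    return d


patients = [ToyPatient("patient-1", ["HP:0001250"]), ToyPatient("patient-2", [])]

# Round trip: objects -> JSON text -> objects, with no network access needed.
encoded = json.dumps(patients, cls=ToyEncoder, indent=2)
decoded = json.loads(encoded, object_hook=toy_object_hook)
```

The same round trip works with `json.dump` and `json.load` on a file handle, which is presumably how a file such as tests/test_data/SUOX.json would be written and read back.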

The PR further simplifies the state of Cohort. The counts of variants, transcripts, phenotypes, diseases, etc. are now calculated on the fly instead of being precalculated and stored, which simplifies JSON (de)serialization.
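A rough sketch of the idea, assuming a toy cohort that stores only its members: the counts are derived on request, so they never need to be written to or validated against the JSON. `ToyCohort` and `variant_counts` are hypothetical names for illustration and do not mirror the real Cohort API.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence, Tuple


class ToyCohort:
    """Hypothetical cohort that keeps only its members as state."""

    def __init__(self, members: Iterable[Tuple[str, Sequence[str]]]):
        # Each member is (patient label, variant keys). No counts are stored.
        self._members = tuple(
            (label, tuple(variants)) for label, variants in members
        )

    def variant_counts(self) -> Mapping[str, int]:
        # Computed on every call from the members, so the result cannot
        # go out of sync with the (de)serialized state.
        return Counter(
            variant for _, variants in self._members for variant in variants
        )


cohort = ToyCohort(
    [
        ("patient-1", ["var-1", "var-2"]),
        ("patient-2", ["var-1"]),
    ]
)
counts = cohort.variant_counts()
```

The trade-off is a small amount of repeated computation in exchange for a serialized form that contains only the primary data.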

@pnrobinson this PR adds realistic test data to help you develop the MTC code. Please create a test class like TestCohortAnalysis and request fixtures such as the SUOX cohort here or the full HPO here.

@pnrobinson you probably do not need to check every line of code, but please be aware of the SUOX cohort and the full HPO. Using real-life data should simplify your work on the MTC code.

Please let me know if you'd like me to set up other types of data (e.g. counts).