monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Finalize CI #76

Closed ielis closed 5 months ago

ielis commented 5 months ago

This PR finalizes the tests, re-enables CI, and addresses the points below.

Major

Mapping CDA diagnosis to NCIT neoplasm term

@pnrobinson @justaddcoffee the disease term mapping is more or less the same as before. However, I rearranged the tests so that the mapping is tested in only one place. Your tests did not disappear; I removed redundancies and moved the remaining tests to a single location.

We still need to work on the mapping details, e.g. by updating the cda_to_ncit_map.tsv file, but the mapping should work most of the time.

A question for @pnrobinson: there are two mapping files that seem to serve the same purpose, cda_to_ncit_map.tsv and cda_to_ncit_map_old.tsv (which used to be at oncoexporter/cda/mapper/neoplasm_types.tsv). The latter, however, seems to be more complete. Is there a difference between the files?
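One way to answer the question above would be to compare the two files row by row. The following is only a sketch: `read_rows` and `compare_mappings` are hypothetical helpers, not part of oncoexporter, and the sketch makes no assumption about the column layout beyond the files being tab-separated. It compares whole rows after trimming whitespace.

```python
import csv
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, ...]


def read_rows(lines: Iterable[str]) -> Set[Row]:
    """Parse tab-separated lines into a set of whitespace-trimmed rows.

    Blank rows are skipped. A header row, if present, is treated like any other row.
    """
    reader = csv.reader(lines, delimiter="\t")
    return {
        tuple(cell.strip() for cell in row)
        for row in reader
        if any(cell.strip() for cell in row)
    }


def compare_mappings(new_lines: Iterable[str], old_lines: Iterable[str]) -> Dict[str, Set[Row]]:
    """Return the rows that occur in only one of the two mapping files."""
    new_rows = read_rows(new_lines)
    old_rows = read_rows(old_lines)
    return {
        "only_in_new": new_rows - old_rows,
        "only_in_old": old_rows - new_rows,
    }


# Possible usage (paths are relative to wherever the files live in the checkout):
# with open("cda_to_ncit_map.tsv", newline="") as new, \
#         open("cda_to_ncit_map_old.tsv", newline="") as old:
#     diff = compare_mappings(new, old)
#     print(len(diff["only_in_new"]), len(diff["only_in_old"]))
```

If `only_in_new` turns out to be empty, the old file would be a superset of the new one, which would match the impression that it is more complete.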

Ensure the driver scripts work

The driver scripts for ETL of Lung and Cervix cohorts should now work. @pnrobinson @justaddcoffee please verify on your end.

Minor

Mapping of days of age to ISO8601 duration

We now map the number of days of life to an ISO8601 duration consisting of days only (e.g. 11000 -> 'P11000D') instead of rounding the days to years, months, etc. This preserves the original data, and we can add an age formatter later if need be. The days-of-life mapping also relates to #70 and #71.
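A minimal sketch of the days-only mapping described above. The function name `days_to_iso8601_duration` is hypothetical and may not match the name used in the PR; the input validation is also an assumption, since the PR text does not say how negative or non-integer values are handled.

```python
def days_to_iso8601_duration(days: int) -> str:
    """Format a number of days of life as a days-only ISO8601 duration.

    The value is kept as-is (no rounding to years or months), so no
    information from the source data is lost.
    """
    # Assumption: only non-negative whole days are valid input.
    if isinstance(days, bool) or not isinstance(days, int):
        raise TypeError(f"days must be an int, got {type(days).__name__}")
    if days < 0:
        raise ValueError(f"days must be non-negative, got {days}")
    return f"P{days}D"


print(days_to_iso8601_duration(11000))  # prints P11000D
```

Keeping the raw day count means a later age formatter can still derive years and months from the same value, whereas a rounded value could not be converted back.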

Use pytest for new tests, keep existing unittest

We prefer pytest as the test runner, and all new tests should use pytest. However, pytest can also run unittest tests, so there is no need to migrate the existing tests right now.
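To illustrate how the two styles can coexist, here is a sketch of a single test module that pytest would collect in full. The helper `format_days` and both test names are hypothetical stand-ins, not code from this repository.

```python
import unittest


def format_days(days: int) -> str:
    """Hypothetical stand-in for the code under test."""
    return f"P{days}D"


class FormatDaysLegacyTest(unittest.TestCase):
    """Existing unittest-style test; pytest collects TestCase subclasses as-is."""

    def test_days_only_duration(self):
        self.assertEqual(format_days(11000), "P11000D")


def test_days_only_duration():
    """New pytest-style test: a plain function with a bare assert."""
    assert format_days(365) == "P365D"
```

Saved as, for example, `test_format_days.py`, running `pytest` in the same directory should pick up both tests, because pytest's default discovery covers `test_*.py` files, `test_`-prefixed functions, and `unittest.TestCase` subclasses.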

Fixes #41, #62, #66