monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Work on individual factory and CDA importer #69

Closed ielis closed 6 months ago

ielis commented 6 months ago

Update the CdaImporter pattern and move around the internals of CdaTableImporter without changing semantics of the data transformation.

Simplify CdaIndividualFactory, polish tests.

@msierk, as you point out here, I think you are right that the pd.Timedelta function does a similar conversion. In the CDA context, though, we can probably simplify the days -> ISO8601 duration conversion even further, with something like:

days = 123456
iso = f'P{abs(days)}D'

This may be even better than Pandas, because pandas produces P123456DT0H0M0S for the same input. That result adds hours, minutes, and seconds, which implies a precision the input does not have. I think that is incorrect, since the extra precision is made up.
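A minimal sketch of the same idea wrapped in a helper, in case it is useful for the tests. The function name `days_to_iso8601_duration` is hypothetical and not part of oncoexporter. Note that `abs()` discards the sign, so a negative and a positive day offset produce the same duration.

```python
def days_to_iso8601_duration(days: int) -> str:
    """Format a whole number of days as a day-precision ISO 8601 duration.

    Hypothetical helper for illustration; not part of the oncoexporter API.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(days, bool) or not isinstance(days, int):
        raise TypeError(f"days must be an int, got {type(days).__name__}")
    # abs() drops the sign: -30 and 30 both become 'P30D'.
    return f"P{abs(days)}D"


print(days_to_iso8601_duration(123456))  # P123456D
print(days_to_iso8601_duration(-30))     # P30D
```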

msierk commented 5 months ago

@ielis @pnrobinson Sorry, I'm just seeing this now (I saw Peter's comment about not wanting the time in the ISO format). The problem with the above is that we wanted to convert days into years + months. I was thinking the best thing to do is to use the pandas conversion and then apply a regex to strip off the T and everything after it, but I hadn't gotten around to doing it yet...
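A rough sketch of both ideas, using only the standard library. `strip_time_part` removes the `T` and everything after it from an ISO 8601 duration string such as the pandas output. `days_to_years_months` is one possible years + months conversion. Both names are hypothetical, and the 365.25-day average year is an assumption, not something agreed in this thread. The years + months form is approximate and drops any leftover days.

```python
import re

# Matches the time part of an ISO 8601 duration: 'T' and everything after it.
_TIME_PART = re.compile(r"T.*$")


def strip_time_part(iso_duration: str) -> str:
    """Drop the time component, e.g. 'P123456DT0H0M0S' -> 'P123456D'."""
    return _TIME_PART.sub("", iso_duration)


def days_to_years_months(days: int, days_per_year: float = 365.25) -> str:
    """Approximate a day count as years and months.

    Assumption: an average year of 365.25 days and a month of one twelfth
    of that. Leftover days are discarded, and abs() drops the sign.
    """
    days_per_month = days_per_year / 12
    total_months = int(abs(days) / days_per_month)
    years, months = divmod(total_months, 12)
    return f"P{years}Y{months}M"


print(strip_time_part("P123456DT0H0M0S"))  # P123456D
print(days_to_years_months(10000))         # P27Y4M
```

The regex only suits durations that have a date part. A time-only value such as `PT5H` would be reduced to a bare `P`, which is not a valid duration.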