monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

isoage #70

Closed pnrobinson closed 5 months ago

pnrobinson commented 6 months ago

pandas is not doing this the way we need. For 73 days, we get P73DT0H0M0S, but we want something that is easier to understand, such as P2M13D, and we do not want the time part of it.

The pandas documentation also says this:

The longest component is days, whose value may be larger than 365. Every component is always included, even if its value is 0. Pandas uses nanosecond precision, so up to 9 decimal places may be included in the seconds component. Trailing 0's are removed from the seconds component after the decimal. We do not 0 pad components, so it's ...T5H..., not ...T05H...

pandas does this conversion automatically: https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.isoformat.html

import pandas as pd

td = pd.Timedelta(days=10350)
iso = td.isoformat()  # returns an ISO 8601 duration string: 'P10350DT0H0M0S'

Todo -- let's make a testable class that does this, then we can change to a library if it turns out there is a good one. This is a bit messy but not absolute rocket science, and we do not need to worry about the approximations.
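A minimal sketch of what such a testable class could look like. The class name `IsoAge`, its methods, and the constants are hypothetical and not part of oncoexporter. It assumes 365-day years and 30-day months, which is the approximation that turns 73 days into P2M13D, and it only handles non-negative whole days.

```python
from dataclasses import dataclass

# Assumptions for this sketch (not calendar-exact):
DAYS_PER_YEAR = 365   # leap years are ignored
DAYS_PER_MONTH = 30   # every month is treated as 30 days


@dataclass(frozen=True)
class IsoAge:
    """Hypothetical helper: an age split into approximate years, months, days."""
    years: int
    months: int
    days: int

    @classmethod
    def from_days(cls, total_days: int) -> "IsoAge":
        """Split a non-negative day count into years, months, and days."""
        if total_days < 0:
            raise ValueError("total_days must be non-negative")
        years, rest = divmod(int(total_days), DAYS_PER_YEAR)
        # With these constants, rest can be up to 364, so months can reach 12
        # (e.g. 364 days -> 12 months and 4 days). Acceptable for an approximation.
        months, days = divmod(rest, DAYS_PER_MONTH)
        return cls(years, months, days)

    def to_iso8601(self) -> str:
        """Return a date-only ISO 8601 duration, omitting zero components."""
        components = ((self.years, "Y"), (self.months, "M"), (self.days, "D"))
        parts = [f"{value}{unit}" for value, unit in components if value]
        # A duration needs at least one component, so zero days becomes P0D.
        return "P" + "".join(parts) if parts else "P0D"


print(IsoAge.from_days(73).to_iso8601())      # P2M13D
print(IsoAge.from_days(10350).to_iso8601())   # P28Y4M10D
```

Keeping the split (`from_days`) separate from the formatting (`to_iso8601`) makes each step easy to unit test, and the formatting part could be swapped for a library later without touching callers.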