opensafely-core / cohort-extractor

Cohort extractor tool which can generate dummy data, or real data against OpenSAFELY-compliant research databases

Dates in `feather` output format are output as strings, not datetimes #546

Closed: alexwalkerepi closed this issue 3 years ago

alexwalkerepi commented 3 years ago

The intention was to output them in datetime format to reduce memory use and eliminate the need for later conversion.

evansd commented 3 years ago

Can you post links to a reproducing example? Because I suspect this is only happening in some instances, not in general.

alexwalkerepi commented 3 years ago

This branch did it in dummy data at least. I've not checked whether it did the same in the EMIS backend when I ran it; it's possible that the manual retyping I did wouldn't have thrown an error. https://github.com/opensafely/long-covid/tree/148a50ca4681e089bf9c9706d9843c257e563e7a

evansd commented 3 years ago

We've discovered the reason for this is that date columns in dummy data have already been converted to strings by the time they get to the dataframe_to_file method: https://github.com/opensafely-core/cohort-extractor/blob/75ae8b53e7bdc6f611887a14b51d0ba84dd523ab/cohortextractor/study_definition.py#L296-L302

Not quite sure what the simplest way of dealing with this is at the moment.
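
One possible direction, sketched here purely for illustration, is to parse the string dates back into date objects just before the data is written out. This is not the cohort-extractor implementation: the helper names `parse_date_string` and `convert_date_columns` are hypothetical, the column-oriented dict stands in for the real dataframe, and the list of accepted formats is an assumption about what dummy data might contain.

```python
from datetime import datetime

# Assumed date layouts for dummy data; adjust to whatever the study uses.
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def parse_date_string(value):
    """Return a date for a date-like string, or None for an empty value."""
    if value is None or value == "":
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next, coarser format
    raise ValueError(f"Unrecognised date string: {value!r}")


def convert_date_columns(columns, date_column_names):
    """Return a shallow copy of `columns` with the named columns parsed."""
    converted = dict(columns)
    for name in date_column_names:
        converted[name] = [parse_date_string(v) for v in columns[name]]
    return converted


# Hypothetical dummy-data columns where dates arrive as strings.
dummy = {
    "patient_id": [1, 2, 3],
    "first_diagnosis": ["2020-03-01", "", "2020-11"],
}
fixed = convert_date_columns(dummy, ["first_diagnosis"])
```

In a pandas-based pipeline the equivalent step would be calling `pandas.to_datetime` on each date column before the file is written; whether that belongs in the dummy data generator or in the output method is a separate design question.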

alexwalkerepi commented 3 years ago

I've just done a real data extract with output in `.dta` format (this commit), and can confirm that while the dummy data outputs the date columns as strings, the real data has dates in Stata date format.
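
For readers unfamiliar with it, a Stata daily date is stored as a number of days since 1 January 1960. The sketch below shows that mapping, assuming daily (`%td`) dates; the function names are hypothetical and not part of cohort-extractor.

```python
from datetime import date, timedelta

# Stata daily dates count days from this epoch.
STATA_EPOCH = date(1960, 1, 1)


def date_to_stata_days(d):
    """Convert a Python date to a Stata daily date number."""
    return (d - STATA_EPOCH).days


def stata_days_to_date(days):
    """Convert a Stata daily date number back to a Python date."""
    return STATA_EPOCH + timedelta(days=days)


days = date_to_stata_days(date(2020, 1, 1))
round_trip = stata_days_to_date(days)
```

A date column that arrives as strings cannot be stored this way without first being parsed, which is consistent with the dummy data ending up as string columns.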