naobservatory / mgs-pipeline

MIT License

Dates in nonstandard formats #6

Open dp-rice opened 1 year ago

dp-rice commented 1 year ago

When parsing metadata_samples.json, I've noticed that some of the dates are not parseable by datetime.date, e.g.:

  "ERR1224351": {
    "country": "Thailand",
    "date": "Summer 2013",
    "location": "Bankok",
    "reads": 95437411
  },

It would be nice if these were standardized in some way, either as valid dates or in some other way I can parse and connect to the prevalence data.
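To see how widespread the problem is, one option is to list every sample whose date fails to parse. This is only a sketch: it assumes the file is a JSON object keyed by sample ID (as in the excerpt above) and that a "valid" date means ISO `YYYY-MM-DD`. The helper name `find_unparseable_dates` and the second sample ID are made up for illustration.

```python
from datetime import date


def find_unparseable_dates(samples):
    """Return {sample_id: raw_date} for every date that date.fromisoformat rejects."""
    bad = {}
    for sample_id, record in samples.items():
        raw = record.get("date")
        try:
            date.fromisoformat(raw)
        except (TypeError, ValueError):
            # TypeError: the date is missing (None); ValueError: not an ISO date.
            bad[sample_id] = raw
    return bad


# In practice this dict would come from json.load() on metadata_samples.json.
samples = {
    "ERR1224351": {"country": "Thailand", "date": "Summer 2013"},
    "EXAMPLE0001": {"country": "Thailand", "date": "2013-06-15"},  # hypothetical entry
}

unparseable = find_unparseable_dates(samples)
print(unparseable)
```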

jeffkaufman commented 1 year ago

Not sure the best way to represent these. I think the problem was that some were from June and some were from August, and I couldn't figure out from their metadata which were which. I could split the difference and say "2013-07", or just "2013"?
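If the dates end up as reduced-precision strings like "2013-07" or "2013", a consumer could still turn them into something comparable by expanding each into the range of days it covers. A rough sketch, assuming only the three forms `YYYY`, `YYYY-MM`, and `YYYY-MM-DD` appear; `date_range` is a hypothetical helper, not something in the pipeline:

```python
import calendar
from datetime import date


def date_range(value):
    """Return (first_day, last_day) covered by 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'."""
    # int() raises ValueError for anything non-numeric, e.g. "Summer 2013".
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date format: {value!r}")


print(date_range("2013-07"))
print(date_range("2013"))
```

Returning a range rather than a single day keeps the uncertainty explicit, so a June-or-August sample recorded as "2013" is not silently treated as one specific date.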

dp-rice commented 1 year ago

Not sure if it's worth it, but you could introduce a "season" field if this happens often enough and make date a year.

mikemc commented 1 year ago

> Not sure if it's worth it, but you could introduce a "season" field if this happens often enough and make date a year.

Unless/until a 'season' field seems obviously useful, you might follow this suggestion: do whatever you need to do to keep date in a well-defined format, which here might mean setting it to the year, and use a general-purpose 'notes' field to record that the sample is from June or August.
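For illustration, the year-plus-notes approach could look something like the sketch below. It assumes the nonstandard dates all have the form "<Season> <year>" and that the field would be called `notes`; both are guesses, and `normalize_record` is a hypothetical helper.

```python
import re

# Assumed pattern for the nonstandard dates, e.g. "Summer 2013".
SEASON_DATE = re.compile(r"(Spring|Summer|Fall|Autumn|Winter) (\d{4})")


def normalize_record(record):
    """Return a copy with a year-only date and the original text kept in 'notes'."""
    fixed = dict(record)
    match = SEASON_DATE.fullmatch(record.get("date") or "")
    if match is None:
        # Leave records with already-standard (or unrecognized) dates untouched.
        return fixed
    raw = record["date"]
    fixed["date"] = match.group(2)
    fixed["notes"] = f'Original date given as "{raw}"'
    return fixed


record = {
    "country": "Thailand",
    "date": "Summer 2013",
    "location": "Bankok",
    "reads": 95437411,
}
normalized = normalize_record(record)
print(normalized["date"], "|", normalized["notes"])
```

Keeping the original string in the note means nothing is lost if a dedicated season field is added later.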