micronutrientsupport / database-architecture

The Postgres database code for the MAPS tool
356 create json schemas for import #363

Closed rbroth closed 1 year ago

rbroth commented 1 year ago

I've made a first pass at creating JSON schemas from the markdown tables previously in /doc/import-templates/import-template.md. They're pretty basic, but should allow us to make a start on standardizing data import a bit more.

Note that I haven't tested these against actual data yet, so they may not be accurate.

rbroth commented 1 year ago

Just realized that I've used the markdown tables from the db-architecture repo; there are also markdown tables in the data import repo. Will compare and try to fix.

rbroth commented 1 year ago

OK, obviously these are a work in progress, and we'll work out the kinks as we go.

rbroth commented 1 year ago

I've been testing the fooditem schema, and it's looking pretty good! Problems identified: there's an extra column (data_reference_original_id) that I don't believe we're actually importing yet, and some entries for food_genus_confidence use h/m/l instead of High/Medium/Low.
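One way the h/m/l inconsistency could be handled before validation is a small normalization step. This is only a sketch: the lowercase canonical spellings, the schema fragment, and the helper name `normalize_confidence` are assumptions for illustration, not what the schemas in this PR actually contain.

```python
# Assumed canonical spellings; the real schema may use different ones.
CANONICAL = ("high", "medium", "low")

# Map the one-letter abbreviations seen in the data to canonical values.
ABBREVIATIONS = {"h": "high", "m": "medium", "l": "low"}

# Hypothetical JSON Schema fragment for the column, written as a Python dict.
FOOD_GENUS_CONFIDENCE_SCHEMA = {"type": "string", "enum": list(CANONICAL)}


def normalize_confidence(raw):
    """Return a canonical confidence value, or None if unrecognized."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in CANONICAL:
        return value
    # Unknown abbreviations fall through to None so they can be flagged.
    return ABBREVIATIONS.get(value)
```

Returning None for unrecognized values, rather than guessing, leaves it to the validation step to report the row.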

A problem I've run into is dynamic typing. Because not all strings are quoted, I'm currently using pandas to infer each column's type. There are also differences in how libraries handle null values: pandas recognizes NA as null, but most others do not, and some libraries import empty cells (i.e. ,,) as empty strings. There's more that can be done, but I think the current implementation is enough to start working with the scientists and see if we can get something going.
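To illustrate the null-handling difference, here is a minimal sketch using only Python's standard `csv` module, which returns empty cells as empty strings and NA as the literal string "NA". The null-token set, the helper names `parse_cell` and `read_rows`, and the sample columns are all assumptions for illustration; this is not the implementation in this PR, which uses pandas.

```python
import csv
import io

# Assumed set of tokens to treat as null; adjust to match the real data.
NULL_TOKENS = {"", "NA"}


def parse_cell(raw):
    """Convert a raw CSV cell to None, int, float, or str."""
    if raw is None:  # DictReader fills missing trailing cells with None
        return None
    text = raw.strip()
    if text in NULL_TOKENS:
        return None
    # Try the narrowest numeric type first, then fall back to a string.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def read_rows(csv_text):
    """Parse CSV text into a list of dicts with normalized cell values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: parse_cell(value) for key, value in row.items()} for row in reader]


# Hypothetical sample; the column names are illustrative only.
SAMPLE = "id,name,value\n1,maize,3.5\n2,,NA\n"
rows = read_rows(SAMPLE)
```

Normalizing nulls explicitly like this would make the result independent of which library reads the file, at the cost of having to agree on the token list with the people supplying the data.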

rbroth commented 1 year ago

@bgsandan are you ok with this being merged?