materialsproject / maggma

Building blocks for scientific data pipelines
https://materialsproject.github.io/maggma/

JSONStore: read_json_file enhancements #1001

Open rkingsbury opened 1 month ago

rkingsbury commented 1 month ago

JSONStore uses the read_json_file method during connect to aggregate the contents of one or more JSON files into a MemoryStore. This PR contains several quality-of-life improvements that make this method more reliable and easier to debug.

Changes

  1. Currently, if any of the JSON files is improperly formatted or otherwise causes an error, connect fails and it is very difficult to determine the reason. This PR wraps the read_json_file call in a try/except block and adds logging messages that make it easier to tell which file is causing the problem and what the error is.
  2. Currently, read_json_file uses zopen to access each file in read-write mode. This is unnecessary and can contradict the spirit of the read_only kwarg. This PR changes the zopen call to read-only mode.
  3. If a JSON file's contents do not contain a last_updated field, one will now be added (and set to the current time).
  4. If a JSON file DOES contain a last_updated field but was serialized using monty dumpfn, the field will be a complete dict version of a datetime object, e.g.
    "last_updated": {
      "@class": "datetime",
      "@module": "datetime",
      "string": "2024-08-07 21:24:31.370707+00:00"
    },

    This PR adds logic to read_json_file that will replace last_updated with just the contents of the string key, because that is what maggma expects.
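
For reviewers who want a feel for the behavior described in the four changes above, here is a minimal, standard-library-only sketch. It is not the code in this PR: the helper names `normalize_last_updated` and `read_json_docs` are hypothetical, plain `open` stands in for zopen, and skipping a bad file after logging (rather than re-raising) and using a UTC timestamp are assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


def normalize_last_updated(doc, key="last_updated"):
    """Return a shallow copy of doc with a usable `key` field (hypothetical helper)."""
    doc = dict(doc)
    value = doc.get(key)
    if value is None:
        # Change 3: no timestamp present, so set one to the current time.
        # Using UTC here is an assumption of this sketch.
        doc[key] = datetime.now(timezone.utc)
    elif isinstance(value, dict) and "string" in value:
        # Change 4: monty-style serialized datetime; keep only the "string" value.
        doc[key] = value["string"]
    return doc


def read_json_docs(paths):
    """Collect documents from several JSON files (hypothetical helper)."""
    docs = []
    for path in paths:
        try:
            # Change 2: open the file read-only.
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except Exception as exc:
            # Change 1: report which file failed and why.
            # Skipping the file afterwards is an assumption of this sketch.
            logger.error("Could not read %s: %s", path, exc)
            continue
        # Accept either a single document or a list of documents.
        if isinstance(data, dict):
            data = [data]
        docs.extend(normalize_last_updated(d) for d in data)
    return docs
```

With this shape, a malformed file produces a log line naming the offending path instead of an opaque failure, and every returned document carries a last_updated value that is either a datetime or a plain string.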