sdss / astra

Analysis framework for SDSS-V/Milky Way Mapper
BSD 3-Clause "New" or "Revised" License

`DataProduct` uniqueness #8

Closed andycasey closed 2 years ago

andycasey commented 2 years ago

Each astra.database.astradb.DataProduct record should refer to a unique file path.

The fields are release, filetype (which sets the template to be used by Tree), and kwargs for the data model keywords. The dictionary of kwargs can be stored as a JSONField, but we can't create a unique index on a JSONField. Worse, kwargs that differ only in their value types can refer to the same path, because of how the Tree product resolves types. For example:

kwargs_1 = {
  "mjd": 59012,  # integer
  # ...
}
kwargs_2 = {
  "mjd": "59012",  # string
  # ...
}

If all other fields had the same type and value, kwargs_1 and kwargs_2 would reference the same file path but would be treated as distinct rows.
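A minimal sketch of the collision, assuming the path is built by substituting kwargs into a string template. The template below is hypothetical and stands in for whatever Tree actually uses:

```python
import json

# Hypothetical path template; the real templates come from the Tree product.
TEMPLATE = "spectra/{mjd}/spec-{mjd}.fits"

kwargs_1 = {"mjd": 59012}    # integer
kwargs_2 = {"mjd": "59012"}  # string

# String formatting erases the type difference, so both resolve to one path.
path_1 = TEMPLATE.format(**kwargs_1)
path_2 = TEMPLATE.format(**kwargs_2)
same_path = path_1 == path_2

# The JSON serialisations differ, so the stored kwargs look like different rows.
json_1 = json.dumps(kwargs_1, sort_keys=True)
json_2 = json.dumps(kwargs_2, sort_keys=True)
same_json = json_1 == json_2

print(same_path, same_json)
```

Under this assumption the paths compare equal while the serialised kwargs do not, which is the mismatch described above.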

We should enforce some kind of checking to make sure that each DataProduct refers only to a unique path, without specifically having to store the path as a field.

andycasey commented 2 years ago

Fixed this by forcibly setting the type of each value in DataProduct.kwargs, and by adding a unique identifier, DataProduct.kwargs_hash, which is automatically calculated at creation time.
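A rough sketch of how such a fix could work, not the actual astra implementation. It assumes every value is cast to a string and that the hash is a SHA-256 digest of the key-sorted JSON; the helper names normalize_kwargs and compute_kwargs_hash are hypothetical:

```python
import hashlib
import json


def normalize_kwargs(kwargs):
    # Assumption: casting every key and value to str gives one canonical type.
    return {str(key): str(value) for key, value in kwargs.items()}


def compute_kwargs_hash(kwargs):
    # Serialise with sorted keys so the key order does not change the hash.
    payload = json.dumps(normalize_kwargs(kwargs), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


hash_int = compute_kwargs_hash({"mjd": 59012})
hash_str = compute_kwargs_hash({"mjd": "59012"})
hash_other = compute_kwargs_hash({"mjd": 59013})

print(hash_int == hash_str)    # equivalent kwargs share a hash
print(hash_int == hash_other)  # different kwargs do not
```

A fixed-length hash like this can sit in an ordinary text column, which can carry a unique index (possibly together with release and filetype) where a JSONField cannot. One caveat of the string cast: a float such as 59012.0 becomes "59012.0" and would still hash differently from 59012.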