dehays closed this issue 6 months ago
Thank you @dehays for creating this issue. Should I add it to the December sprint?
Ongoing -- some of these changes have been addressed. Adding type and part_of values. There is some data_object_set curation that needs to happen still.
@dehays let me know if you'd like me to move this to the February sprint or if you plan to close it by Monday. Thank you!
Moving to February sprint but let me know if it should go to backlog instead @dehays
@dehays any update on this issue?
Moving to March but please let me know if this is done or not being actively worked on @dehays
Moved to April. Good to have, but medium priority. I imagine this starts as a Jupyter notebook.
Discussed with David and he'll close or move to May.
@dehays said he will close this or work on it in May, so I'm moving it to the May sprint.
@emileyfadrosh moving this to the backlog. Let me know if it should be assigned to someone else. FYI @dehays
Closing this issue from 2021. @dwinston @emileyfadrosh FYI Backlog cleanup 12-2023
We should have referential integrity and type (constrained vocabulary) checking at document creation/insertion time.
The latter could be handled by JSON Schema from the NMDC Schema (only permit type values from a defined enumeration).
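As a rough sketch of the constrained-vocabulary idea, the snippet below hand-checks the two JSON Schema keywords involved (`required` and `enum`) against a schema fragment. The fragment, the two `type` values, and the `type_errors` helper are hypothetical placeholders; the real enumeration would come from the NMDC Schema, and a proper JSON Schema validator would replace the hand-rolled check.

```python
# Hypothetical schema fragment: only permit `type` values from an enumeration.
# The values listed here are placeholders, not the actual NMDC vocabulary.
TYPE_SCHEMA = {
    "type": "object",
    "required": ["type"],
    "properties": {
        "type": {"enum": ["nmdc:Biosample", "nmdc:DataObject"]},
    },
}


def type_errors(doc, schema=TYPE_SCHEMA):
    """Return a list of problems found in `doc`; an empty list means it passes.

    Only the `required` and `enum` keywords are interpreted here.
    """
    errors = []
    for field in schema.get("required", []):
        if field not in doc:
            errors.append(f"missing required field {field!r}")
    for field, rules in schema.get("properties", {}).items():
        if field in doc and "enum" in rules and doc[field] not in rules["enum"]:
            errors.append(f"{field}={doc[field]!r} not in permitted values")
    return errors
```

Running this check at insertion time would reject a document before it reaches the collection, rather than leaving the problem to surface during ingest.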
The portal ingest maps Mongo document collections to Postgres relational tables, so it is where missing and extra record references become visible. Foreign key (FK) constraints could enforce these references, but then many documents would fail to insert as Postgres records. In lieu of FK constraints, the ingest logs the missing and unreferenced documents. Most of these are missing or extra data objects, but there is a biosample among them too.
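The kind of audit the ingest log performs could be sketched as a pair of set differences. This is not the ingest's actual code: the `audit_references` helper, the `has_output` reference field, and the document shapes are assumptions made for illustration.

```python
def audit_references(referencing_docs, data_objects, ref_field="has_output"):
    """Compare referenced data object ids against the ids that actually exist.

    `ref_field` is an assumed name for the list-valued field that holds
    data object ids on the referencing documents.
    """
    known = {obj["id"] for obj in data_objects}
    referenced = set()
    for doc in referencing_docs:
        referenced.update(doc.get(ref_field, []))
    return {
        # Referenced somewhere, but no such data object document exists.
        "missing": sorted(referenced - known),
        # Data object exists, but nothing references it.
        "unreferenced": sorted(known - referenced),
    }
```

For example, with activities referencing `do:1` and `do:2` while only `do:2` and `do:3` exist, the result lists `do:1` as missing and `do:3` as unreferenced. The same comparison could run at insertion time to reject a dangling reference instead of logging it afterwards.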
Task here is to consider how to handle the following: