nih-cfde / cfde-deriva

Collaboration point for miscellaneous CFDE-deriva scripts

NCPI file derivation breaks if C2M2 filenames are omitted #398

Closed by karlcz 1 year ago

karlcz commented 1 year ago

The C2M2 file table has the filename column as nullable, but the NCPI file table we derive from it maps the filename field to a non-nullable name column. This causes submissions with omitted filenames to be rejected, though the C2M2 spec says they are allowed.

We need to either exclude these "anonymous" files from the NCPI file table or substitute some other value, such as the file local_id. For a substitution, though, we also need to consider whether the NCPI name field has any content limitations that would require scrubbing or sanitizing the alternate values. For example, the C2M2 filename field excludes the /, \, and : characters, but these are allowed in a local_id.
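To make the two options concrete, here is a minimal sketch, not the actual cfde-deriva derivation code. The function name `derive_ncpi_name`, the `substitute` flag, and the dict-shaped rows are hypothetical. It also assumes the NCPI name field would need the same /, \, and : exclusions as the C2M2 filename field, which is exactly the open question above.

```python
import re

# Characters excluded from C2M2 filename per this issue. Whether the NCPI
# name field needs the same restriction is an assumption, not a known rule.
_FORBIDDEN = re.compile(r"[/\\:]")


def derive_ncpi_name(file_row, substitute=True):
    """Return a name for the derived NCPI row, or None to exclude the row.

    Hypothetical helper: file_row is a dict with "local_id" and an
    optional (possibly None) "filename".
    """
    filename = file_row.get("filename")
    if filename is not None:
        return filename
    if not substitute:
        # Option 1: leave "anonymous" files out of the NCPI file table.
        return None
    # Option 2: fall back to local_id, replacing the forbidden characters.
    return _FORBIDDEN.sub("_", file_row["local_id"])


rows = [
    {"local_id": "a/b:c", "filename": None},
    {"local_id": "f2", "filename": "reads.fastq"},
]
names = [derive_ncpi_name(r) for r in rows]
skipped = derive_ncpi_name(rows[0], substitute=False)
```

Note that replacing characters can make two distinct local_id values collapse to the same name, so a real substitution scheme would have to decide whether that matters for the NCPI table.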

karlcz commented 1 year ago

The modeling team decided to change the file.filename field to be required (not null), which will also mitigate this bug.

This minor schema change is committed and has been manually tested to verify that it does not disrupt release processing with legacy submissions that specify this field as optional. It also tolerates resubmission of such legacy submissions, as long as the actual file records always populate the field (as they do in all current DCC submissions).
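A submitter could check the "always populate the field" condition ahead of time with something like the sketch below. It is illustrative only: `rows_missing_filename` is a hypothetical helper, the two-column table is a simplification of the real file table, and it assumes an empty TSV cell stands for a null value.

```python
import csv
import io


def rows_missing_filename(tsv_text):
    """Return the local_id of every file row whose filename is empty.

    Hypothetical pre-submission check; assumes a tab-separated table with
    "local_id" and "filename" columns, where an empty cell means null.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["local_id"] for row in reader if not row.get("filename")]


sample = "local_id\tfilename\nf1\treads.fastq\nf2\t\n"
missing = rows_missing_filename(sample)
```

An empty result would mean the table satisfies the new not-null requirement on filename, regardless of whether the submission's own schema still declares the field as optional.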