overture-stack / SONG

Metadata management and automated validation system
https://www.overture.bio/products/song
GNU Affero General Public License v3.0

Make FileType validations configurable in dynamic schemas #852

Open joneubank opened 2 months ago

joneubank commented 2 months ago

Summary of request

It is desired to have dynamic schemas define the file types that can be submitted for each type of analysis. As a practical example, the file types submitted to sequencing experiments (BAM) are not the same as those for variant calls (VCF), and if a new analysis schema were created for Volumetric Imaging files, another file type would be required.

Details

Currently the allowed file types are an arbitrary list defined in the base schema. With this feature request, each analysis type's dynamic schema would instead define which file types it accepts.

Desired solution

  1. Modify the Base Schema such that:
    • fileType is its own definition
    • The only validation for fileType is that it is a string
    • file.fileType references the fileType definition
    • fileType is a required property of file
  2. Modify the Analysis Type Registration schema such that:
    • If the user provides a definition for file.fileType it will merge with the definition from the base schema.
    • This will allow the user to add additional restrictions on file.fileType for this analysis type
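
To make the proposal concrete, here is a minimal sketch of the intended outcome in Python, using plain dicts to stand in for JSON Schema documents. Everything in it is an assumption for illustration: the layout of `BASE_SCHEMA`, the `VARIANT_CALL_SCHEMA` fragment, and the `deep_merge` helper are hypothetical and do not reflect SONG's actual schema merging code, which may behave differently.

```python
import copy

# Hypothetical, trimmed-down base schema: fileType is its own definition,
# validated only as a string, and required on every file.
BASE_SCHEMA = {
    "definitions": {
        "fileType": {"type": "string"},
        "file": {
            "type": "object",
            "required": ["fileType"],
            "properties": {"fileType": {"$ref": "#/definitions/fileType"}},
        },
    },
}

# Hypothetical analysis-type schema that narrows fileType for variant calls.
VARIANT_CALL_SCHEMA = {
    "definitions": {"fileType": {"enum": ["VCF"]}},
}


def deep_merge(base, override):
    """Recursively merge two dicts; values from override win on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


merged_schema = deep_merge(BASE_SCHEMA, VARIANT_CALL_SCHEMA)

# The merged definition keeps the base "string" rule and adds the
# analysis-specific restriction on top of it.
file_type_rule = merged_schema["definitions"]["fileType"]
```

Under these assumptions, `file.fileType` still points at the same `#/definitions/fileType` reference, so the extra restriction applies to every file in the analysis without touching the `file` definition itself.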

I am not sure if the current schema merging process will allow type definitions to be extended/merged in this way. Hopefully some quick trials will determine if this is possible.

If this is not possible, we should discuss other mechanisms for adding fileType validation rules to a schema. One option is always that the user provides these additional validations separately from the dynamic schema definition and we programmatically add those rules into the merged analysis schema.
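
As a rough sketch of that fallback, the snippet below injects a separately supplied list of allowed file types into an already merged schema. The function names `add_file_type_rules` and `is_allowed_file_type`, and the `definitions.fileType` layout, are hypothetical and chosen only for illustration; they are not part of SONG.

```python
import copy


def add_file_type_rules(merged_schema, allowed_file_types):
    """Return a copy of merged_schema whose fileType definition
    only accepts the given file types (hypothetical helper)."""
    result = copy.deepcopy(merged_schema)
    if not allowed_file_types:
        # No extra rules supplied: leave the base string-only check in place.
        return result
    definitions = result.setdefault("definitions", {})
    file_type = definitions.setdefault("fileType", {"type": "string"})
    file_type["enum"] = sorted(set(allowed_file_types))
    return result


def is_allowed_file_type(schema, value):
    """Minimal check of a value against the fileType definition."""
    if not isinstance(value, str):
        return False
    rule = schema.get("definitions", {}).get("fileType", {})
    return "enum" not in rule or value in rule["enum"]


# Example: a sequencing experiment that should only accept BAM files.
merged = {"definitions": {"fileType": {"type": "string"}}}
restricted = add_file_type_rules(merged, ["BAM"])
```

Keeping the file type list outside the dynamic schema avoids relying on definition merging at all, at the cost of a second place where validation rules for an analysis type are stored.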

Additional context

Once this process is modified, we may also need to make some minor updates to related processes:

justincorrigible commented 3 weeks ago

Note re definition merging: it seems SANBI has run into some issues trying to do that, and they have kindly documented their findings here. Sharing their ticket here in case it helps save some research.