sennetconsortium / ingest-api

MIT License

Coordinate with Gesina to get `subtype` from Cedar API call #233

Closed: maxsibilla closed this 5 months ago

maxsibilla commented 9 months ago

Currently, when we validate metadata via the portal, we have no way to ensure that the metadata file matches the Sample type. This branch (https://github.com/hubmapconsortium/ingest-validation-tools/commit/1f8a1605c6b14f6a83e3997fe68747faaa5f6df2) does not yet support returning from Cedar which subtype (e.g. Block, Suspension, Section) the metadata file was validated against.

libpitt commented 9 months ago

Also: https://github.com/sennetconsortium/ingest-validation-tools/pull/34#issuecomment-1836374367

Note for later: `iv_utils.get_schema_version(upload.get('fullpath'), encoding='ascii', globus_token=get_groups_token())`

libpitt commented 9 months ago

Note this comment for later:

Okay, so `get_schema_version` should let you retrieve the more specific sample sub-type the same way as for any other schema. A sample gets diverted to `get_other_schema_name`, which reads the TSV and looks for a `sample_id` field (as well as the unique fields for organ, contributors, and antibodies). If it finds a `sample_id`, it looks for a `type` in the TSV and returns. It does assume there is a `type` field, so if that’s not always the case, this is broken. This could use another guard against bad types, so I'm going to push a small update. Let me know if this has anything to do with what you’re hoping to accomplish, or if you have any sanity checks about the logic.
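For reference, the lookup described in that comment could be sketched roughly as below. This is not the code in ingest-validation-tools: the function name `sketch_other_schema_name`, the `UNIQUE_FIELDS` mapping, the `KNOWN_SUBTYPES` set (taken from the Block/Suspension/Section examples in this issue), and the `sample-<subtype>` return format are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumption: subtype values taken from the examples named in this issue.
KNOWN_SUBTYPES = {"block", "suspension", "section"}

# Only sample_id is named in this thread. Entries for organ, contributors,
# and antibodies would go here with their own unique fields.
UNIQUE_FIELDS: Dict[str, str] = {"sample": "sample_id"}


def sketch_other_schema_name(rows: List[Dict[str, str]]) -> Optional[str]:
    """Guess a schema name from parsed TSV rows (hypothetical helper)."""
    if not rows:
        return None
    first = rows[0]
    for schema, field in UNIQUE_FIELDS.items():
        if field not in first:
            continue
        if schema != "sample":
            return schema
        # Guard: the `type` column may be missing, empty, or unrecognized.
        subtype = (first.get("type") or "").strip().lower()
        if subtype in KNOWN_SUBTYPES:
            return f"{schema}-{subtype}"  # illustrative format only
        return schema
    return None
```

With this guard, a row without a usable `type` falls back to plain `"sample"` instead of raising, which makes the "no `type` field" case explicit rather than broken.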

maxsibilla commented 5 months ago

Calling get_schema_version with

iv_utils.get_schema_version(<full_path>, "ascii", token)

returns something like this: [image]

Calling get_other_schema_name with

iv_utils.get_other_schema_name(<get_schema_version.rows>, <full_path>)

currently only returns "sample". I am unsure whether this is how `schema_name` gets set in the return value of `get_schema_version`. Is the response from `get_other_schema_name` being set to "sample" because of how `other_types` is formatted? [image]
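One way to narrow this down independently of `iv_utils` might be to check what the metadata TSV itself contains. The sketch below uses only the standard library; `summarize_type_column` is a hypothetical helper, and it assumes the file is tab-delimited with `sample_id` and `type` as the column names mentioned earlier in this thread.

```python
import csv
import io
from typing import Dict, TextIO


def summarize_type_column(tsv: TextIO) -> Dict[str, object]:
    """Report whether a TSV has sample_id/type columns and which types occur."""
    reader = csv.DictReader(tsv, delimiter="\t")
    fieldnames = reader.fieldnames or []
    rows = list(reader)
    # Collect non-empty `type` values; short rows yield None, hence the `or ""`.
    types = sorted({(row.get("type") or "").strip() for row in rows} - {""})
    return {
        "has_sample_id": "sample_id" in fieldnames,
        "has_type": "type" in fieldnames,
        "distinct_types": types,
    }


# Example with in-memory data standing in for the uploaded file.
example = io.StringIO("sample_id\ttype\nS1\tBlock\nS2\tSection\n")
summary = summarize_type_column(example)
```

If `has_type` is false or `distinct_types` is empty for the uploaded file, the plain "sample" result would be explained by the input rather than by how `other_types` is structured.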