kids-first / kf-api-dataservice

:file_cabinet: Primary API for interacting with the Kids First data
http://kf-api-dataservice.kidsfirstdrc.org
Apache License 2.0

✨ New sample-relationships API #665

Closed znatty22 closed 3 months ago

znatty22 commented 3 months ago

Motivation

CBTN has complex sample data where a group of samples has a hierarchical, tree-like structure (Blood -> White Blood Cells -> DNA, RNA). There is currently no way to capture this information in Dataservice, and therefore no way to get it into the FHIR service and the Portal.

Approach

Add a SampleRelationship table to capture the sample tree. The table has the following columns:

Add validation rules in the API:

API

The endpoint and search parameters follow the same pattern as all other endpoints. Here are some examples:

# Get all sample relationships
/sample-relationships

# Get all sample relationships in a study
/sample-relationships?study_id=foo
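
For anyone scripting against the endpoint, here is a minimal sketch of building those URLs in Python. The helper name `sample_relationships_url` is hypothetical, and the base URL is simply the one shown on the repository page, so treat it as an assumption for your deployment. Only the `study_id` parameter is confirmed by the examples above.

```python
from urllib.parse import urlencode, urljoin

# Assumption: base URL copied from the repository page; adjust for your deployment.
BASE_URL = "http://kf-api-dataservice.kidsfirstdrc.org"


def sample_relationships_url(base_url=BASE_URL, **filters):
    """Build a /sample-relationships URL with optional search parameters.

    Hypothetical helper, not part of Dataservice itself.
    """
    url = urljoin(base_url.rstrip("/") + "/", "sample-relationships")
    # Skip filters that were passed as None so the query string stays minimal.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{url}?{query}" if query else url
```

Calling `sample_relationships_url(study_id="foo")` yields the second example above with the base URL prepended.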

chris-s-friedman commented 3 months ago

If a sample is a parent of a child sample, then the reverse relationship cannot exist (e.g. if the relationship SA1 -> SA2 exists, then SA2 -> SA1 cannot be created)

Just to clarify, the -> directionality of the arrow is meaningful here, correct? The arrow means parent -> child. So what this validation is validating is that:

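If

  • SA1 is the parent
  • SA2 is the child
  • there is a record in sample_relationship where SA1 is parent and SA2 is child

Then we are validating that a new record cannot be SA2 is parent and SA1 is child, correct?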

znatty22 commented 3 months ago

If a sample is a parent of a child sample, then the reverse relationship cannot exist (e.g. if the relationship SA1 -> SA2 exists, then SA2 -> SA1 cannot be created)

Just to clarify, the -> directionality of the arrow is meaningful here, correct? The arrow means parent -> child. So what this validation is validating is that:

If

  • SA1 is the parent
  • SA2 is the child
  • there is a record in sample_relationship where SA1 is parent and SA2 is child

Then we are validating that a new record cannot be SA2 is parent and SA1 is child, correct?

Correct. We're validating that a child sample cannot also be the parent of its own parent sample, and a parent sample cannot also be the child of its own child sample. Basically, we're avoiding cycles.
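
To make the rule concrete, here is a rough sketch of the check in plain Python. It is not the Dataservice implementation. The function name and the list-of-pairs input are hypothetical, and it goes slightly beyond the stated rule: besides rejecting the direct reverse (SA2 -> SA1 when SA1 -> SA2 exists), it walks up the ancestor chain so longer cycles and self-references are rejected too. It also assumes each child has at most one parent.

```python
def validate_new_relationship(existing, parent_id, child_id):
    """Raise ValueError if adding parent_id -> child_id would create a cycle.

    `existing` is an iterable of (parent_id, child_id) pairs already stored.
    Hypothetical helper; assumes every child has at most one parent.
    """
    # Map each child to its parent so we can walk up the tree.
    parent_of = {child: parent for parent, child in existing}

    # Walk from the proposed parent towards the root. If we meet the
    # proposed child on the way, the new edge would close a cycle.
    seen = set()
    node = parent_id
    while node is not None and node not in seen:
        if node == child_id:
            raise ValueError(
                f"{parent_id} -> {child_id} would create a cycle"
            )
        seen.add(node)
        node = parent_of.get(node)
```

With `[("SA1", "SA2")]` already stored, proposing SA2 -> SA1 raises `ValueError`, while proposing SA2 -> SA3 passes.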

znatty22 commented 3 months ago

Some things I'm curious about:

  1. Can a sample have multiple parents? (Or are we explicitly allowing/disallowing that?)
  2. If the answer to the multiple-parent question above is "No", then why is this not just a column in the sample table itself?

  1. No, a sample cannot have multiple parents. Good catch. I might need to add a validation rule for this or change the unique constraint to be on the child_id column.

  2. I don't have a great answer for this, but the short answer is that we could add parent_id to the sample table itself, but other things get complicated and break our current Dataservice patterns. The easiest way to implement a self-referential data structure is with a secondary many-to-many table. There are definitely other approaches, including adding a foreign key from the sample table to itself.
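
As an illustration of the "unique constraint on child_id" option, here is a small self-contained sketch using SQLite from the Python standard library. It is not the actual Dataservice schema: the `kf_id` and `parent_id` column names and both helper functions are assumptions made for the example. Only the `sample_relationship` table name and the `child_id` column come from this thread.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sample (kf_id TEXT PRIMARY KEY);
        CREATE TABLE sample_relationship (
            id INTEGER PRIMARY KEY,
            parent_id TEXT NOT NULL REFERENCES sample (kf_id),
            child_id TEXT NOT NULL REFERENCES sample (kf_id),
            -- A child may appear only once, so it can have only one parent.
            UNIQUE (child_id)
        );
        """
    )
    with conn:
        conn.executemany(
            "INSERT INTO sample (kf_id) VALUES (?)",
            [("SA1",), ("SA2",), ("SA3",)],
        )
    return conn


def add_relationship(conn, parent_id, child_id):
    """Insert a relationship; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sample_relationship (parent_id, child_id) "
                "VALUES (?, ?)",
                (parent_id, child_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this constraint, adding SA1 -> SA2 succeeds and a later SA3 -> SA2 is rejected by the database, while SA1 can still have several children. The database only enforces the single-parent rule here; the reverse-relationship check would still need to live in the API validation.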