Epic - Separate Donor/Specimen/Sample model from Analysis model

Summary

Song should be usable for files that do not follow the Donor/Specimen/Sample model. The current Analysis data model requires these fields for every submitted analysis, forcing submitters whose data does not use them to fill them in anyway just to satisfy the software. Additionally, when multiple analyses are submitted for a single donor, the donor information must be repeated for each analysis. This duplicates data and invites input errors that make the donor information inconsistent from one analysis to another. Finally, Song's data model for Donors/Specimen/Samples is limited and cannot be customized to the data that different systems wish to collect.

Overture is moving towards separate services for tracking structured data (see Lectern and Lyric). These services provide the ability to track any data model for Donors, the registration of their Specimen and Samples, and any other clinical or phenotypic data that is relevant to a study. This frees up Song to focus on being the service that tracks Analysis metadata.

To connect Analysis data with related structured data, a system can include a field in its Dynamic Schema that provides an ID linking the analysis to the data tracked in Lyric. This mapping becomes fully customizable through the Dynamic Schema definition. With this change, we will need to provide a mechanism for Song to check with an external service that the value provided in one of these fields is registered with an external ID/data service.
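As a rough illustration of the linking idea above, the sketch below holds a dynamic schema as a Python dict and pulls the linked ID out of a submitted analysis. The field name `donorId`, the `externalValidation` keyword, and the sample values are all hypothetical; none of this reflects Song's actual schema format.

```python
# Hypothetical dynamic schema. "donorId" and the "externalValidation"
# keyword are illustrative assumptions, not Song's real schema format.
DYNAMIC_SCHEMA = {
    "type": "object",
    "required": ["donorId"],
    "properties": {
        "donorId": {
            "type": "string",
            "externalValidation": {"service": "lyric"},
        },
        "experimentalStrategy": {"type": "string"},
    },
}


def linked_fields(schema):
    """Return the names of top-level properties flagged for external validation."""
    return sorted(
        name
        for name, definition in schema.get("properties", {}).items()
        if "externalValidation" in definition
    )


def extract_links(schema, analysis):
    """Map each flagged field to the value submitted in the analysis payload."""
    return {
        name: analysis[name]
        for name in linked_fields(schema)
        if name in analysis
    }


analysis = {"donorId": "DO-0001", "experimentalStrategy": "WGS"}
print(extract_links(DYNAMIC_SCHEMA, analysis))
```

Keeping the link declaration inside the schema means each deployment decides which fields, if any, point at externally tracked data.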

Song as ID Service

One feature of Song that this change removes is the generation of system-wide unique IDs for Donors, Specimen, and Samples. Song has previously had the option to work as an ID server, generating unique IDs for these entities the first time an analysis that referenced them was submitted. As these entities will no longer be a standard part of the Analysis model, Song will have no mechanism for identifying Donors/Specimen/Samples, so they will no longer be tracked in Song's database and Song will no longer generate Donor/Specimen/Sample IDs.

Most deployments of Song do not require this feature, because a separate ID service has been used to create system-wide IDs. This is either because of a federated data management model that has multiple Song instances, or because clinical data records are being registered and tracked separately from the file data. This change makes the use of a separate ID service the standard process for all future instances of Song.

Implementation Details

Updating the Data Model

To accomplish this change we need to remove these entities from several parts of the code base, and update some functionality of the Song Server.

[ ] https://github.com/overture-stack/SONG/issues/864
[ ] https://github.com/overture-stack/SONG/issues/865
[ ] Update AnalysisService interface and implementations to not include any references to Donor, Specimen, or Sample specific code.
- Includes updating PayloadConverter
[ ] Remove Donor, Specimen, and Sample Controllers
- Can also remove the Services which resolve the requests made to these controllers if the services are not referenced anywhere else.
[ ] Remove Donors from responses in StudyController
- The StudyWithDonors class is used in only one place, so it can be removed
[ ] Update Search interfaces to not include search by donor_id, specimen_id, sample_id, or the submitter variant of these.
- idSearch in AnalysisController
- Also remove from the Search Command in the song-client that references this API
Miscellaneous Other Changes:
- [ ] Server Errors referencing these entities not existing can be removed

Preserving Legacy Schemas

[ ] Update the legacy analysis types to include the sample property as it was used in the original base schema
- Can reuse the definitions for donor, specimen, and sample that will be removed from the base schema
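One way to picture this task is as a schema merge: the property removed from the base schema is copied into each legacy analysis type. The sketch below assumes the property is named `samples` and uses a heavily trimmed stand-in for its definition; the real definitions for donor, specimen, and sample should be reused instead.

```python
import copy

# Hypothetical, trimmed stand-in for the sample definition being removed
# from the base schema. The property name "samples" is an assumption.
LEGACY_SAMPLE_PROPERTY = {
    "samples": {
        "type": "array",
        "minItems": 1,
        "items": {"type": "object"},
    }
}


def with_legacy_samples(analysis_type_schema):
    """Return a copy of a legacy analysis type schema that requires `samples`."""
    schema = copy.deepcopy(analysis_type_schema)
    properties = schema.setdefault("properties", {})
    properties.update(copy.deepcopy(LEGACY_SAMPLE_PROPERTY))
    required = schema.setdefault("required", [])
    if "samples" not in required:
        required.append("samples")
    return schema


# Hypothetical legacy analysis type with a single property of its own.
legacy_type = {
    "type": "object",
    "required": ["experiment"],
    "properties": {"experiment": {"type": "object"}},
}
updated_type = with_legacy_samples(legacy_type)
print(sorted(updated_type["properties"]))
```

Working on a deep copy leaves the stored schema untouched, so the update can be applied as a new version of the legacy analysis type rather than an in-place change.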

External Validation of Dynamic Schema Properties

In the current implementation, when Song is not used as an ID service, it can fetch system-wide IDs for Donors/Specimen/Samples via an HTTP request to a configurable ID server. With these entities no longer part of the coded data model, there is no longer a need to fetch IDs for them, since Song will not be recording them.

However, the external ID check was also used to ensure that entities had been registered with an ID system before Song accepted an Analysis registration. This is a feature we want to re-implement, but it will now need to be part of Dynamic Schemas.
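A minimal sketch of what such a check could look like is given below, assuming each externally validated field is configured with a URL template and that any 2xx response means the ID is registered. The field name, the URL, and the function names are hypothetical, and the lookup is injectable so the logic can be exercised without a network.

```python
import urllib.parse
import urllib.request

# Hypothetical configuration: analysis field -> URL template of the
# external ID/data service. Neither the field nor the URL is real.
EXTERNAL_CHECKS = {
    "donorId": "https://id-service.example.org/donors/{value}",
}


def http_exists(url):
    """Default lookup: treat any 2xx response as 'registered'.

    Network failures are also reported as 'not registered' here; a real
    implementation would likely want to distinguish the two cases.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError and timeouts
        return False


def validate_external_ids(analysis, checks=EXTERNAL_CHECKS, exists=http_exists):
    """Return one error message per configured field whose value is unregistered."""
    errors = []
    for field, url_template in checks.items():
        value = analysis.get(field)
        if value is None:
            continue  # whether the field is required is left to the schema
        url = url_template.format(value=urllib.parse.quote(str(value), safe=""))
        if not exists(url):
            errors.append(f"{field}={value!r} is not registered with the external service")
    return errors


# Stand-in for the external service, so the example runs offline.
registered_urls = {"https://id-service.example.org/donors/DO-0001"}
fake_exists = registered_urls.__contains__

print(validate_external_ids({"donorId": "DO-0001"}, exists=fake_exists))
print(validate_external_ids({"donorId": "DO-9999"}, exists=fake_exists))
```

Passing the lookup in as a parameter keeps the rule (reject analyses that reference unregistered IDs) separate from how a particular deployment reaches its ID service.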

Additional context

The intention of this change is to simplify data management with Song, with two cases in particular:

1. When Donor data is stored in another system, it is frequently duplicated into Song.
- This can cause replication errors where the data in Song and in the official clinical data source do not match. These mismatches become a problem when researchers try to analyze the data.
- Duplicated data in two sources has to be carefully and consistently managed when combining the clinical data source with Song, for instance when packaging data into downloadable archives or when indexing data for search purposes. This is technical complexity that doesn't need to exist.
2. Song was maintaining extra APIs for managing Donors/Specimen/Samples that were quite challenging (if not impossible) to use for modifying and correcting incorrectly submitted data. While losing this functionality is regrettable, it is beneficial not to have to maintain and fix it.
