kids-first / kf-api-dataservice

:file_cabinet: Primary API for interacting with the Kids First data
http://kf-api-dataservice.kidsfirstdrc.org
Apache License 2.0

Consider moving relations to _links #104

Closed dankolbman closed 6 years ago

dankolbman commented 6 years ago

Right now, entity relationships are properties with the kf_id as the value:

{
            "_links": {
                "self": "/participants/EANSE585"
            },
            "created_at": "2018-02-07T19:30:48.095286+00:00",
            "demographic": "/demographics/0A8QGFQX",
            "diagnoses": [
                "J6NPXSD8",
                "H1Y00WE4"
            ],
            "external_id": "participant_4",
            "kf_id": "EANSE585",
            "modified_at": "2018-02-07T19:30:48.095302+00:00",
            "samples": [
                "6AH8HJNK",
                "HEY6S7R1"
            ]
}

Perhaps we should move these into _links with an endpoint, or create a copy there:

{
            "_links": {
                "self": "/participants/EANSE585",
                "demographic": "0A8QGFQX",
                "diagnoses": [
                    "/diagnoses/J6NPXSD8",
                    "/diagnoses/H1Y00WE4"
                ],
                "samples": [
                    "/samples/6AH8HJNK",
                    "/samples/HEY6S7R1"
                ]
            },
            "created_at": "2018-02-07T19:30:48.095286+00:00",
            "external_id": "participant_4",
            "kf_id": "EANSE585",
            "modified_at": "2018-02-07T19:30:48.095302+00:00"
}
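
A minimal sketch of how such a move could be done on an already-serialized entity. The `RELATION_ENDPOINTS` mapping and both function names are hypothetical and not taken from the dataservice code; the sketch assumes relation values are either bare kf_ids or existing paths.

```python
import copy

# Hypothetical mapping from relation property name to collection endpoint.
RELATION_ENDPOINTS = {
    "demographic": "/demographics",
    "diagnoses": "/diagnoses",
    "samples": "/samples",
}


def to_link(endpoint, value):
    """Build '<endpoint>/<kf_id>' from a bare kf_id or an existing path."""
    kf_id = value.rstrip("/").rsplit("/", 1)[-1]
    return f"{endpoint}/{kf_id}"


def move_relations_to_links(entity, endpoints=RELATION_ENDPOINTS):
    """Return a copy of `entity` with relation properties moved into `_links`."""
    result = copy.deepcopy(entity)  # leave the caller's dict untouched
    links = result.setdefault("_links", {})
    for name, endpoint in endpoints.items():
        if name not in result:
            continue
        value = result.pop(name)
        if isinstance(value, list):
            links[name] = [to_link(endpoint, v) for v in value]
        else:
            links[name] = to_link(endpoint, value)
    return result
```

To "create a copy there" instead of moving, the `pop` could be swapped for a plain lookup so the original properties stay in the body.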
dankolbman commented 6 years ago

As we continue building functionality, we will eventually be able to replace the large lists that result from properties like samples with a single link to that entity's samples:

"_links": {
    "samples": [
            "6AH8HJNK",
            "HEY6S7R1",
            ...
            ...
    ]
}

becomes:

"_links": { "samples": "/samples?participant_id=AABB1234" }

znatty22 commented 6 years ago

👍 I like the idea of moving the relations to _links. Perhaps we could move each related entity's link URL to _links and optionally include the fully expanded entity in the properties section of the body. We probably wouldn't do this for all related entities; maybe for all, or a subset, of a participant's related entities. An example would be something like:

{
            "_links": {
                "self": "/participants/EANSE585",
                "demographic": "0A8QGFQX",
                "diagnoses": [
                    "/diagnoses/J6NPXSD8",
                    "/diagnoses/H1Y00WE4"
                ],
                "samples": [
                    "/samples/6AH8HJNK",
                    "/samples/HEY6S7R1"
                ]
            },
            "created_at": "2018-02-07T19:30:48.095286+00:00",
            "external_id": "participant_4",
            "kf_id": "EANSE585",
            "modified_at": "2018-02-07T19:30:48.095302+00:00",
            "demographic": {
                  "race": "asian",
                  "ethnicity": "not hispanic or latino", 
                  "gender": "male" 
             }
}
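
One way the optional expansion could look, as a rough sketch. The function name, the `expand` parameter, and the shape of `related` are assumptions made for illustration, not part of the existing API.

```python
def render_participant(participant, links, related, expand=()):
    """Build a response body: links always go in `_links`; related entities
    named in `expand` are additionally embedded as full objects.

    `related` maps a relation name to its already-serialized entity.
    """
    body = dict(participant)
    body["_links"] = dict(links)
    for name in expand:
        if name in related:
            body[name] = related[name]
    return body


body = render_participant(
    {"kf_id": "EANSE585", "external_id": "participant_4"},
    {"self": "/participants/EANSE585"},
    {"demographic": {"race": "asian", "gender": "male"}},
    expand=("demographic",),
)
```

With `expand` left empty, the body carries only the participant's own properties plus `_links`, so the embedded copy is opt-in per relation.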
dankolbman commented 6 years ago

This needs to be prioritized higher, as entities with many relationships cause timeouts due to long load times. We should favor using filters here to avoid those large loads, e.g.:

GET /sequencing-centers/SC_00000000
{
            "_links": {
                "self": "/sequencing-centers/SC_00000000",
                "genomic_files":  "/genomic-files?sequencing-center=SC_00000000"
            }
}
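
A small sketch of building such a filter link, so the response carries one URL instead of one entry per child. `filtered_link` is a hypothetical helper; only `urllib.parse.urlencode` from the standard library is used.

```python
from urllib.parse import urlencode


def filtered_link(endpoint, filters):
    """Build a collection link with query filters, e.g. '/samples?participant_id=...'."""
    query = urlencode(filters)  # also percent-encodes values where needed
    return f"{endpoint}?{query}" if query else endpoint


link = filtered_link("/genomic-files", {"sequencing-center": "SC_00000000"})
```

The link costs the same to render regardless of how many children exist, since no child rows have to be loaded to produce it.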
znatty22 commented 6 years ago

Should we do this (represent the link to children as a single URL with the parent's kf_id as a query parameter) for all entities? Or should we render the link(s) to children based on the number of children? For example, we could have both:

For a sequencing center that has >= 100 biospecimens:

GET /sequencing-centers/SC_00000000
{
    "_links": {
        "self": "/sequencing-centers/SC_00000000",
        "biospecimens":  "/biospecimens?sequencing_center_id=SC_00000000"
    }
}

For a sequencing center that has < 100 biospecimens:

GET /sequencing-centers/SC_00000001
{
    "_links": {
        "self": "/sequencing-centers/SC_00000001",
        "genomic_files":  [
            "/genomic-files/GF_00000000",
            "/genomic-files/GF_00000001",
            "/genomic-files/GF_00000002",
            ...
            "/genomic-files/GF_00000098"
        ]
    }
}
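
The count-based option could be sketched roughly as below. The function name, its parameters, and the default threshold of 100 (taken from the example above) are illustrative assumptions only.

```python
from urllib.parse import urlencode


def child_links(endpoint, filter_name, parent_id, child_ids, threshold=100):
    """Return a single filter URL for large child sets, else one link per child."""
    if len(child_ids) >= threshold:
        return f"{endpoint}?{urlencode({filter_name: parent_id})}"
    return [f"{endpoint}/{child_id}" for child_id in child_ids]


few = child_links("/genomic-files", "sequencing_center_id",
                  "SC_00000001", ["GF_00000000", "GF_00000001"])
many = child_links("/biospecimens", "sequencing_center_id",
                   "SC_00000000", [f"BS_{i:08d}" for i in range(100)])
```

One thing to weigh with this option: the same key is a string in one case and a list in the other, so clients would need to handle both shapes, and the server still has to know the child count before rendering.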
znatty22 commented 6 years ago

For the time being, we will display all links to related entities (both parent entities and child entities) inside _links.

Per discussion with @grant-guo, this should not affect ETL, since ETL only uses the links to parent entities (foreign keys) and the format of those will not change (they are already included in _links).