phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Other related files #95

Closed julesjacobsen closed 5 years ago

julesjacobsen commented 5 years ago

Might be worthwhile adding a non-HTS file type to support other related information such as expression data, GWAS arrays or other panels for epigenetic markers.

julesjacobsen commented 5 years ago

@allisonheath and @RobertJCarroll would this answer your two issues #143 and #144 ?

This would be a File type and use the URI:

// A file of unspecified type.
message File {
    // Full system path to the file. e.g. /data/genomes/file1.vcf.gz
    string path = 1;
    // URI for the file e.g. file:///data/genomes/file1.vcf.gz or https://opensnp.org/data/60.23andme-exome-vcf.231?1341012444
    string uri = 2;
    // description of the file contents
    string description = 3;
}
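
As a side note on the `uri` field: a `file` URI for an absolute local path normally has an empty authority, so it is written with three slashes. The following is a minimal sketch (the helper name `describe_file_uri` is made up for illustration) that uses Python's standard `urllib.parse` to show how the two-slash and three-slash forms are parsed differently:

```python
from urllib.parse import urlsplit


def describe_file_uri(uri: str) -> tuple[str, str, str]:
    """Return (scheme, authority, path) for a URI string."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


# Two slashes: "data" is parsed as the authority (host), not as part of the path.
two_slashes = describe_file_uri("file://data/genomes/file1.vcf.gz")
# Three slashes: empty authority, and the full system path is preserved.
three_slashes = describe_file_uri("file:///data/genomes/file1.vcf.gz")

print(two_slashes)
print(three_slashes)
```

With the three-slash form, the path component matches the value that would go in the `path` field.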

I presume the URI could point to a FHIR resource?

allisonheath commented 5 years ago

I think this is one aspect, and general Files are important (although, as a small detail, I'd say the example VCF filenames would be considered HtsFile). But for the referenced issues, I was thinking of something you could put inline as evidence and/or as a fuller record of the entity.

For example: Expanding the example with uri as evidence (not sure what the id should be in this case?)

{
  "type": {
    "id": "HP:0001558",
    "label": "Decreased fetal movement"
  },
  "classOfOnset": {
    "id": "HP:0011461",
    "label": "Fetal onset"
  },
  "evidence": [{
    "evidenceCode": {
      "id": "ECO:0000033",
      "label": "author statement supported by traceable reference"
    },
    "reference": {
      "id": "???",
      "uri": "http://hapi.fhir.org/baseDstu3/DiagnosticReport/1942712/_history/1",
      "description": "Example FHIR resource where the phenotype may have been derived from."
    }
  }]
}

Or perhaps expanding the patient with something like this (it would likely be better to use a more generic name that could be included for different entities, but just to demonstrate):

"subject": {
    "id": "patient1",
    "datasetId": "urology cohort",
    "dateOfBirth": "1964-03-15T00:00:00Z",
    "sex": "MALE",
    "karyotypicSex": "UNKNOWN_KARYOTYPE",
    "patient_profile": "http://hapi.fhir.org/baseDstu3/Patient/1722945/$everything"
}

RobertJCarroll commented 5 years ago

I agree with Allison here. Being able to explicitly link to parts of the referenced item is the most valuable advantage. The natural parallel is annotating genomics data: I'm adding knowledge/interpretation to something that is documented thoroughly "out there".

julesjacobsen commented 5 years ago

@allisonheath we have an ExternalReference. Extending this to include a uri field and mapping that to reference would actually make some sense.

// FHIR mapping: Reference (https://www.hl7.org/fhir/references.html)
message ExternalReference {
    // e.g. ISBN, PMID:123456, DOI:...,
    // FHIR mapping: Reference.identifier
    string id = 1;
    // FHIR mapping: Reference.reference
    string description = 2;
}

This could be added to several message types, but I'm not sure if we'd want it everywhere? Also the idea of the CURIE is precisely this and the CURIE can be used in the id field.

For example in your example:

"subject": {
    "id": "patient1",
    "datasetId": "urology cohort",
    "dateOfBirth": "1964-03-15T00:00:00Z",
    "sex": "MALE",
    "karyotypicSex": "UNKNOWN_KARYOTYPE",
    "patient_profile": "http://hapi.fhir.org/baseDstu3/Patient/1722945/$everything"
}

here the "id": "patient1" is a poor choice of identifier ;) This would be better:

"subject": {
    "id": "HAPI:Patient/1722945",
    "datasetId": "urology cohort",
    "dateOfBirth": "1964-03-15T00:00:00Z",
    "sex": "MALE",
    "karyotypicSex": "UNKNOWN_KARYOTYPE"
},
"metaData": {
    "resources": [{
      "id": "HAPI FHIR server",
      "url": "http://hapi.fhir.org",
      "namespacePrefix": "HAPI",
      "iriPrefix": "http://hapi.fhir.org/baseDstu3/"
    }]
}

So from this CURIE we can generate the URI by splitting the CURIE on the ':', substituting the namespacePrefix with the iriPrefix as defined in the metaData, and prepending that to the reference:

 HAPI:Patient/1722945 -> http://hapi.fhir.org/baseDstu3/Patient/1722945

This is the same mechanism as used for the ontology identifiers, so in theory anything which has an id field can use this mechanism.
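
The expansion described above can be sketched roughly as follows. This is only an illustration, not code from the phenopacket-schema project: the function name `expand_curie` is hypothetical, and the `resources` list simply mirrors the `metaData.resources` entries from the JSON example.

```python
def expand_curie(curie: str, resources: list[dict]) -> str:
    """Expand a CURIE such as 'HAPI:Patient/1722945' into a full IRI.

    `resources` is assumed to look like the metaData.resources list above:
    each entry carries a 'namespacePrefix' and an 'iriPrefix'.
    """
    # Split on the first ':' only; the reference part may itself contain ':'.
    prefix, separator, reference = curie.partition(":")
    if not separator or not prefix or not reference:
        raise ValueError(f"Not a CURIE: {curie!r}")
    for resource in resources:
        if resource["namespacePrefix"] == prefix:
            # Prepend the IRI prefix registered for this namespace.
            return resource["iriPrefix"] + reference
    raise KeyError(f"No resource declared for prefix {prefix!r}")


resources = [{
    "id": "HAPI FHIR server",
    "url": "http://hapi.fhir.org",
    "namespacePrefix": "HAPI",
    "iriPrefix": "http://hapi.fhir.org/baseDstu3/",
}]

print(expand_curie("HAPI:Patient/1722945", resources))
# prints http://hapi.fhir.org/baseDstu3/Patient/1722945
```

A prefix that is not declared in the metaData raises an error here rather than being passed through, which is one possible design choice; a real implementation might prefer to return the identifier unchanged.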

Does this make sense/ work for you?

julesjacobsen commented 5 years ago

Moving this to the next version discussion as there has been no resolution of this yet.

pnrobinson commented 5 years ago

@allisonheath I think that we would prefer to avoid putting references in for CURIE-type identifiers that are used in fields that are intentionally flexible (such as the subject/id). The reason for this is that while having references to ontologies for the terms used in the phenopacket will probably help interoperability, the flexible fields cannot be relied upon to hold computational information, and by putting references into the metadata, we would be inviting some users to put too much semantics into this field. It seems better to consider adding an additional typed field in the future if there is a general need to do so for certain elements. Since phenopackets are primarily intended for external use, it also seems unlikely that we want to share things like this "http://hapi.fhir.org/baseDstu3/DiagnosticReport/1942712/_history/1" (I mean if that was a real FHIR record and not the HAPI FHIR server). I am guessing that internally hospitals will use FHIR (hopefully) or some other internal system, but that this kind of thing will never make it beyond the firewall.