kids-first / kf-model-fhir

🔥 FHIR Data Model for Kids First
https://kids-first.github.io/kf-model-fhir/ig/
Apache License 2.0

Develop and test Kids First genomic file conformance resources #187

Closed liberaliscomputing closed 4 years ago

liberaliscomputing commented 4 years ago

Develop and test Kids First genomic file conformance resources. Consider the following steps:

liberaliscomputing commented 4 years ago

@nicholasvk and I did the initial pass of GF modeling together and found the following issues:

  1. Size: To map GF's size, we decided to use DocumentReference.content.attachment.size. The IG Publisher threw an error while validating an example resource: our typical GF size overflows this field because its data type is unsignedInt, which ranges from 0 to 2,147,483,647. To see how other institutions handle this issue, we investigated the following IGs:
  2. File format and data type: To map these fields into DocumentReference, we decided to use DocumentReference.type. There is no official extension for this in the FHIR registry. Phenopackets also uses this attribute, binding a custom ValueSet called HTS Format. That ValueSet covers only a fraction of the file formats in the KF dataservice. We can import and extend it, but it doesn't cover data types. Alternatively, we can take one of the following approaches:

2.1 Consider DocumentReference.type as a mixture of file format and data type

We can create the following CodeSystem which combines file format (as code) and data type (as display):

{
  "concept":[
    {
      "code":"BAI",
      "display":"Aligned Reads Index"
    },
    {
      "code":"BAM",
      "display":"Aligned Reads"
    },
    {
      "code":"CRAI",
      "display":"Aligned Reads Index"
    },
    {
      "code":"CRAM",
      "display":"Aligned Reads"
    },
    {
      "code":"DCM",
      "display":"Radiology Images"
    },
    {
      "code":"FASTQ",
      "display":"Unaligned Reads"
    },
    {
      "code":"gVCF",
      "display":"gVCF"
    },
    {
      "code":"MAF",
      "display":"Annotated Somatic Mutations"
    },
    {
      "code":"PDF",
      "display":"Gene Fusions"
    },
    {
      "code":"PDF",
      "display":"Radiology Reports"
    },
    {
      "code":"RSEM",
      "display":"Expression"
    },
    {
      "code":"SVS",
      "display":"Histology Images"
    },
    {
      "code":"TBI",
      "display":"gVCF Index"
    },
    {
      "code":"TBI",
      "display":"Variant Calls Index"
    },
    {
      "code":"TSV",
      "display":"Somatic Copy Number Variations"
    },
    {
      "code":"TSV",
      "display":"Gene Expression"
    },
    {
      "code":"VCF",
      "display":"Annotated Somatic Mutations"
    },
    {
      "code":"VCF",
      "display":"gVCF"
    }
  ]
}

This way, we can curate both file format and data type together. However, this approach has two problems:

More importantly, we won't be able to pass the IG Publisher validation since each concept entry's code should be unique within a CodeSystem.
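To see which entries would trip the uniqueness rule, here is a minimal sketch that groups a CodeSystem's concepts by code and reports the codes that occur more than once. The helper name `duplicate_codes` is hypothetical, and the input is abbreviated from the draft above; this is not part of the IG Publisher.

```python
def duplicate_codes(code_system: dict) -> dict:
    """Return {code: [display, ...]} for every code listed more than once."""
    displays_by_code = {}
    for concept in code_system.get("concept", []):
        displays_by_code.setdefault(concept["code"], []).append(concept.get("display"))
    # Keep only the codes that map to more than one concept entry.
    return {code: displays for code, displays in displays_by_code.items() if len(displays) > 1}


# Abbreviated from the CodeSystem draft above.
draft = {
    "concept": [
        {"code": "BAM", "display": "Aligned Reads"},
        {"code": "PDF", "display": "Gene Fusions"},
        {"code": "PDF", "display": "Radiology Reports"},
        {"code": "TBI", "display": "gVCF Index"},
        {"code": "TBI", "display": "Variant Calls Index"},
    ]
}

for code, displays in duplicate_codes(draft).items():
    print(f"{code} is used for: {', '.join(displays)}")
```

On the full draft, PDF, TBI, TSV, and VCF would each be reported, since each is paired with two data types.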

2.2 Alternative ways

  1. Make two CodeSystems, one for file format and the other for data type, include both in a new ValueSet, and bind this ValueSet to DocumentReference.type. This way, we don't have to worry about the issue above. One concern, though, is that putting a file format in type may not be an intended use of this attribute. The current draft PR (#191) temporarily takes this approach.
  2. Make two CodeSystems, one for file format and the other for data type. Include the data type CodeSystem in a new ValueSet and bind that ValueSet to DocumentReference.type. Then include the file format CodeSystem in another new ValueSet and bind that ValueSet to content.format.
  3. Create a new extension called, say, file-type, which has two sub-attributes, file-format and data-type. Make two CodeSystems and ValueSets, one for file format and the other for data type, and bind the ValueSets to the respective sub-attributes.

For any of the approaches in 2.2, we need canonical codes and displays for both file format and data type.
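As a rough illustration of the second approach, the sketch below builds a DocumentReference whose type carries the data type and whose content.format carries the file format. The two CodeSystem URLs, the data type codes, the file URLs, and the function name are all hypothetical placeholders; the thread does not define canonical values for any of them.

```python
# Hypothetical canonical URLs; the real ones are not given in this thread.
DATA_TYPE_SYSTEM = "http://example.org/fhir/CodeSystem/data-type"
FILE_FORMAT_SYSTEM = "http://example.org/fhir/CodeSystem/file-format"


def genomic_file_reference(data_type_code, data_type_display, file_format_code, url):
    """Build a minimal DocumentReference dict following approach 2 of 2.2."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        # Data type goes into DocumentReference.type (a CodeableConcept).
        "type": {
            "coding": [
                {
                    "system": DATA_TYPE_SYSTEM,
                    "code": data_type_code,
                    "display": data_type_display,
                }
            ]
        },
        "content": [
            {
                "attachment": {"url": url},
                # File format goes into content.format (a Coding).
                "format": {"system": FILE_FORMAT_SYSTEM, "code": file_format_code},
            }
        ],
    }


# The same file format can now pair with different data types
# without any duplicate code inside either CodeSystem.
fusions = genomic_file_reference(
    "gene-fusions", "Gene Fusions", "PDF", "https://example.org/files/a.pdf"
)
reports = genomic_file_reference(
    "radiology-reports", "Radiology Reports", "PDF", "https://example.org/files/b.pdf"
)
```

Because the two dimensions live in separate elements, each code appears once per CodeSystem, which sidesteps the uniqueness problem described in 2.1.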

Re @allisonheath @baileyckelly

liberaliscomputing commented 4 years ago

  1. Size: Based on @ShahimEssaid's suggestion, we will create an extension called large-size where the data type is decimal. This extension will be bound to Attachment.
  2. File format / data type: We will move forward as illustrated in 2.2.2, creating new CodeSystems and ValueSets, based on the following, but not limited to, resources:

liberaliscomputing commented 4 years ago

Re Data type / file format for kfdrc-genomic-file, during the standup on 09-16-2020, we temporarily decided:

The following are the data types that I cannot easily map to NCIt codes:

liberaliscomputing commented 4 years ago

Re Data type for kfdrc-genomic-file, during the call on 09-21-2020, we tentatively decided to put the unmapped enumerations listed above into DocumentReference.type.text, without a system or code.
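The two decisions recorded in this thread (a decimal large-size extension on Attachment, and text-only type for unmapped data types) could be sketched as follows. Several things here are assumptions for illustration: the extension URL, the NCIt system URI, the placeholder code, the choice of which data type counts as mapped, and the fallback to the native size field when the value fits in unsignedInt.

```python
UNSIGNED_INT_MAX = 2_147_483_647  # upper bound of unsignedInt, as noted earlier in the thread

# Hypothetical extension URL; the thread only names the extension "large-size".
LARGE_SIZE_URL = "http://example.org/fhir/StructureDefinition/large-size"
# Assumed system URI for NCIt codes.
NCIT_SYSTEM = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl"


def attachment_with_size(url, size_bytes):
    """Use attachment.size when it fits, otherwise the large-size extension."""
    attachment = {"url": url}
    if size_bytes <= UNSIGNED_INT_MAX:
        # Assumption: small files keep the native field.
        attachment["size"] = size_bytes
    else:
        attachment["extension"] = [{"url": LARGE_SIZE_URL, "valueDecimal": size_bytes}]
    return attachment


def type_concept(data_type, ncit_codes):
    """Return a coded CodeableConcept if mapped, else text only (no system or code)."""
    code = ncit_codes.get(data_type)
    if code is None:
        return {"text": data_type}
    return {
        "coding": [{"system": NCIT_SYSTEM, "code": code, "display": data_type}],
        "text": data_type,
    }


# Placeholder mapping; the code is NOT a real NCIt code, and which data
# types are actually unmapped is not reproduced from the thread.
ncit_codes = {"Aligned Reads": "PLACEHOLDER-CODE"}

print(attachment_with_size("https://example.org/files/a.cram", 30_000_000_000))
print(type_concept("Aligned Reads", ncit_codes))
print(type_concept("Expression", ncit_codes))
```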