Closed liberaliscomputing closed 4 years ago
@nicholasvk and I did the initial pass of GF modeling together and found the following issues:

1. For `size`, we decided to use `DocumentReference.content.attachment.size`. The IG Publisher threw an error while validating an example resource: our typical GF size overflows this field because its data type is `unsignedInt`, which ranges from 0 to 2,147,483,647. To see how other institutions handle this issue, we investigated the following IGs:
   - `content.attachment`. I wonder how they handle really big HTS files.
   - `size`, whose data type is `integer`. The only difference between `integer` and `unsignedInt` is that the former allows negative numbers (therefore, from −2,147,483,648 to 2,147,483,647). Thus, this still cannot handle our genomic files.
2. `DocumentReference.type`. There is no official extension for this in the FHIR registry. Phenopackets also uses this attribute, binding a custom ValueSet called HTS Format. It covers only a fraction of the file formats in the KF dataservice. We can import and extend it, but it doesn't cover data types. Alternatively, we can take either of the following approaches:

2.1 Consider `DocumentReference.type` as a mixture of file format and data type
We can create the following CodeSystem which combines file format (as code) and data type (as display):
```json
{
  "concept": [
    {"code": "BAI", "display": "Aligned Reads Index"},
    {"code": "BAM", "display": "Aligned Reads"},
    {"code": "CRAI", "display": "Aligned Reads Index"},
    {"code": "CRAM", "display": "Aligned Reads"},
    {"code": "DCM", "display": "Radiology Images"},
    {"code": "FASTQ", "display": "Unaligned Reads"},
    {"code": "gVCF", "display": "gVCF"},
    {"code": "MAF", "display": "Annotated Somatic Mutations"},
    {"code": "PDF", "display": "Gene Fusions"},
    {"code": "PDF", "display": "Radiology Reports"},
    {"code": "RSEM", "display": "Expression"},
    {"code": "SVS", "display": "Histology Images"},
    {"code": "TBI", "display": "gVCF Index"},
    {"code": "TBI", "display": "Variant Calls Index"},
    {"code": "TSV", "display": "Somatic Copy Number Variations"},
    {"code": "TSV", "display": "Gene Expression"},
    {"code": "VCF", "display": "Annotated Somatic Mutations"},
    {"code": "VCF", "display": "gVCF"}
  ]
}
```
This way, we can curate both file format and data type together. However, this approach has two problems:

- Pairing `BAM` with `Aligned Reads` is obvious. `PDF`, however, has essentially nothing to do with `Gene Fusions`, for example.
- The same code appears more than once: `TBI`, `TSV`, etc.

More importantly, we won't be able to pass the IG Publisher validation, since each concept's code must be unique within a CodeSystem.
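The uniqueness problem can be caught before running the IG Publisher. The following is a minimal sketch, not part of any FHIR tooling: `find_duplicate_codes` is a hypothetical helper, and the input is a subset of the CodeSystem draft above.

```python
from collections import Counter


def find_duplicate_codes(code_system):
    """Return the sorted codes that occur more than once in a CodeSystem dict."""
    counts = Counter(concept["code"] for concept in code_system.get("concept", []))
    return sorted(code for code, n in counts.items() if n > 1)


# A subset of the draft CodeSystem from this issue.
code_system = {
    "concept": [
        {"code": "BAM", "display": "Aligned Reads"},
        {"code": "PDF", "display": "Gene Fusions"},
        {"code": "PDF", "display": "Radiology Reports"},
        {"code": "TBI", "display": "gVCF Index"},
        {"code": "TBI", "display": "Variant Calls Index"},
    ]
}

duplicates = find_duplicate_codes(code_system)
print(duplicates)  # ['PDF', 'TBI']
```

Any non-empty result means the draft would need to be split or re-keyed before it can be published as a single CodeSystem.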
2.2 Alternative ways
- `DocumentReference.type`. This way, we don't have to worry about the issue above. One issue, though, is that using a file format as a type may not be the intended use of this attribute. The current draft PR (#191) temporarily takes this approach.
- `DocumentReference.type`. Then, use `content.format`: bind the file format CodeSystem to another new ValueSet and, in turn, bind this ValueSet to `content.format`.
- `file-type`, which has two sub-attributes, `file-format` and `data-type`. Make two CodeSystems and ValueSets, one for file format and the other for data type, and bind the ValueSets to the respective sub-attributes.

For any of the approaches in 2.2, we need canonical codes and displays for both file format and data type.
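As a rough illustration of keeping the two concepts apart, the sketch below assumes the data type goes in `DocumentReference.type` and the file format in `content.format`. The CodeSystem URLs, the data type codes, the file URLs, and the helper `build_document_reference` are all hypothetical placeholders, not values from the Kids First IG.

```python
# Hypothetical canonical URLs; real ones would be defined by the IG.
DATA_TYPE_SYSTEM = "http://example.org/fhir/CodeSystem/data-type"
FILE_FORMAT_SYSTEM = "http://example.org/fhir/CodeSystem/file-format"


def build_document_reference(data_type_code, data_type_display, file_format_code, url):
    """Build a minimal DocumentReference dict with data type and file format separated."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        # Data type as a CodeableConcept in DocumentReference.type (assumption).
        "type": {
            "coding": [
                {
                    "system": DATA_TYPE_SYSTEM,
                    "code": data_type_code,
                    "display": data_type_display,
                }
            ]
        },
        # File format as a Coding in content.format.
        "content": [
            {
                "attachment": {"url": url},
                "format": {"system": FILE_FORMAT_SYSTEM, "code": file_format_code},
            }
        ],
    }


# The same file format can now carry different data types without duplicate codes.
fusions = build_document_reference(
    "gene-fusions", "Gene Fusions", "PDF", "https://example.org/files/fusions.pdf"
)
report = build_document_reference(
    "radiology-reports", "Radiology Reports", "PDF", "https://example.org/files/report.pdf"
)
```

With this split, `PDF` appears once in the file format CodeSystem, and `Gene Fusions` and `Radiology Reports` each appear once in the data type CodeSystem.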
Re @allisonheath @baileyckelly

- `large-size`, where the data type is `decimal`. This extension will be bound to `Attachment`.

Re Data type / file format for `kfdrc-genomic-file`, during the standup on 09-16-2020, we temporarily decided:

- `DocumentReference.content.format.display`

The following are the data types that I cannot easily map to NCIt codes:

- "Aligned Reads Index": `DocumentReference.type.coding`?: "Aligned Sequence Read", "Index"
- "Expression": "Gene Expression"? "Expression"
- "gVCF": "Variant Call File Format"?
- "gVCF Index": `DocumentReference.type.coding`?: "Index"
- "Histology Images": `DocumentReference.type.coding`?: "Histology", "Image"
- "Simple Nucleotide Variations": "Single Nucleotide Variant"?
- "Radiology Images": `DocumentReference.type.coding`?: "Radiology", "Image"
- "Radiology Reports": `DocumentReference.type.coding`?: "Radiology", "Report"
- "Variant Calls": "Single Nucleotide Polymorphism"?
- "Variant Calls Index": `DocumentReference.type.coding`?: "Single Nucleotide Polymorphism", "Index"
- "Isoform Expression": `DocumentReference.type.coding`?: "Isoform", "Expression"
- "Somatic Copy Number Variations": `DocumentReference.type.coding`?:
- "Somatic Structural Variations"
Re Data type for kfdrc-genomic-file, during the call on 09-21-2020, we temporarily decided to put the unmapped enumerations above in `DocumentReference.type.text` without `system` and `code`.
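For reference, a text-only `type` could look like the sketch below. This is only an illustration of the decision above: the helper `unmapped_data_type` and the attachment URL are hypothetical.

```python
def unmapped_data_type(text):
    """Build a CodeableConcept carrying only free text, with no coding entries."""
    return {"text": text}


doc = {
    "resourceType": "DocumentReference",
    "status": "current",
    # No system or code: the data type is recorded as plain text only.
    "type": unmapped_data_type("Somatic Structural Variations"),
    "content": [
        {"attachment": {"url": "https://example.org/files/sample.vcf.gz"}}
    ],
}
```

Because there is no `coding` element, nothing in `type` needs to validate against a CodeSystem, which is what makes this workable as a temporary measure.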
Develop and test Kids First genomic file conformance resources. Consider the following steps: