Closed slinnarsson closed 8 years ago
Proposed design:
In the loom file, store a schema JSON object as the HDF5 attribute `schema` on group `/`. The value of `schema` is a string in JSON format, giving the types of each attribute and of the main matrix. Example:
    schema = {
        "matrix": "float32",
        "row_attrs": {
            "GeneName": "string",
            "Chromosome": "string",
            "Position": "int",
            "GC_Percent": "float64"
        },
        "col_attrs": {
            "CellID": "string",
            "Tissue": "string",
            "Total_Molecules": "int",
            "Class": "string"
        }
    }
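Since the attribute value is a string rather than a nested structure, the schema has to be serialized before it is stored and parsed after it is read. A minimal sketch of that round trip, using only the standard `json` module; the HDF5 write itself is only hinted at in a comment and is an assumption about how an h5py-style handle would be used, not code from loompy:

```python
import json

# Schema from the example above, as a plain Python dict.
schema = {
    "matrix": "float32",
    "row_attrs": {
        "GeneName": "string",
        "Chromosome": "string",
        "Position": "int",
        "GC_Percent": "float64",
    },
    "col_attrs": {
        "CellID": "string",
        "Tissue": "string",
        "Total_Molecules": "int",
        "Class": "string",
    },
}

# The attribute holds a string, so serialize the dict first.
schema_json = json.dumps(schema)

# Assumption: with an h5py-style file handle `f`, the string would be
# written as f.attrs["schema"] = schema_json and read back the same way.
restored = json.loads(schema_json)
```

Parsing the stored string gives back a dict equal to the original, so readers only need a JSON parser to recover the types.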
The `schema` property returns the schema as a Python object with properties `matrix` (str), `row_attrs` (dict) and `col_attrs` (dict).

When serialized to JSON, the schema is included separately (camelCased for JavaScript):
    fileinfo = {
        "project": "Published datasets",
        "dataset": "filename.loom",
        "filename": "filename.loom",
        "shape": [1200,4500],
        "zoomRange": [0,8,16],
        "fullZoomHeight": 54745,
        "fullZoomWidth": 43657,
        "rowAttrs": ...,
        "colAttrs": ...,
        "schema": {
            "matrix": "float32",
            "rowAttrs": {
                "GeneName": "string",
                "Chromosome": "string",
                "Position": "int",
                "GC_Percent": "float64"
            },
            "colAttrs": {
                "CellID": "string",
                "Tissue": "string",
                "Total_Molecules": "int",
                "Class": "string"
            }
        }
    }
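One way to picture the two representations side by side is a small container that parses the stored string and emits the camelCased form. This is a sketch only: the `Schema` class and its `from_json` and `to_camel_dict` methods are hypothetical names chosen for illustration, not the loompy API. Note that in the example above only the top-level keys change case; attribute names such as `GC_Percent` are left alone.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class Schema:
    """Hypothetical container mirroring the properties described above."""

    matrix: str
    row_attrs: Dict[str, str]
    col_attrs: Dict[str, str]

    @classmethod
    def from_json(cls, text: str) -> "Schema":
        # Parse the JSON string as it would be stored in the file.
        raw = json.loads(text)
        return cls(raw["matrix"], raw["row_attrs"], raw["col_attrs"])

    def to_camel_dict(self) -> dict:
        # Only the top-level keys are camelCased; attribute names are
        # data and stay untouched.
        return {
            "matrix": self.matrix,
            "rowAttrs": dict(self.row_attrs),
            "colAttrs": dict(self.col_attrs),
        }


stored = (
    '{"matrix": "float32", '
    '"row_attrs": {"GC_Percent": "float64"}, '
    '"col_attrs": {"CellID": "string"}}'
)
schema = Schema.from_json(stored)
fileinfo = {"filename": "filename.loom", "schema": schema.to_camel_dict()}
```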
`matrix` is `float32`. The only valid types for attributes are `float64`, `int` and `string`.

Nice! The loom spec should probably include this documentation on the schema, no?
Actually, I have a related request regarding that, so I'll open a separate ticket.
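For reference, the type rules in the proposal are simple enough to check mechanically. The sketch below reads "matrix is float32" as meaning that `float32` is the only accepted matrix type, which is an interpretation rather than something the proposal states outright; `validate_schema` is a hypothetical helper, not part of loompy.

```python
# Assumption: float32 is the only accepted matrix type.
VALID_MATRIX_TYPES = {"float32"}
# Attribute types listed in the proposal.
VALID_ATTR_TYPES = {"float64", "int", "string"}


def validate_schema(schema: dict) -> list:
    """Return a list of problems; an empty list means the schema passes."""
    problems = []
    if schema.get("matrix") not in VALID_MATRIX_TYPES:
        problems.append(f"matrix: invalid type {schema.get('matrix')!r}")
    for section in ("row_attrs", "col_attrs"):
        for name, type_name in schema.get(section, {}).items():
            if type_name not in VALID_ATTR_TYPES:
                problems.append(f"{section}.{name}: invalid type {type_name!r}")
    return problems


good = validate_schema(
    {
        "matrix": "float32",
        "row_attrs": {"Position": "int"},
        "col_attrs": {"CellID": "string"},
    }
)
bad = validate_schema(
    {"matrix": "int", "row_attrs": {"X": "bool"}, "col_attrs": {}}
)
```

Returning a list of problems instead of raising on the first one makes it easier to report every invalid attribute in a file at once.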
I updated the spec and also the `loompy` (Python package) documentation.
Add type information (i.e. a schema) to the loom file.
Benefits: