ome / design

OME Design proposals
http://ome.github.io/design/

Initial proposal for storing features in the IDR #40

Closed manics closed 7 years ago

manics commented 8 years ago

cc @joshmoore @simleo

simleo commented 8 years ago

All looks good from my standpoint. Although this is a general framework proposal, I think it's useful to compare it to our current use case. The structure of the current feature record is defined in https://github.com/simleo/pydoop-features/blob/master/src/main/avro/featureset.avdl.

Data: The feature set consists of 2919 values distributed across 214 vectors (the array<double> entries). Note that WND-CHARM outputs a single vector with 2919 components, which is broken down into sub-vectors by pydoop-features. The only scalar features are the gini_coefficient_* ones (although they are represented as 1-length vectors for consistency/simplicity). The data type is double for all features, although this is probably the result of a generalization step within WND-CHARM (i.e., looking at the output records, it seems that many sub-vectors can be safely cast to integer arrays).
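
To illustrate the breakdown described above, here is a minimal sketch of splitting one flat output vector into named sub-vectors and checking which of them could be stored as integers. This is not the pydoop-features implementation: the `split_features` and `can_cast_to_int` helpers, the layout format, and the names in the toy layout are all assumptions made for illustration.

```python
def split_features(flat, layout):
    """Split a flat feature vector into named sub-vectors.

    `layout` is a hypothetical list of (name, length) pairs; the real
    sub-vector names and lengths are defined in featureset.avdl.
    """
    total = sum(length for _, length in layout)
    if total != len(flat):
        raise ValueError(
            "layout covers %d values, vector has %d" % (total, len(flat))
        )
    out = {}
    offset = 0
    for name, length in layout:
        out[name] = list(flat[offset:offset + length])
        offset += length
    return out


def can_cast_to_int(vec):
    """True if every value is finite and has no fractional part."""
    return all(float(v).is_integer() for v in vec)


# Toy data only: names and sizes here are made up, not the real layout.
toy_layout = [("hypothetical_counts", 3), ("hypothetical_scalar", 1)]
toy_vector = [1.0, 2.0, 3.0, 0.25]

sub = split_features(toy_vector, toy_layout)
castable = {name: can_cast_to_int(vec) for name, vec in sub.items()}
```

With the real layout, the lengths would sum to 2919 over 214 entries, and scalar features would simply be sub-vectors of length 1.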

Metadata: All remaining fields represent metadata:

manics commented 8 years ago

> x, y, w, h: rectangular image tile this record refers to

Is this best stored as a ROI?

joshmoore commented 8 years ago

> Is this best stored as a ROI?

I think it depends on the definition of "best". I don't see any way to avoid duplicating some of the fields between PostgreSQL and HDF5. If we think anyone will want to reference the tile from just the table, I'd say let's do both.
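
A rough sketch of what "do both" could look like for a single table row: the tile coordinates are kept inline and the row also carries an optional reference to a ROI. The `TileRow` class, its field names, and the `tile_matches_roi` check are hypothetical and are not part of the proposal or of any OMERO API; they only show how the duplicated fields could be checked for consistency.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TileRow:
    """Hypothetical table row: tile stored inline plus an optional ROI link."""
    image_id: int
    x: int
    y: int
    w: int
    h: int
    roi_id: Optional[int] = None  # assumed reference to a ROI stored elsewhere

    def tile(self) -> Tuple[int, int, int, int]:
        return (self.x, self.y, self.w, self.h)


def tile_matches_roi(row: TileRow, roi_rect: Tuple[int, int, int, int]) -> bool:
    """Check that the duplicated coordinates agree with the ROI rectangle."""
    return row.tile() == tuple(roi_rect)


row = TileRow(image_id=1, x=0, y=0, w=256, h=256, roi_id=42)
consistent = tile_matches_roi(row, (0, 0, 256, 256))
```

The cost of the duplication is that something has to keep the two copies in sync; the benefit is that the tile can be read from the table alone.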

manics commented 8 years ago

Added suggestion that first ID column should be the "most relevant" 074d512

manics commented 8 years ago

Namespace/version added

joshmoore commented 8 years ago

Vetoes to merging this as a proposal?

joshmoore commented 7 years ago

Merging (not tagging for IDR but this is roughly the state of 0.2.1/0.3.0-rc1)