lter / Clim-HydroDB-2.0

Material related to converting the original climHydroDB into CUAHSI ODM

Decrease DataValues table to as little as 1/3 of its original size using relations #25

Open kzollove opened 2 years ago

kzollove commented 2 years ago

The DataValues table is the largest table in a hymetDP-formatted Data Package since it carries the actual data values for the hymet time series.

In addition to the splitting mentioned in #24, another method we could decide to support is adding a SeriesID column to DataValues that relates each row back to SeriesCatalog. With this relation in place, we could remove 5 columns (VariableCode, SiteCode, SourceCode, MethodsCode, QualityControlLevelsCode) from DataValues. This would make DataValues ~2/3 of its original size.
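A minimal sketch of the SeriesID idea, in plain Python rather than any existing hymetDP tooling. The sample rows, the `ValueID`/`LocalDateTime`/`DataValue` columns, and the helper name `add_series_id` are assumptions made up for illustration; only the five code columns come from the proposal above.

```python
# The five columns that would move out of DataValues (names as given in this issue).
SERIES_KEYS = [
    "VariableCode", "SiteCode", "SourceCode",
    "MethodsCode", "QualityControlLevelsCode",
]

# Hypothetical rows; all values are invented for illustration.
data_values = [
    {"ValueID": 1, "LocalDateTime": "2020-01-01 00:00", "DataValue": 0.5,
     "VariableCode": "Q", "SiteCode": "S1", "SourceCode": "SRC1",
     "MethodsCode": "M1", "QualityControlLevelsCode": "QC1"},
    {"ValueID": 2, "LocalDateTime": "2020-01-01 01:00", "DataValue": 0.7,
     "VariableCode": "Q", "SiteCode": "S1", "SourceCode": "SRC1",
     "MethodsCode": "M1", "QualityControlLevelsCode": "QC1"},
    {"ValueID": 3, "LocalDateTime": "2020-01-01 00:00", "DataValue": 4.2,
     "VariableCode": "T", "SiteCode": "S1", "SourceCode": "SRC1",
     "MethodsCode": "M2", "QualityControlLevelsCode": "QC1"},
]


def add_series_id(rows, keys=SERIES_KEYS):
    """Return (series_catalog, slim_rows).

    Each unique combination of `keys` gets one SeriesID; the slim rows
    keep only the SeriesID in place of the key columns.
    """
    series_ids = {}  # combination of key values -> SeriesID
    slim_rows = []
    for row in rows:
        combo = tuple(row[k] for k in keys)
        # Reuse the existing ID for a known combination, else assign the next one.
        series_id = series_ids.setdefault(combo, len(series_ids) + 1)
        slim = {k: v for k, v in row.items() if k not in keys}
        slim["SeriesID"] = series_id
        slim_rows.append(slim)
    series_catalog = [
        dict(zip(keys, combo), SeriesID=sid) for combo, sid in series_ids.items()
    ]
    return series_catalog, slim_rows


series_catalog, slim_values = add_series_id(data_values)
```

Here `slim_values` carries four columns per row instead of eight, and the five code columns are stored once per series in `series_catalog`. The actual size reduction depends on how many other columns DataValues has.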

To make the DataValues table ~1/3 of its original size, we could augment SeriesCatalog to include ValueAccuracy, UTCOffset, OffsetValue, OffsetTypeCode, CensorCode, and QualifierCode, and remove these from the DataValues table as well. This would have repercussions for the value of SeriesCatalog, which currently serves as summary information for the data package (for example, it has a ValueCount column that would be affected if new columns are added to the definition of a "Series").
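To illustrate the ValueCount concern, here is a small hedged sketch. It uses a reduced, hypothetical set of key columns (`VariableCode`, `SiteCode`, plus `CensorCode`) and invented rows, and the helper `value_counts` is not part of hymetDP.

```python
from collections import Counter


def value_counts(rows, keys):
    """Count rows per unique combination of `keys` (one count per "Series")."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


# Hypothetical rows: one variable at one site, with one censored value.
rows = [
    {"VariableCode": "Q", "SiteCode": "S1", "CensorCode": "nc", "DataValue": 0.5},
    {"VariableCode": "Q", "SiteCode": "S1", "CensorCode": "nc", "DataValue": 0.7},
    {"VariableCode": "Q", "SiteCode": "S1", "CensorCode": "lt", "DataValue": 0.1},
]

# Current-style definition: a single series with ValueCount 3.
current = value_counts(rows, ["VariableCode", "SiteCode"])

# Augmented definition: the same data becomes two series (counts 2 and 1).
augmented = value_counts(rows, ["VariableCode", "SiteCode", "CensorCode"])
```

With the augmented definition, one time series is reported as two catalog entries, so ValueCount no longer summarizes the series as a whole.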

Alternatively, we could create a new table, "DataValuesAncillary", where we store the unique combinations of those 11 columns, and relate the DataValues table to this ancillary table. This new table would allow us to maximally decrease the size of DataValues without muddying SeriesCatalog.
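A rough sketch of the DataValuesAncillary option, again in plain Python with invented data. The `AncillaryID` column name, the helpers `split_ancillary` and `rejoin`, and the `ValueID`/`LocalDateTime`/`DataValue` columns are assumptions for illustration; the 11 key columns are the ones listed in this issue.

```python
ANCILLARY_KEYS = [
    "VariableCode", "SiteCode", "SourceCode", "MethodsCode",
    "QualityControlLevelsCode", "ValueAccuracy", "UTCOffset",
    "OffsetValue", "OffsetTypeCode", "CensorCode", "QualifierCode",
]

# Hypothetical ancillary values shared by most rows.
base = {
    "VariableCode": "Q", "SiteCode": "S1", "SourceCode": "SRC1",
    "MethodsCode": "M1", "QualityControlLevelsCode": "QC1",
    "ValueAccuracy": None, "UTCOffset": -7, "OffsetValue": None,
    "OffsetTypeCode": None, "CensorCode": "nc", "QualifierCode": None,
}
wide_rows = [
    {"ValueID": 1, "LocalDateTime": "2020-01-01 00:00", "DataValue": 0.5, **base},
    {"ValueID": 2, "LocalDateTime": "2020-01-01 01:00", "DataValue": 0.7, **base},
    # Same series, but this value carries a qualifier.
    {"ValueID": 3, "LocalDateTime": "2020-01-01 02:00", "DataValue": 0.6,
     **{**base, "QualifierCode": "e"}},
]


def split_ancillary(rows, keys=ANCILLARY_KEYS):
    """Return (ancillary_table, slim_rows) keyed by a generated AncillaryID."""
    ids = {}  # combination of key values -> AncillaryID
    slim_rows = []
    for row in rows:
        combo = tuple(row[k] for k in keys)
        anc_id = ids.setdefault(combo, len(ids) + 1)
        slim = {k: v for k, v in row.items() if k not in keys}
        slim["AncillaryID"] = anc_id
        slim_rows.append(slim)
    table = {anc_id: dict(zip(keys, combo)) for combo, anc_id in ids.items()}
    return table, slim_rows


def rejoin(table, slim_rows):
    """Rebuild the wide rows by looking up each AncillaryID."""
    wide = []
    for row in slim_rows:
        merged = {k: v for k, v in row.items() if k != "AncillaryID"}
        merged.update(table[row["AncillaryID"]])
        wide.append(merged)
    return wide


ancillary, slim_values = split_ancillary(wide_rows)
restored = rejoin(ancillary, slim_values)
```

Because `rejoin` recovers the original rows, the split loses no information, and SeriesCatalog is untouched. The cost is that consumers need one lookup (a join) to see the ancillary columns next to each value.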