Consistent with the format used for earthquake (QuakeML) and station (StationXML) information. This is important for end users who access the ASDF file directly.
Easy to include units via attributes
Cons
More difficult to serialize/deserialize than JSON.
No native support for arrays. Array values stored as key/value pairs have to be sorted whenever the software holds them internally as arrays.
JSON
Pros
Easy to serialize/deserialize
Support for arrays (useful for spectral acceleration and Fourier amplitude spectra)
Cons
Including units requires extra key/value pairs or embedding the units in key names.
Inconsistent with the format of the earthquake and station metadata.
I strongly prefer XML over JSON for consistency in the ASDF layout, even if it adds some complication to reading and writing. In the long term, it is much easier to change software interfaces than to migrate data files to new formats.
Note: Even if we don't support units in the Python code, we can hardcode the units when writing and validate them when reading.
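The note about hardcoding units can be sketched with Python's standard-library `xml.etree.ElementTree`: the code works with plain floats, writes a fixed unit for each metric as an attribute, and rejects unknown metrics or unexpected units on read. The metric names, the `UNITS` table, and the element layout are hypothetical, chosen only to illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed units. The Python code handles plain floats; the units
# exist only in this table, are written as attributes, and are checked on read.
UNITS = {"PGA": "%g", "PGV": "cm/s"}


def write_metrics(values: dict) -> str:
    """Serialize metrics as key/value elements with a hardcoded units attribute."""
    root = ET.Element("metrics")
    for name, value in values.items():
        elem = ET.SubElement(root, name, units=UNITS[name])
        elem.text = repr(float(value))
    return ET.tostring(root, encoding="unicode")


def read_metrics(xml_text: str) -> dict:
    """Parse metrics, rejecting unknown names and unexpected units."""
    values = {}
    for elem in ET.fromstring(xml_text):
        if elem.tag not in UNITS:
            raise ValueError(f"unknown metric {elem.tag!r}")
        if elem.get("units") != UNITS[elem.tag]:
            raise ValueError(
                f"{elem.tag}: expected units {UNITS[elem.tag]!r}, "
                f"got {elem.get('units')!r}"
            )
        values[elem.tag] = float(elem.text)
    return values


xml_text = write_metrics({"PGA": 0.12, "PGV": 3.4})
print(xml_text)
# <metrics><PGA units="%g">0.12</PGA><PGV units="cm/s">3.4</PGV></metrics>
print(read_metrics(xml_text))  # {'PGA': 0.12, 'PGV': 3.4}
```

Because the units never enter the in-memory values, supporting real unit objects later would only change these two functions, not the files already written.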
Station Metrics
StationMetrics (group) -> NET.STA (group) -> NET.STA_EVENTID_TAG (dataset)
NET: FDSN network code (or equivalent)
STA: Station code
EVENTID: ComCat event id (or equivalent)
TAG: Tag associated with processing to compute metrics
Store the station metrics as key/value pairs with units as attributes.
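A minimal sketch of this station-metrics layout, again using `xml.etree.ElementTree`. The underscore separators in the dataset name, the root element name `station_metrics`, the metric names, and all values (including the event id) are illustrative assumptions, not a fixed schema.

```python
import xml.etree.ElementTree as ET


def station_metrics_path(net: str, sta: str, eventid: str, tag: str) -> str:
    """StationMetrics/NET.STA/NET.STA_EVENTID_TAG, assuming underscore separators."""
    netsta = f"{net}.{sta}"
    return f"StationMetrics/{netsta}/{netsta}_{eventid}_{tag}"


def station_metrics_xml(metrics: dict) -> str:
    """Write each metric as a key/value element with its units as an attribute.

    `metrics` maps a (hypothetical) metric name to a (value, units) pair.
    """
    root = ET.Element("station_metrics")
    for name, (value, units) in metrics.items():
        ET.SubElement(root, name, units=units).text = str(value)
    return ET.tostring(root, encoding="unicode")


path = station_metrics_path("CI", "CLC", "us1000abcd", "default")
xml_text = station_metrics_xml(
    {"epicentral_distance": (12.3, "km"), "hypocentral_distance": (15.1, "km")}
)
print(path)  # StationMetrics/CI.CLC/CI.CLC_us1000abcd_default
print(xml_text)
```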
Waveform Metrics
WaveformMetrics (group) -> NET.STA (group) -> NET.STA.LOC_START__END_WTAG_TAG (dataset)
NET: FDSN network code (or equivalent)
STA: Station code
LOC: Location code
START__END: Time history start/end tags from the Waveforms dataset
WTAG: Tag associated with waveform processing
TAG: Tag associated with computing metrics
We do not include the channel code because many metrics involve multiple channels (for example, both horizontal components). Instead, the components are included in the metrics as attributes.
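As a sketch of how a WaveformMetrics dataset path could be derived, the function below assumes the ASDF waveform-naming convention `NET.STA.LOC.CHA__START__END__TAG` (treating the trailing tag as WTAG), drops the channel code, and joins the parts with single underscores. The separators, the function name, and the example names and tags are assumptions for illustration.

```python
def waveform_metrics_path(waveform_name: str, metrics_tag: str) -> str:
    """Map an ASDF waveform name to a hypothetical WaveformMetrics dataset path.

    Assumes the waveform name has the form NET.STA.LOC.CHA__START__END__TAG,
    where the trailing TAG is the waveform-processing tag (WTAG).
    """
    nslc, start, end, wtag = waveform_name.split("__")
    net, sta, loc, _cha = nslc.split(".")  # the channel code is dropped on purpose
    netsta = f"{net}.{sta}"
    dataset = f"{netsta}.{loc}_{start}__{end}_{wtag}_{metrics_tag}"
    return f"WaveformMetrics/{netsta}/{dataset}"


# Both horizontal channels of one record map to the same metrics dataset.
for cha in ("HNE", "HNN"):
    name = f"NC.J051.01.{cha}__2014-08-24T10:20:44__2014-08-24T10:21:44__processed"
    print(waveform_metrics_path(name, "default"))
```

Dropping the channel is what lets several channels share one dataset. Tags that themselves contain underscores would make such names hard to split back into their parts, so the separator choice deserves some care.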
Store the waveform metrics as key/value pairs, grouped first by intensity measure component (for example, RotD50) and then by intensity measure type (for example, PGA). Include units via attributes.
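A sketch of this grouped layout with `xml.etree.ElementTree`: one element per component (for example RotD50), one child per metric (for example PGA), units as an attribute and, for spectral acceleration, the oscillator period as an additional attribute. Element names, unit strings, and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical units for each metric (not a fixed schema).
UNITS = {"PGA": "%g", "PGV": "cm/s", "SA": "%g"}

# Hypothetical values: {component: {metric: value}}.
# SA is a dict keyed by oscillator period in seconds.
METRICS = {
    "RotD50": {"PGA": 0.52, "PGV": 2.1, "SA": {3.0: 0.07, 0.3: 1.1, 1.0: 0.48}},
    "RotD100": {"PGA": 0.61},
}


def waveform_metrics_xml(metrics: dict) -> str:
    """Group by component, then by metric; units (and SA period) as attributes."""
    root = ET.Element("waveform_metrics")
    for component, by_metric in metrics.items():
        comp_elem = ET.SubElement(root, component)
        for metric, value in by_metric.items():
            if isinstance(value, dict):
                # One element per period, written in ascending period order.
                for period, amp in sorted(value.items()):
                    elem = ET.SubElement(
                        comp_elem, metric, units=UNITS[metric], period=str(period)
                    )
                    elem.text = str(amp)
            else:
                elem = ET.SubElement(comp_elem, metric, units=UNITS[metric])
                elem.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = waveform_metrics_xml(METRICS)
root = ET.fromstring(xml_text)
# Look up a single amplitude by component, metric, and period.
print(root.find("RotD50/SA[@period='1.0']").text)  # 0.48
```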
An alternative "array format" would be:
It is easier to pull out the amplitude for a specific period if the period is stored as an attribute (first case).
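To make the trade-off concrete, here is a sketch comparing a period lookup in the attribute layout (one `SA` element per period, with the period as an attribute) against an array layout. The array layout shown, with parallel space-separated `periods` and amplitude lists, is an assumed illustration, and all values are hypothetical. The last function shows the converse cost listed under the XML cons: turning key/value pairs back into arrays requires sorting by period.

```python
import xml.etree.ElementTree as ET

# Attribute layout: one SA element per period (values are hypothetical).
ATTRIBUTE_XML = """<RotD50>
  <SA units="%g" period="3.0">0.07</SA>
  <SA units="%g" period="0.3">1.1</SA>
  <SA units="%g" period="1.0">0.48</SA>
</RotD50>"""

# Assumed array layout: parallel, space-separated lists of periods and amplitudes.
ARRAY_XML = """<RotD50>
  <SA units="%g" periods="0.3 1.0 3.0">1.1 0.48 0.07</SA>
</RotD50>"""


def sa_from_attributes(xml_text: str, period: str):
    """A single path query with an attribute predicate finds one period."""
    elem = ET.fromstring(xml_text).find(f"SA[@period='{period}']")
    return None if elem is None else float(elem.text)


def sa_from_arrays(xml_text: str, period: float):
    """Both lists must be parsed and aligned before one value can be read."""
    elem = ET.fromstring(xml_text).find("SA")
    periods = [float(p) for p in elem.get("periods").split()]
    amps = [float(a) for a in elem.text.split()]
    return amps[periods.index(period)] if period in periods else None


def attributes_to_arrays(xml_text: str):
    """Rebuild (periods, amplitudes) arrays from key/value elements, sorted by period."""
    pairs = sorted(
        (float(e.get("period")), float(e.text))
        for e in ET.fromstring(xml_text).iter("SA")
    )
    return [p for p, _ in pairs], [a for _, a in pairs]


print(sa_from_attributes(ATTRIBUTE_XML, "1.0"))  # 0.48
print(sa_from_arrays(ARRAY_XML, 1.0))  # 0.48
print(attributes_to_arrays(ATTRIBUTE_XML))  # ([0.3, 1.0, 3.0], [1.1, 0.48, 0.07])
```

One caveat of the attribute lookup: it matches the period as text, so writers and readers must agree on its formatting (`1.0` versus `1`).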