Layout of Auxiliary Data in ASDF file

Format of auxiliary data (XML vs JSON)

XML

Pros

Consistent with format used for earthquake (QuakeML) and station (StationXML) information. This is important for end users accessing the ASDF file directly.
Easy to include units via attributes

Cons

More difficult to serialize/deserialize than JSON.
No native support for arrays. Array data stored as key/value pairs must be sorted when it is loaded into arrays internally.

JSON

Pros

Easy to serialize/deserialize
Support for arrays (useful for spectral acceleration and Fourier amplitude spectra)

Cons

Including units requires extra key/value pairs or embedding in key names.
Inconsistent with format of earthquake and station metadata.
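
As a rough illustration of the JSON trade-offs listed above, the following Python sketch round-trips a spectral acceleration record. The key names (`periods`, `values`, `units`, `percent_damping`) are hypothetical and are not a proposed schema; the amplitudes other than 0.2 at a 2.0 s period are made up for illustration.

```python
import json

# Hypothetical record: arrays are native in JSON, but units need
# their own key/value pair because JSON has no attributes.
sa = {
    "percent_damping": 5.0,
    "units": "g",
    "periods": [0.3, 1.0, 2.0],
    "values": [0.8, 0.4, 0.2],
}

text = json.dumps(sa)        # serialize in one call
restored = json.loads(text)  # deserialize in one call

# Arrays keep their order through the round trip, so the amplitude
# for a period is found by index without any sorting step.
amplitude_at_2s = restored["values"][restored["periods"].index(2.0)]
```

The units travel as ordinary data here, so nothing in the format itself ties them to the values they describe.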

I strongly prefer XML over JSON for consistency in the ASDF layout, even if it adds some complication to reading and writing. In the long term it is much easier to change software interfaces than to migrate data files to new formats.

Note: Even if we don't support units in the Python code, we can hardcode the units when writing and validate when reading.
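
One way the note above could work in practice is sketched below, using only the Python standard library. The `UNITS` table and the function names `write_metrics` and `read_metrics` are assumptions made for this example; the element names are borrowed from the station metrics layout described later in this document.

```python
import xml.etree.ElementTree as ET

# Assumed units table; the code never exposes units to callers,
# it only stamps them on write and checks them on read.
UNITS = {"hypocentral_distance": "km", "epicentral_distance": "km"}


def write_metrics(values):
    """Serialize a dict of floats, attaching the hardcoded units."""
    root = ET.Element("station_metrics")
    for name, value in values.items():
        child = ET.SubElement(root, name, units=UNITS[name])
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_metrics(xml_text):
    """Parse metrics, rejecting units that differ from the hardcoded ones."""
    values = {}
    for child in ET.fromstring(xml_text):
        units = child.get("units")
        if units != UNITS.get(child.tag):
            raise ValueError(f"unexpected units {units!r} for {child.tag}")
        values[child.tag] = float(child.text)
    return values
```

With this approach the files stay self-describing for users who open them directly, while the Python interface can remain a plain dict of numbers.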

Station Metrics

StationMetrics (group) -> NET.STA (group) -> NET.STA_EVENTID_TAG (dataset)

NET: FDSN network code (or equivalent)
STA: Station code
EVENTID: ComCat event id (or equivalent)
TAG: Tag associated with processing to compute metrics
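
The hierarchy above could be assembled into a dataset path along the following lines. This is a sketch under two assumptions: that underscores separate the station, event id, and tag in the dataset name, and that groups are joined with `/` in the usual HDF5 style. The function name and the example codes are placeholders.

```python
def station_metrics_path(net, sta, event_id, tag):
    """Return a slash-joined path to one station metrics dataset."""
    group = f"{net}.{sta}"
    # Assumption: underscores separate station, event id, and tag.
    dataset = f"{group}_{event_id}_{tag}"
    return f"StationMetrics/{group}/{dataset}"


# Placeholder codes, not real identifiers.
path = station_metrics_path("NN", "STA1", "us1000abcd", "default")
```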

Store the station metrics as key/value pairs with units as attributes.

<station_metrics>
  <hypocentral_distance units="km">10.2</hypocentral_distance>
  <epicentral_distance units="km">2.3</epicentral_distance>
</station_metrics>

Waveform Metrics

WaveformMetrics (group) -> NET.STA (group) -> NET.STA.LOC_START__END_WTAG_TAG (dataset)

NET: FDSN network code (or equivalent)
STA: Station code
LOC: Location code
START__END: Time history start/end tags from Waveforms dataset
WTAG: Tag associated with waveform processing
TAG: Tag associated with computing metrics

We do not include the channel code, because many metrics involve multiple channels (horizontal components). Instead the components are included in the metrics as attributes.

Store the waveform metrics as key/value pairs, grouped first by intensity metric type (for example, RotD50) and then by intensity metric (for example, PGA). Include units via attributes.

<waveform_metrics>
  <rot_d50>
    <pga units="m/s**2">0.45</pga>
    <sa percent_damping="5.0" units="g">
      <value period="2.0">0.2</value>
    </sa>
  </rot_d50>
  <maximum_component>
  </maximum_component>
</waveform_metrics>
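
Reading this layout might look like the sketch below, which uses the standard library's `xml.etree.ElementTree`. The XML string follows the layout above, with a second `value` entry (period 0.3) invented for illustration so that the sorting step has something to do.

```python
import xml.etree.ElementTree as ET

# Same layout as above; the period 0.3 entry is made up for illustration.
XML = """\
<waveform_metrics>
  <rot_d50>
    <pga units="m/s**2">0.45</pga>
    <sa percent_damping="5.0" units="g">
      <value period="2.0">0.2</value>
      <value period="0.3">0.8</value>
    </sa>
  </rot_d50>
</waveform_metrics>
"""

root = ET.fromstring(XML)
rot_d50 = root.find("rot_d50")

# Scalar metric: the value is the element text, the units an attribute.
pga = rot_d50.find("pga")
pga_value, pga_units = float(pga.text), pga.get("units")

# XML has no array type, so collect (period, amplitude) pairs and sort
# them by period before splitting into parallel lists.
sa = rot_d50.find("sa")
pairs = sorted((float(v.get("period")), float(v.text)) for v in sa.findall("value"))
periods = [p for p, _ in pairs]
amplitudes = [a for _, a in pairs]
```

Sorting on read means the writer is free to emit the `value` elements in any order.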

An alternative "array format" would be:

<value><period>2.0</period><amplitude>0.2</amplitude></value>

It is easier to pull out the amplitude for a specific period if it is stored as an attribute (first case).
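
The difference can be seen in a small Python comparison using `xml.etree.ElementTree`. The enclosing `<sa>` element and the variable names are assumptions added so that each fragment parses on its own.

```python
import xml.etree.ElementTree as ET

ATTR_FORM = '<sa><value period="2.0">0.2</value></sa>'
ARRAY_FORM = "<sa><value><period>2.0</period><amplitude>0.2</amplitude></value></sa>"

# Attribute form: a single path expression selects the entry, and the
# amplitude is the element text. The match is on the exact string "2.0".
hit = ET.fromstring(ATTR_FORM).find("value[@period='2.0']")
amp_from_attr = float(hit.text)

# Array form: one way is to walk the entries, compare the period child,
# and then read the separate amplitude child.
amp_from_array = None
for value in ET.fromstring(ARRAY_FORM).findall("value"):
    if float(value.findtext("period")) == 2.0:
        amp_from_array = float(value.findtext("amplitude"))
        break
```

Because the attribute lookup compares strings, a period written as `2` or `2.00` would not match `'2.0'`, so the writer would need to format periods consistently.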
