iris-edu / stationxml-validator

GNU General Public License v3.0

disallow Decimation in non-digital stage #112

Open crotwell opened 4 years ago

crotwell commented 4 years ago

Related to rules 404 and 423. A stage with Decimation probably should be a digital stage, meaning it must:

Here is an example channel, US.AGMN..BHN, that I believe incorrectly has Decimation in stage 1 but is not flagged as invalid by the validator (v1.7.1):


          <Stage number="1">
            <PolesZeros>
              <InputUnits>
                <Name>m/s**2</Name>
                <Description>Acceleration in Meters Per Second Per Second</Description>
              </InputUnits>
              <OutputUnits>
                <Name>V</Name>
                <Description>Volts</Description>
              </OutputUnits>
              <PzTransferFunctionType>LAPLACE (RADIANS/SECOND)</PzTransferFunctionType>
              <NormalizationFactor>24595700000000.0</NormalizationFactor>
              <NormalizationFrequency>1.0</NormalizationFrequency>
              <Pole number="0">
                <Real>-981.0</Real>
                <Imaginary>-1009.0</Imaginary>
              </Pole>
              <Pole number="1">
                <Real>-981.0</Real>
                <Imaginary>1009.0</Imaginary>
              </Pole>
              <Pole number="2">
                <Real>-3290.0</Real>
                <Imaginary>-1263.0</Imaginary>
              </Pole>
              <Pole number="3">
                <Real>-3290.0</Real>
                <Imaginary>1263.0</Imaginary>
              </Pole>
            </PolesZeros>
            <Decimation>
              <InputSampleRate>100.0</InputSampleRate>
              <Factor>1</Factor>
              <Offset>0</Offset>
              <Delay>0.0</Delay>
              <Correction>0.0</Correction>
            </Decimation>
            <StageGain>
              <Value>1.02</Value>
              <Frequency>1.0</Frequency>
            </StageGain>
          </Stage>
          <Stage number="2">
            <Coefficients>
              <InputUnits>
                <Name>V</Name>
                <Description>Volts</Description>
              </InputUnits>
              <OutputUnits>
                <Name>counts</Name>
                <Description>Digital Counts</Description>
              </OutputUnits>
              <CfTransferFunctionType>DIGITAL</CfTransferFunctionType>
              <Numerator number="0">1.0</Numerator>
            </Coefficients>
            <Decimation>
              <InputSampleRate>100.0</InputSampleRate>
              <Factor>1</Factor>
              <Offset>0</Offset>
              <Delay>0.0</Delay>
              <Correction>0.0</Correction>
            </Decimation>
            <StageGain>
              <Value>419430.0</Value>
              <Frequency>0.0</Frequency>
            </StageGain>
          </Stage>
timronan commented 4 years ago

I'm a little bit confused about the exact rule being proposed. At the top of the issue you state:

A stage with Decimation probably should be a digital stage

and list a set of rules for what defines a digital stage, including:

not have PolesZeros with PzTransferFunctionType of LAPLACE (RADIANS/SECOND), as this implies an analog stage

other elements that only make sense in an analog stage

Stage 1 of your example is an analog stage, based on its PzTransferFunctionType of LAPLACE (RADIANS/SECOND):

<PzTransferFunctionType>LAPLACE (RADIANS/SECOND)</PzTransferFunctionType>

Do you want us to include a rule that states:

IF Stage[N]:PolesZeros:PzTransferFunctionType:LAPLACE (RADIANS/SECOND) or Stage[N]:OutputUnits:Name != Count then Stage[N]:Decimation must not be included

Is this true for all cases?

This would be rule 424; it doesn't affect rules 404 or 423.
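A minimal sketch of the rule as phrased above, using Python's standard `xml.etree.ElementTree` on trimmed copies of the two US.AGMN..BHN stages. Assumptions: the fragment has no XML namespace (real StationXML documents declare the FDSN namespace, which the paths would need), "count"/"counts" are the only accepted spellings, and `violates_proposed_rule` is a hypothetical helper, not the validator's own Java code.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two stages from the issue (poles, descriptions, gains omitted).
# Assumption: no namespace, so plain element paths work.
RESPONSE_XML = """
<Response>
  <Stage number="1">
    <PolesZeros>
      <InputUnits><Name>m/s**2</Name></InputUnits>
      <OutputUnits><Name>V</Name></OutputUnits>
      <PzTransferFunctionType>LAPLACE (RADIANS/SECOND)</PzTransferFunctionType>
    </PolesZeros>
    <Decimation><InputSampleRate>100.0</InputSampleRate><Factor>1</Factor></Decimation>
  </Stage>
  <Stage number="2">
    <Coefficients>
      <InputUnits><Name>V</Name></InputUnits>
      <OutputUnits><Name>counts</Name></OutputUnits>
      <CfTransferFunctionType>DIGITAL</CfTransferFunctionType>
    </Coefficients>
    <Decimation><InputSampleRate>100.0</InputSampleRate><Factor>1</Factor></Decimation>
  </Stage>
</Response>
"""

COUNT_NAMES = {"count", "counts"}


def violates_proposed_rule(stage):
    """True if the stage has Decimation but is LAPLACE (RADIANS/SECOND) or does not output counts."""
    if stage.find("Decimation") is None:
        return False  # the rule only constrains stages that include Decimation
    pz_type = stage.findtext("PolesZeros/PzTransferFunctionType")
    # '*' matches whichever filter element the stage has (PolesZeros, Coefficients, ...)
    out_units = (stage.findtext("*/OutputUnits/Name") or "").strip().lower()
    return pz_type == "LAPLACE (RADIANS/SECOND)" or out_units not in COUNT_NAMES


root = ET.fromstring(RESPONSE_XML.strip())
for stage in root.findall("Stage"):
    print(stage.get("number"), violates_proposed_rule(stage))
# prints "1 True" then "2 False"
```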

Thanks for the clarification.

crotwell commented 4 years ago

Yes, I think that rule would do it.

I am pretty sure this rule is correct in all cases, but it might be worth asking others. I can't think of a case where a stage should have a Decimation but is not a digital (OutputUnits = counts) stage.

There may be other indicators of an analog stage that should have similar rules, but I have not explored further.

I was just mentioning that 404 and 423 were related, not suggesting this should change them.

timronan commented 4 years ago

We have come up with three rules to cover this suggestion:

Rule 424) IF Stage[N]:Decimation is included THEN Stage[N]:OutputUnits:Name must equal count(s)

Rule 425) IF Stage[N]:PolesZeros:PzTransferFunctionType:LAPLACE (RADIANS/SECOND) OR Stage[N]:PolesZeros:PzTransferFunctionType:LAPLACE (HERTZ) OR Stage[N]:CoefficientsType:CfTransferFunctionType:ANALOG (RADIANS/SECOND) OR Stage[N]:CoefficientsType:CfTransferFunctionType:ANALOG (HERTZ) is included THEN Stage[N]:Decimation must not be included

Rule 406) Stage[LAST]:OutputUnits:Name must be assigned count(s)

Does it seem like these rules adequately cover this issue?
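A minimal sketch of how rules 424, 425 and 406 could be checked together. It works on a hypothetical flattened `StageSummary` rather than parsed XML, matches the coefficient rule against the `Coefficients` element name seen in stage 2 of the example, and assumes stages are listed in order so the last one is `Stage[LAST]`. `StageSummary` and `check_rules` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional

COUNT_NAMES = {"count", "counts"}

# (filter element, transfer function type) pairs that rule 425 treats as analog.
ANALOG_TYPES = {
    ("PolesZeros", "LAPLACE (RADIANS/SECOND)"),
    ("PolesZeros", "LAPLACE (HERTZ)"),
    ("Coefficients", "ANALOG (RADIANS/SECOND)"),
    ("Coefficients", "ANALOG (HERTZ)"),
}


@dataclass
class StageSummary:
    """Hypothetical flattened view of one <Stage>, for illustration only."""
    number: int
    filter_kind: Optional[str]    # e.g. "PolesZeros" or "Coefficients"
    transfer_type: Optional[str]  # PzTransferFunctionType or CfTransferFunctionType
    output_units: Optional[str]
    has_decimation: bool


def is_counts(units: Optional[str]) -> bool:
    return units is not None and units.strip().lower() in COUNT_NAMES


def check_rules(stages: List[StageSummary]) -> List[str]:
    """Apply sketches of rules 424, 425 and 406; stages are assumed sorted by number."""
    problems = []
    for s in stages:
        if s.has_decimation and not is_counts(s.output_units):
            problems.append(f"424: stage {s.number} has Decimation but outputs {s.output_units!r}")
        if s.has_decimation and (s.filter_kind, s.transfer_type) in ANALOG_TYPES:
            problems.append(f"425: stage {s.number} is analog ({s.transfer_type}) but has Decimation")
    if stages and not is_counts(stages[-1].output_units):
        problems.append(f"406: last stage {stages[-1].number} does not output counts")
    return problems


# The two stages of US.AGMN..BHN posted at the top of the issue.
agmn = [
    StageSummary(1, "PolesZeros", "LAPLACE (RADIANS/SECOND)", "V", True),
    StageSummary(2, "Coefficients", "DIGITAL", "counts", True),
]
for problem in check_rules(agmn):
    print(problem)  # stage 1 is reported under both 424 and 425
```

Rules 424 and 425 overlap on this example; 425 still matters for an analog stage that (incorrectly) claims counts as its output units.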

crotwell commented 4 years ago

Those all sound reasonable to me and I think cover the cases I was worried about.

timronan commented 4 years ago

This issue was completed when pull request #113 was accepted. We have not released an updated jar file yet, but will provide a jar file to test this and the other stationxml-validator issues that you have recently reported.

crotwell commented 4 years ago

Moving conversation from #116

Question: does the fact that this (made up) response has both input and output units of celsius imply that software can ignore the details of the response and "see that no conversion is needed at all."?

I am totally fine with punting on the count validator rule, but relying on input units == output units to imply response == unity seems dangerous to me. Just my $0.02.

BTW, see https://github.com/FDSN/StationXML/issues/22 which might help if you actually want a "no response" response.

          <Stage number="1">
            <PolesZeros>
              <InputUnits>
                <Name>CELSIUS</Name>
                <Description>temperature in degrees Celsius</Description>
              </InputUnits>
              <OutputUnits>
                <Name>CELSIUS</Name>
                <Description>temperature in degrees Celsius</Description>
              </OutputUnits>
              <PzTransferFunctionType>LAPLACE (RADIANS/SECOND)</PzTransferFunctionType>
              <NormalizationFactor>6.33476E8</NormalizationFactor>
              <NormalizationFrequency>5.0</NormalizationFrequency>
              <Zero number="0">
                <Real>0.0</Real>
                <Imaginary>0.0</Imaginary>
              </Zero>
              <Zero number="1">
                <Real>0.0</Real>
                <Imaginary>0.0</Imaginary>
              </Zero>
              <Zero number="2">
                <Real>0.0</Real>
                <Imaginary>0.0</Imaginary>
              </Zero>
              <Pole number="0">
                <Real>-0.6283</Real>
                <Imaginary>0.0</Imaginary>
              </Pole>
              <Pole number="1">
                <Real>-6.283</Real>
                <Imaginary>0.0</Imaginary>
              </Pole>
              <Pole number="2">
                <Real>-6.283</Real>
                <Imaginary>0.0</Imaginary>
              </Pole>
              <Pole number="3">
                <Real>-145.13</Real>
                <Imaginary>-60.125</Imaginary>
              </Pole>
              <Pole number="4">
                <Real>-145.13</Real>
                <Imaginary>60.125</Imaginary>
              </Pole>
              <Pole number="5">
                <Real>-60.125</Real>
                <Imaginary>-145.13</Imaginary>
              </Pole>
              <Pole number="6">
                <Real>-60.125</Real>
                <Imaginary>145.13</Imaginary>
              </Pole>
            </PolesZeros>
            <Decimation>
              <InputSampleRate>5120.0</InputSampleRate>
              <Factor>1</Factor>
              <Offset>0</Offset>
              <Delay>0.0</Delay>
              <Correction>0.0</Correction>
            </Decimation>
            <StageGain>
              <Value>80600.0</Value>
              <Frequency>5.0</Frequency>
            </StageGain>
          </Stage>
chad-earthscope commented 4 years ago

Question: does the fact that this (made up) response has both input and output units of celsius imply that software can ignore the details of the response and "see that no conversion is needed at all."?

I am totally fine with punting on the count validator rule, but relying on input units == output units to imply response == unity seems dangerous to me. Just my $0.02.

When you put it that way. So, more directly: if the stored data (output units) are already in units you can use, no conversion is needed at all. Documenting the output units as counts ruins this when they could be other units. That still leaves me with the question: what is the value of describing data as counts when they can be described in an Earth unit? Perhaps there is a reason that makes obscuring the units worth it that I can't see.

BTW, see FDSN/StationXML#22 which might help if you actually want a "no response" response.

I wish we had something similar now. An even simpler approach for these no-response or scaling-only response cases is to not have any <Stage>s at all. The <InstrumentSensitivity> has all that is needed: units and scaling.
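A minimal sketch of the stage-less idea: a response with only an `<InstrumentSensitivity>`, where software recovers input units by dividing by the overall sensitivity. The sensitivity value and sample values are made up, namespaces are omitted, and `scale_to_input_units` is a hypothetical helper. Dividing by a single number is only valid when the response is flat (scaling only), which is the case under discussion here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stage-less response: only an InstrumentSensitivity, no <Stage> elements.
RESPONSE_XML = """
<Response>
  <InstrumentSensitivity>
    <Value>100.0</Value>
    <Frequency>0.0</Frequency>
    <InputUnits><Name>CELSIUS</Name></InputUnits>
    <OutputUnits><Name>COUNTS</Name></OutputUnits>
  </InstrumentSensitivity>
</Response>
"""


def scale_to_input_units(response, samples):
    """Divide recorded samples by the overall sensitivity to get input (physical) units."""
    sens = response.find("InstrumentSensitivity")
    value = float(sens.findtext("Value"))
    units = sens.findtext("InputUnits/Name")
    return [x / value for x in samples], units


response = ET.fromstring(RESPONSE_XML.strip())
print(len(response.findall("Stage")))  # 0: no stages at all
values, units = scale_to_input_units(response, [2150, 2175])
print(values, units)  # [21.5, 21.75] CELSIUS
```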

timronan commented 4 years ago

The change in units represents data transformations that occur during data collection. The SEED manual describes a response cascade as "Most seismic systems can be regarded as cascades of stages — for example, a seismometer, followed by an amplifier, followed by an analog filter, followed by an analog/ digital converter, followed by a digital filter" (SEED Manual 151). Removing the concept of counts for "scaling-only response cases" hides the fact that the data are stored on a datalogger and that an analog-to-digital conversion occurred. The first example provided by @chad-iris in issue #116 seems to be an accurate representation of how the data are transformed from a physical Earth unit to a digital representation. In this first example, Sensor:Description is labeled "Quanterra 330 Linear Phase Composite". Data are stored on Quanterra 330s as counts, not as CELSIUS. If we wanted to store these example units as CELSIUS, then provenance information would need to be added describing the conversion from the raw data stored on the Quanterra 330 into scaled data in Earth units. This is true even for unity scaling: stages represent portions of the physical seismic system, and eliminating stages for convenience makes the metadata inaccurate.

The second example posted by @chad-iris in issue #116 is metadata for synthetic data, so the data were generated rather than collected by a seismic system. In this synthetic example, no instrument filtering or analog-to-digital conversion occurred, so it is accurate to have input and output units of meters.

metempleton commented 4 years ago

I won't pretend that I'm up on all of the nuances in this thread yet, but there are some flags I want to raise for consideration before anything is decided.

I agree that assuming input units = output units in a filter stage means no data transformation is dangerous. Keep in mind that:

With respect to a "no response" response for things like SOH channels: we still need a response element that describes the sample rate.

Thanks Tim for summarizing the conversation above - that helped me immensely. I'm completely on board with example one and everything you said about it.

I'm uneasy with example two as far as the output units go. The response as is tells me that this data is unsampled analog data (but I have no idea how you'd store that digitally). What I think is more likely the case is that this data is discrete (in counts and needs a sample rate description) and has scaling Y counts = 1 M. But I can't tell what that scaling is and it would matter if I tried to plot the trace or manipulate the trace with other data.

There was a brouhaha with responses for the DTCC SmartSolo nodes this spring that is now resolved. The manufacturer samples and decimates the data, then applies a gain such that 1 count is equivalent to 1 microV in the end. But they place all of this scaling at the ADC stage in their nominal response (so I did, too). PASSCAL was alarmed that DTCC data was output "in microV" and applied their own scaling with PIC software (also used in Europe) to remove DTCC's post-ADC scaling and make the output "in counts". Consequently, the nominal responses didn't describe the amplitudes of DTCC data after this third-party rescaling. This headache started with the misconception that digital data from this instrument wasn't output in counts. So I would say that if someone wants to talk about data being in some kind of earth units for research purposes, there's no problem there. But for response purposes, digital data is in counts even if it scales 1:1 to an earth unit. Anything else creates confusion.

crotwell commented 4 years ago

But for response purposes, digital data is in counts even if it scales 1:1 to an earth unit. Anything else creates confusion.

Exactly how I feel!!!

metempleton commented 4 years ago

Ok, Chad and I were just talking and I'll try to summarize - this will cause some rethinking of my previous comment and past response practice.

After discussion, I'm now convinced that units of counts are meaningful when continuous data pass through an element that samples amplitudes in uniform size intervals (counts). For a 24-bit ADC with an input range of 40 Vpp, each amplitude bin (count) is 40 V / 2^24 counts = 2.384 microvolts wide. So output units of counts make sense.

For the synthetic trace, amplitude resolution is limited only by your computational precision (effectively an infinitely small count size), and the "input range" is unlimited, so it's closer to an infinity / infinity case, which doesn't lend itself well to units of counts. So maybe units that I associate with continuous properties could be appropriate.
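The bin-width arithmetic above as a short sketch, assuming the counts span the full peak-to-peak input range. Incidentally, the reciprocal (about 419430 counts per volt) is consistent with the StageGain of 419430.0 in stage 2 of the US.AGMN..BHN example, although the thread does not say which digitizer that channel uses. `volts_per_count` is a hypothetical helper name.

```python
def volts_per_count(full_scale_vpp: float, bits: int) -> float:
    """Width of one ADC amplitude bin (one count), assuming bins span the full peak-to-peak range."""
    return full_scale_vpp / 2 ** bits


lsb = volts_per_count(40.0, 24)
print(f"{lsb * 1e6:.3f} microvolts per count")  # 2.384 microvolts per count
print(f"{1 / lsb:.1f} counts per volt")          # 419430.4 counts per volt
```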

We talked about opening up consideration of how responses are described in metadata that don't necessarily conform to past SEED practice. Questions like:

Chad is going to generate some StationXML examples of this to see what software can be broken.

No doubt this is the beginning of lengthy ongoing discussion with many people that would shake things up... Thoughts are welcome!

crotwell commented 4 years ago

@metempleton I think some of your comments got mangled, xml elements in markdown disappear unless you put them in back quotes.

This is probably right for synthetic data where the miniSEED is floating point. But if it is integer, to take advantage of compression, then you are likely not actually in real world units but rather in "gained" real world units. In that case I would much rather see a m/s -> count stage so it is obvious how things work. A m/s -> m/s stage with a non-unity gain is confusing: the units look like you don't need the response, but you actually do.

So perhaps this suggests a rule: if input units == output units and both are real world (non-count) units, then the gain must be 1 and there cannot be any frequency-varying response. It would be really handy to have the Unity response type to make this more obvious.

Also, I doubt that many field-recorded, non-synthetic, non-SOH channels are actually in real world units, since for integers the unit coarseness is unlikely to match the recorded coarseness, and floats tend not to compress well. Find an example and I'll eat the electrons in my post. :)
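A minimal sketch of that suggested rule, which is not an existing validator rule: when a stage's input and output units are identical real-world units, flag any non-unity gain or frequency-dependent response. The function name, tolerance and boolean "has poles or zeros" input are hypothetical simplifications.

```python
COUNT_NAMES = {"count", "counts"}


def check_same_units_stage(input_units, output_units, gain, has_poles_or_zeros, tol=1e-9):
    """Hypothetical check: identical, non-count units should imply a unity, flat stage."""
    same = input_units.strip().lower() == output_units.strip().lower()
    if not same or input_units.strip().lower() in COUNT_NAMES:
        return []  # rule only applies to identical, non-count units
    problems = []
    if abs(gain - 1.0) > tol:
        problems.append(f"gain is {gain}, expected 1")
    if has_poles_or_zeros:
        problems.append("frequency-dependent response despite identical units")
    return problems


# The CELSIUS -> CELSIUS PolesZeros stage quoted earlier in the thread (gain 80600.0).
print(check_same_units_stage("CELSIUS", "CELSIUS", 80600.0, True))
```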

chad-earthscope commented 4 years ago

Here is a sensitivity-only response for SY_COL with no <Stage>s and only an <InstrumentSensitivity>. For comparison, here is the same "response" for SY_COLA, with a single stage mirroring the <InstrumentSensitivity>. The upshot from testing so far is that evalresp does nothing with the sensitivity-only documentation, which is not surprising.

Following @metempleton's description, describing the output of an electronic ADC process as "counts" makes sense when the ADC step is described.

I remain on the fence regarding the value of calling samples "counts" when the ADC step is not described, for example in the VKI channel example, where the information about the ADC is not present and the gain has been applied back to the data. Sure, the data are "counts", as it's easy to guess they have been through an ADC even if we know nothing about it. They are also "°C", which is readily usable by consumers and valuable information.

I maintain that there are plenty of data needing description in StationXML that have not gone through a process similar to an ADC, and to which the concept of digitizer "counts" does not apply (e.g. synthetics, displacement-grams derived from GNSS data, digitized paper recordings), yet "counts" ends up being used as a catch-all. I do not see the value in calling such data "counts" in cases where they can be described in other, more useful units.

timronan commented 4 years ago

There may be a problem with the proposed rule 424: IF Stage[N]:Decimation is included THEN Stage[N]:OutputUnits:Name must equal count(s)

The decimation block contains delay and correction. Can there be an estimated timing delay/correction in an analog stage?

If it is acceptable to have a timing delay/correction in analog stages, the rule may need to be updated.

Would it be more accurate for this rule to read: Rule 424: IF Stage[N]:Decimation:Factor != 1 THEN Stage[N]:OutputUnits:Name must equal count(s)
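A minimal sketch contrasting the two wordings of rule 424; the function names and the pass/fail encoding (True means the stage passes) are hypothetical. One consequence worth noting: stage 1 of US.AGMN..BHN has Factor 1, so under the revised wording it would no longer be caught by 424 and would rely on rule 425 (its LAPLACE transfer function type) instead.

```python
COUNT_NAMES = {"count", "counts"}


def is_counts(units):
    return units.strip().lower() in COUNT_NAMES


def rule_424_original(has_decimation, factor, output_units):
    """Original wording: any Decimation implies counts (factor unused, kept for a uniform signature)."""
    return not has_decimation or is_counts(output_units)


def rule_424_revised(has_decimation, factor, output_units):
    """Revised wording: only an actual decimation (Factor != 1) implies counts."""
    return not (has_decimation and factor != 1) or is_counts(output_units)


cases = [
    ("US.AGMN..BHN stage 1 (Factor 1, V)", True, 1, "V"),
    ("hypothetical decimating stage outputting V", True, 4, "V"),
    ("hypothetical decimating digital stage", True, 4, "counts"),
]
for name, has_dec, factor, units in cases:
    print(f"{name}: original={rule_424_original(has_dec, factor, units)}, "
          f"revised={rule_424_revised(has_dec, factor, units)}")
```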

crotwell commented 4 years ago

I tend not to think of timing delays in analog stages, but this is a bit outside my area.

@metempleton do you know if this can happen?

timronan commented 4 years ago

I have been working with the Magnetotelluric group to write a response and came across:

Blockette 57 includes only non-decimated stages (thus decimation factor is unit, and decimation offset is zero for all channels). Typical part of the *.hed file produced by nimsreadz for MT1 NIMS data used to fill out the decimation_57 table is shown below:

Magnetic field - 3 pole Butterworth LOWPASS
    corner PERIOD   GROUP DELAY (sec)
Hx  0.5             0.159
Hy  0.5             0.159
Hz  0.5             0.159

Electric field - 5 pole Butterworth LOWPASS
    corner PERIOD   GROUP DELAY (sec)
Ex  0.5             0.2575
Ey  0.5             0.2575

Electric field - 1 pole Butterworth HIGHPASS
    corner PERIOD   TIME CONSTANT (sec)
Ex  37699           6000
Ey  37699           6000

Here the group delay seems to represent B57:F7 (estimated delay), and the Butterworth filters seem to be applied in analog stages. I am still getting spun up on MT, but this may be a case of timing delays in analog stages. I am going to get some further clarification from the MT group.

I'm definitely open to any thoughts, because I am unsure of the correct answer.
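The group delays in the NIMS table appear consistent with the zero-frequency group delay of ideal analog Butterworth lowpass filters with a 0.5 s corner period, and the highpass time constant with corner period / 2π. A short sketch checking that arithmetic, assuming ideal Butterworth prototypes (the actual NIMS circuitry may differ). For an all-pole filter the group delay at zero frequency is the sum of -Re(p)/|p|² over its poles p.

```python
import math


def butterworth_lowpass_poles(n, corner_period_s):
    """Left-half-plane poles (rad/s) of an ideal n-pole analog Butterworth lowpass."""
    wc = 2 * math.pi / corner_period_s
    return [wc * complex(math.cos(t), math.sin(t))
            for t in (math.pi * (2 * k + n + 1) / (2 * n) for k in range(n))]


def dc_group_delay(poles):
    """Group delay at zero frequency of an all-pole filter: sum of -Re(p)/|p|^2."""
    return sum(-p.real / abs(p) ** 2 for p in poles)


for label, n in (("H (3 pole)", 3), ("E (5 pole)", 5)):
    tau = dc_group_delay(butterworth_lowpass_poles(n, 0.5))
    print(f"{label}: {tau:.4f} s")
# H (3 pole): 0.1592 s
# E (5 pole): 0.2575 s

# 1-pole highpass: time constant = corner period / (2*pi)
print(f"time constant: {37699 / (2 * math.pi):.0f} s")  # 6000 s
```

If this reading is right, the listed values describe a property of the analog filters' phase response rather than a timing correction applied by the recording system, which bears on whether Decimation is the right place for them.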

metempleton commented 4 years ago

Hmmmm, I’m not that knowledgeable about analog filter group delays.
They are different than FIR delays. FIR filters are weighted averages of samples before and after the sample of interest. The datalogger retains a sample and its subsequent samples until it has enough samples to apply the entire filter. Then it corrects the time stamp.
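A minimal sketch of the FIR case, assuming a symmetric (linear-phase) filter, for which the delay is (N - 1) / 2 samples; the 5-tap coefficients and the 100 sps rate are made up. Shifting time stamps earlier by this amount is one way to picture the time-stamp correction described above.

```python
def linear_phase_fir_delay(num_taps, sample_rate_hz):
    """Delay (s) of a symmetric, linear-phase FIR filter: (N - 1) / 2 samples."""
    return (num_taps - 1) / 2 / sample_rate_hz


def convolve(x, h):
    """Plain causal (full) convolution of signal x with filter taps h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


taps = [0.1, 0.2, 0.4, 0.2, 0.1]  # made-up symmetric 5-tap filter
impulse = [0.0] * 30
impulse[10] = 1.0                  # spike at sample 10

out = convolve(impulse, taps)
print(out.index(max(out)) - 10)                  # 2 samples of delay
print(linear_phase_fir_delay(len(taps), 100.0))  # 0.02 s at 100 sps
```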

I think of group delays as being functions of frequency, but it’s also the delay of the amplitude envelope across the filter. I don’t recall enough about MT to know whether thinking about it as a time series makes sense or not. I would be interested in what you find out about whether they apply or just report the group delay and if they apply it, what does that look like. Is it an analog phase filter, a time tag adjustment after the data gets digitized, or what? I could ask around the PIC if that’s helpful.

I’m not sure I’m convinced yet that a B057 is the right description.
I haven’t heard of a case in seismology where we have corrected for an analog filter group delay.

cheers, Mary

On Nov 6, 2020, at 2:28 PM, timronan notifications@github.com wrote:

I ask this because I have been working with the Magnetotelluric group to write a response and came across:

Blockette 57 includes only non-decimated stages (thus decimation factor is unit, and decimation offset is zero for all channels). Typical part of the *.hed file produced by nimsreadz for MT1 NIMS data used to fill out the decimation_57 table is shown below:

Magnetic field - 3 pole Butterworth LOWPASS
    corner PERIOD   GROUP DELAY (sec)
Hx  0.5             0.159
Hy  0.5             0.159
Hz  0.5             0.159

Electric field - 5 pole Butterworth LOWPASS
    corner PERIOD   GROUP DELAY (sec)
Ex  0.5             0.2575
Ey  0.5             0.2575

Electric field - 1 pole Butterworth HIGHPASS
    corner PERIOD   TIME CONSTANT (sec)
Ex  37699           6000
Ey  37699           6000

Here the group delay seems to represent B57:F7, and the Butterworth filters all seem to be applied in analog stages. I am still getting spun up on MT, but this may be a case of timing delays in analog stages. I am going to get some clarification from the MT group.
