iris-edu / stationxml-validator

GNU General Public License v3.0
16 stars 8 forks source link

rule 321 does not work #116

Open crotwell opened 4 years ago

crotwell commented 4 years ago

Rule 321 is supposed to warn if a velocity seismometer channel, like BHZ, has input units that are not m/s.

As coded, it only check to see if m/s is contained in the units, which means that units like m/s**2 or Pa/Kg/m/s**2 will pass. This defeats the purpose of the test, which was to help prevent accelerometer data from being used with velocity seismometer channels. I assume the string contains check was done to allow units like nm/s to pass, but this needs to be done more carefully.

The rule should also apply to the instrument sensitivity units, not just the stage 1 input units. Currently it only checks stage 1.

My test was to download a valid stationxml file with an LHZ channel from IRIS, edit the stage 1 input unit to be m/s**2 and run the validator. It wrongly show PASSED.

Changing the units to something crazy like Pa/m/s**2 also passes. Changing the unit to g/s did cause a failure.

crotwell commented 4 years ago

See #96 for background.

crotwell commented 4 years ago

Also, the rule as currently coded, although broken overall, would allow a BNZ channel, which should have units of m/s**2 to have units of m/s.

The accelerometer channels (N and sometimes L) should have units of m/s**2.

Just to be clear, this code:

if("HLN".indexOf(code.charAt(1)) >=0 | "hln".indexOf(code.charAt(1)) >=0) {

should be split into 2 cases, one for H and L that requires m/s and a second one for N that requires m/s**2.

crotwell commented 4 years ago

pull request https://github.com/iris-edu/stationxml-validator/pull/118 has 3 test cases that should fail. Just the xml, I did not add them to the test harness.

timronan commented 4 years ago

Do we want to accept cm/s, mm/s, nm/s ect? Is it acceptable for L to be either m/s or m/s**2 ?

The above examples are slightly contradictory:

LHZ channel from IRIS, edit the stage 1 input unit to be m/s**2 and run the validator. It wrongly show PASSED.

contradicts with

The accelerometer channels (N and sometimes L) should have units of m/s**2.

which of these two cases should I follow?

crotwell commented 4 years ago

Sorry my comment was confusing about ?L? channels, yes they were sometimes m/s**2 in older data, but I think a warning should still be emitted by the validator.

But the L in LHZ example is the band code (first char), but this rule should only deal with the instrument code,(second char).

Current best practice is: for ?H? and ?L? channels to be velocity seismometers, and so input unit should be some type of velocity. Any of m/s, cm/s , nm/s, even km/s are ok. But m/s**2 is not and should trigger a warning.

for ?N? channels, which should be accelerometers, and so they should be some type of acceleration, so m/s**2, cm/s**2 or nm/s**2 are ok. But m/s is not and should trigger a warning.

In all cases a unit like Kg*m/s should fail, so the match needs to be more than just "contains".

Mary mentioned that some older stations (before the introduction of the N instrument code) used L for accelerometers. So there are cases where a ?L? channel will fail the "must be velocity rule", which is partially why this is a warning not an error. But this should not be the case for any recent metadata, nor should a ?N? channels ever be m/s. Mary also said that some people do really weird stuff that might still be technically valid, and so all of this should be warnings. I am ok with that.

crotwell commented 4 years ago

Just to be clear, by

In all cases a unit like Kg*m/s should fail, so the match needs to be more than just "contains".

I mean Kg*m/s should be a warning. This rule never generates errors, unfortunately!

timronan commented 4 years ago

Using this example

LHZ channel from IRIS, edit the stage 1 input unit to be m/s**2 and run the validator. It wrongly show PASSED.

was a typo on my part and @crotwell is correct this should fail.

@crowell does it seems acceptable to have 1 available character before m/s? This accommodates for mm/s, cm/s ect. but triggers on cases like Kg*m/s ect.

Warning 321 should be split into 2 warnings:

321: If Channel:Code[2] == (H |  M) then Stage[1]:InputUnit must equal ?m/s AND Stage[Last]:OutputUnits must equal count?

322: If Channel:Code[2] == N then Stage[1]:InputUnit must equal ?m/s**2 AND Stage[Last]:OutputUnits must equal count?

323: If Channel:Code[2] == L then (Stage[1]:InputUnit must equal ?m/s OR Stage[1]:InputUnit must equal ?m/s**2) AND Stage[Last]:OutputUnits must equal count?

? represents a single character

crotwell commented 4 years ago

Mary said many networks use M for mass position channels, and those would not match this input unit rule. So I would leave out M from this rule.

1 char before the m/s or m/s**2 is probably fine. Even better would be to restrict it to one of the actual SI prefixes. But probably that is not really worth the extra effort.

My preference would be to issue the warning on ?L? channels that are m/s**2. For old data there is nothing to be done, but new L metadata should not be m/s**2. Better IMHO to be too verbose for old data but still warn on new.

So:

321: If Channel:Code[2] == (H |  L) then Stage[1]:InputUnit must equal ?m/s AND Stage[Last]:OutputUnits must equal count?

322: If Channel:Code[2] == N then Stage[1]:InputUnit must equal ?m/s**2 AND Stage[Last]:OutputUnits must equal count?
timronan commented 4 years ago

1 char before the m/s or m/s**2 is probably fine. Even better would be to restrict it to one of the actual SI prefixes. But probably that is not really worth the extra effort.

I can restrict the rule to cm/s, mm/s, um/s, nm/s but I am not going to add this into the rule description in the rules table. Maybe I can make an extended rule description in the drop down but this is otherwise too long of a rule for the table.

timronan commented 4 years ago

This issue has been addressed by commit 2d1f1dca7184ff960672820f01cb9671bb1d5472. Issue must be test when stationxml-validator release candidate is released.

chad-earthscope commented 4 years ago

Not a comprehensive comment on this issue, but re:

Stage[Last]:OutputUnits:Name must equal count

that part goes too far. While count units definitely are digitized data, the converse is not universally true. Digital data can be in Earth units and this would prevent the description of such data in StationXML in their actual units. We already have synthetics where the output data is not count, and it will happen in other cases where a comprehensive response is not known.

This rule would force metadata creators to "hide" the Earth units unnecessarily, and (slightly) devalue the data by obscuring its interpretation. Here is an example of a temperature channel being hidden a COUNTS unit even though the digital data is CELCIUS.

crotwell commented 4 years ago

Not sure I understand what is being hidden, your example looks right to me. What would you do instead?

chad-earthscope commented 4 years ago

Not sure I understand what is being hidden, your example looks right to me. What would you do instead?

Describe the data as CELCIUS directly instead of COUNTS. From a perspective of the output units of this (trivial) response, the data needs to be processed/converted to get back to the Earth units, when in fact it does not. Of course smart people or software can "read around" the COUNTS unit (by noting the scale factor of 1) and ignore it, but that shouldn't be needed. It could be like this.

crotwell commented 4 years ago

I still don't get how your version is any better, it is not shorter, not any less confusing and kind of looks worse to me. Software that wants to be "simple" will only look at the InstrumentSensitivity and do the quick and dirty overall gain and unit conversion. I just don't see how making the output units be anything other than count helps to be honest.

That said, if there is not consensus on how to do things, then maybe the validator shouldn't enforce this as part of the rule.

chad-earthscope commented 4 years ago

Software can be even simpler and see that no conversion is needed at all.

Describing the stored data as count makes that harder to figure out, for both software (and people in my opinion). We have services that apply these conversions and checking that the input and output units are the same is a pretty straightforward way to determine that unpacking, doing simple sensitivity or complex response conversions/restitutions, and repacking the data is unnecessary to deliver in Earth units. Same goes for all user software.

I think count is most valuable as an indication that some processing/conversion and perhaps choices (e.g. frequency rage) are needed to get back to some other units. Which leads me to ask: what is the value of describing data as count when it can be described as an Earth unit?

timronan commented 4 years ago

This discussion belongs to issue #112, not issue #116. Please consider moving this discussion to it's correct issue so we can searchable and robust documentation.

timronan commented 4 years ago

Specify that ? means variable character of length 1 and that * is a variable string of any length in the preamble.