Location identifiers

krischer commented 7 years ago

Discussion branched off #2. Concerns DRAFT20170622.

@krischer

Regarding the location identifiers: would it be possible to define some semantics for them? The current state is honestly confusing. But maybe this is not the right place to discuss this.

@chad-iris

Yes, and I think we should define at least a convention, and maybe even required usage semantics. The specification is probably the right place for this.

Suggestions welcome.

krischer commented 7 years ago

The definition in draft DRAFT20170708 is:

Network code: Uniquely identifies the owner and network operator responsible for the data. This identifier is assigned by the FDSN. Must not exceed 8 characters.
Station code: Uniquely identifies a station within a network. Must not exceed 8 characters.
Location code: Uniquely identifies a group of channels within a station, for example from a specific sensor or sub-processor. Must not exceed 8 characters.
Channel codes: A sequence of codes that identify the sensor, band and orientation that is either 3 or 4 characters. See Section 6: Definition of channel codes for more information.
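
The length rules quoted above can be checked mechanically. The following is a minimal sketch, assuming only the limits stated in the quoted draft text; the function name `check_identifiers` and the example codes are hypothetical, and the draft's allowed character set is not quoted here, so it is not checked.

```python
# Hypothetical helper; the limits come from the DRAFT20170708 text quoted above.
MAX_LEN = {"network": 8, "station": 8, "location": 8}


def check_identifiers(network, station, location, channel):
    """Return a list of problems; an empty list means no length rule is violated."""
    problems = []
    values = {"network": network, "station": station, "location": location}
    for name, value in values.items():
        # Only the upper bound is stated in the quoted text.
        if len(value) > MAX_LEN[name]:
            problems.append(f"{name} code {value!r} exceeds {MAX_LEN[name]} characters")
    # The channel code is described as either 3 or 4 characters.
    if len(channel) not in (3, 4):
        problems.append(f"channel code {channel!r} must be 3 or 4 characters")
    return problems


# Made-up example values, not real network or station codes.
print(check_identifiers("XX", "STA01", "00", "BHZ"))
print(check_identifiers("TOOLONGNET", "STA01", "00", "BH"))
```

An empty location code passes this check, which matches the remark later in this thread that the new location code is optional.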

Judging from this, the main purpose of the location code seems to be to distinguish different sensors or sensor groups within one station. I assume this also covers, for example, large-N experiments in which all sensors are fairly close in space and could be considered a single station.

As a user I typically want a certain piece of data from a station. Let's assume I want the "best" broadband seismic sensor a station has available. What I currently do is query at the channel level (and exploit the channel code semantics) to get a list of all possible channels + location codes I could consider. Then I usually just sort the location codes and choose the alphabetically lowest code. The last part is arbitrary and I would like to see it replaced by something better. I'm not entirely sure what this "something better" looks like, to be honest.
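
The selection procedure described above can be sketched as follows. This is an illustration only: the function name `pick_lowest_location`, the `(location, channel)` tuple layout, and the example inventory are assumptions, and no particular web service or library is implied.

```python
from collections import defaultdict


def pick_lowest_location(channels, prefix="BH"):
    """Pick the alphabetically lowest location code that offers `prefix` channels.

    `channels` is an iterable of (location_code, channel_code) pairs, as one
    might collect from a channel-level station query.
    """
    by_location = defaultdict(list)
    for location, channel in channels:
        # Exploit the channel code semantics: keep only the wanted band/instrument.
        if channel.startswith(prefix):
            by_location[location].append(channel)
    if not by_location:
        return None
    # The arbitrary step: plain string ordering decides which sensor is "best".
    lowest = min(by_location)
    return lowest, sorted(by_location[lowest])


# Made-up inventory for a single station.
inventory = [
    ("10", "BHZ"), ("10", "BHN"), ("10", "BHE"),
    ("00", "BHZ"), ("00", "BHN"), ("00", "BHE"),
    ("00", "LHZ"),
]
print(pick_lowest_location(inventory))
```

Note that with plain string ordering an empty location code sorts before any non-empty one.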

The most logical thing I can think of would be to move some semantics of the channel code to the location code. E.g. :BH.E would seem reasonable to me (especially with the new location code being optional). If there are multiple sensors capable of producing BH data, then it could be :BH_0.E, :BH_1.E, and so on, alphabetically ordered with the first one being the "preferred" one, similar to some things in QuakeML. What exactly "preferred" means is delegated to the data provider, but it at least gives some hints to users. A big upside of this is also that the location now explicitly groups the channels.

If the location code is used to distinguish many different sensors in a station (the previously mentioned large-N scenario) it could be some kind of special prefix, like :N1200_BH.E. This would mean sensor number 1200. This also feels ugly and I would rather push this kind of thing to the station code.

Maybe the location code could state that all sensors in that location should have the same physical location, within some scale-dependent margin?

A simple, and maybe more practical, alternative is to just keep doing what we are currently doing but explicitly state that for any given channel code the alphabetically lowest location code yields the preferred sensor for that type of data.
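
A minimal sketch of this rule, assuming the same hypothetical `(location, channel)` pairs a channel-level query might return; the function name `preferred_by_channel` is made up for illustration.

```python
def preferred_by_channel(channels):
    """Map each channel code to its alphabetically lowest location code."""
    preferred = {}
    for location, channel in channels:
        # Keep the lowest location code seen so far for this channel code.
        if channel not in preferred or location < preferred[channel]:
            preferred[channel] = location
    return preferred


# Made-up inventory for a single station.
example = [("10", "BHZ"), ("00", "BHZ"), ("10", "BHN"), ("00", "LHZ")]
print(preferred_by_channel(example))
```

Because the rule is applied per channel code, nothing guarantees that the preferred components of one band share a location code: in this made-up inventory BHZ resolves to "00" while BHN resolves to "10".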

None of these is really satisfactory, but maybe it's a starting point.

crotwell commented 7 years ago

Agree this is a painful issue, the current usage is a random mess. It is not even true currently that the 3 channels from a single sensor use the same loc code. For some time, PASSCAL temp networks were setting the loc code to be the stream number on the DAS, so you had 01.BHZ, 02.BHN and 03.BHE. Yes, really! :( :( :(
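
To illustrate why that stream-number convention defeats grouping by location code, here is a small sketch. The function names and the `(location, channel)` tuple layout are hypothetical; the three codes in `stream_numbered` are the ones from the example above.

```python
from collections import defaultdict


def group_by_location(channels):
    """Collect the channel codes found under each location code."""
    groups = defaultdict(set)
    for location, channel in channels:
        groups[location].add(channel)
    return dict(groups)


def complete_triplets(groups, band="BH"):
    """Return the location codes that hold all of Z, N and E for `band`."""
    wanted = {band + component for component in "ZNE"}
    return [loc for loc, chans in sorted(groups.items()) if wanted <= chans]


# Loc code set to the stream number, as in the example above.
stream_numbered = [("01", "BHZ"), ("02", "BHN"), ("03", "BHE")]
# The same three components under one shared (made-up) loc code.
shared_code = [("00", "BHZ"), ("00", "BHN"), ("00", "BHE")]

print(complete_triplets(group_by_location(stream_numbered)))
print(complete_triplets(group_by_location(shared_code)))
```

With stream numbers as loc codes, no location code holds a complete three-component set, so any rule that relies on the loc code to group components finds nothing.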

The current state is that a "location identifier" doesn't even identify a location and is almost completely lacking in any standard meaning. It is just a namespace, nothing more.

Way back, the precursor to fdsn stationxml had a <channelGroup> element that grouped channels that went together as the 3 components of motion. And before that the Fissures/DHI framework had a ChannelGroup object that did the same as well as a ThreeComponentSeismogram that reflected this idea of keeping the components of motion of the data together as well. Alas, mapping existing data to them was too hard, and so they all fell by the wayside.

I am in favor of making loc code mean something, and for rules that allow 3 components of motion to be grouped and for users to find the "best" set for a station, but I am not optimistic given past history.

krischer commented 7 years ago

Agree this is a painful issue, the current usage is a random mess. It is not even true currently that the 3 channels from a single sensor use the same loc code. For some time, PASSCAL temp networks were setting the loc code to be the stream number on the DAS, so you had 01.BHZ, 02.BHN and 03.BHE. Yes, really! :( :( :(

We should really involve the PASSCAL guys in the identifier discussion, as the new identifiers must be able to accommodate active source data without issues. @chad-iris Does that to some extent happen within IRIS?

Way back, the precursor to fdsn stationxml had a <channelGroup> element that grouped channels that went together as the 3 components of motion. And before that the Fissures/DHI framework had a ChannelGroup object that did the same as well as a ThreeComponentSeismogram that reflected this idea of keeping the components of motion of the data together as well. Alas, mapping existing data to them was too hard, and so they all fell by the wayside.

I am in favor of making loc code mean something, and for rules that allow 3 components of motion to be grouped and for users to find the "best" set for a station, but I am not optimistic given past history.

The location code could be renamed to sensor code but that might never get accepted. A sensor thus incorporates all X components of a seismometer and the various decimated versions of it. Then one could assign some of the channel code semantics to the new sensor code. But this is probably too radical and will not happen.

chad-earthscope commented 7 years ago

@chad-iris Does that to some extent happen within IRIS?

Not really, but I can forward them the identifier discussion points and issues. To be honest, what we have discussed in terms of identifier changes is a whole sub-discussion that will likely evolve as more people are exposed to the conversation; there may be more people interested in that than in the low-level disk format.

crotwell commented 7 years ago

@krischer

The location code could be renamed to sensor code but that might never get accepted. A sensor thus incorporates all X components of a seismometer and the various decimated versions of it. Then one could assign some of the channel code semantics to the new sensor code. But this is probably too radical and will not happen.

It is even worse than that... for stations using STS-1, each component actually comes from a separate sensor. So the 3 that go together would have different sensor codes.

The real world is a complex place!

iris-edu / mseed3-evaluation

Location identifiers #8