iris-edu / mseed3-evaluation

A repository for technical evaluation and implementation of potential next generation miniSEED formats

Underscores in FDSN identifiers #4

Closed krischer closed 7 years ago

krischer commented 7 years ago

Discussion branched off #2. Concerns DRAFT20170622.

@krischer

Maybe allow underscores and dashes in all the codes? Might be useful for example for somewhat semantic names of sensors arranged in a grid.

@chad-iris

Hmm, interesting. It could make for cluttered-looking IDs, but I see the use. Curious what others think; I'll ask around the DMC.

krischer commented 7 years ago

The main use case I see right now would be sensors in some form of grid, e.g. G_123_343 as a location or station code.

crotwell commented 7 years ago

If underscore was allowed, would it make sense to just use that for the loc-chan separator? In other words IU ANMO 00 BHZ would be encoded IU.ANMO.00_BHZ instead of IU.ANMO.00:BHZ.

Then the question is can we have multiple levels of location code? So maybe have something like XX.STA.G_123_343_BHZ where "G", "123", and "343" are location codes that apply to the BHZ channel? Each individual loc code would be limited to 4 chars, but there could be multiple for grids.

This goes back to my idea that loc codes really exist solely as a namespace for channels, and this gives the option of a hierarchical namespace if needed.

The same could be used for station as well I suppose.
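For illustration, the multi-level location idea above could be parsed roughly as follows. This is a minimal sketch, not part of any draft: the function name `split_hierarchical` and the constant `MAX_LOC_LEN` are hypothetical, and it assumes the channel is always the last underscore-delimited field and never contains an underscore itself.

```python
from typing import List, Tuple

MAX_LOC_LEN = 4  # per-code limit suggested in the comment above (assumption)


def split_hierarchical(identifier: str) -> Tuple[str, str, List[str], str]:
    """Split NET.STA.LOC_..._CHAN into (net, sta, [loc codes], chan)."""
    # Exactly two dots are expected; anything else raises ValueError on unpacking.
    net, sta, rest = identifier.split(".")
    # Every field before the final underscore-delimited one is a location code.
    *locs, chan = rest.split("_")
    for loc in locs:
        if not 1 <= len(loc) <= MAX_LOC_LEN:
            raise ValueError(f"location code {loc!r} must be 1-{MAX_LOC_LEN} characters")
    return net, sta, locs, chan


print(split_hierarchical("XX.STA.G_123_343_BHZ"))
# ('XX', 'STA', ['G', '123', '343'], 'BHZ')
```

With this reading, IU.ANMO.00_BHZ yields a single location code and IU.ANMO.BHZ yields none, so the same rule covers the flat and the hierarchical cases.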

chad-earthscope commented 7 years ago

Then the question is can we have multiple levels of location code? So maybe have something like XX.STA.G_123_343_BHZ where "G", "123", and "343" are location codes that apply to the BHZ channel? Each individual loc code would be limited to 4 chars, but there could be multiple for grids.

I don't really understand what you mean above. I think a time series identifier should uniquely identify a single time series source. Is what you have above an identifier for multiple sources? How would applying multiple name spaces be useful?

crotwell commented 7 years ago

Multiple codes for a single source, meaning the first loc code is the NS line position, the second is the EW line position, etc. Nothing new here, just me butchering Lion's idea.

My point mainly is that it is a little weird to have 3 different separators ( . : and _ ) in what is a pretty short string. If we adopt Lion's underscores, then just use that for the existing : separator. Or use : instead of underscore. They are just delimiters, they don't need to be all different.

chad-earthscope commented 7 years ago

Maybe allow underscores and dashes in all the codes? Might be useful for example for somewhat semantic names of sensors arranged in a grid.

As I wrote in the 20170708 draft issue, I haven't heard any argument for allowing both underscores and dashes, so I only added dashes. My preference is always dashes over underscores; underscores are less obvious (in my opinion) and make URLs harder to read when highlighted.

chad-earthscope commented 7 years ago

My point mainly is that it is a little weird to have 3 different separators ( . : and _ ) in what is a pretty short string. If we adopt Lion's underscores, then just use that for the existing : separator. Or use : instead of underscore. They are just delimiters, they don't need to be all different.

The "." is a separator and the ":" is a separator, but the "-" (or "_") is not a separator; it is part of the code. At least that was my understanding of the proposal.

You could argue that the loc:chan separation should be done with a "."; the idea was that it's treated differently because the location can be empty. There is a simplicity to knowing that, in the FDSN: namespace, there are always two dots, so you can always parse the three required codes (net, sta, chan). The chan may then be decorated with a location (which is how it's done in StationXML). Of course there could always be three dots, with the loc sometimes empty; I just haven't seen any other identifier systems that have back-to-back separators. But it's not impossible. The worst case is sometimes two dots, sometimes three dots, so I tried to avoid that.

Hopefully I haven't completely missed the point.
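The parsing rule described above (always two dots, with an optional location attached to the channel by a colon) can be sketched as follows. This is only an illustration under stated assumptions: `SourceId` and `parse_source_id` are hypothetical names, and no character-set or length checks from the draft are applied.

```python
from typing import NamedTuple


class SourceId(NamedTuple):
    network: str
    station: str
    location: str  # empty string when no location is given
    channel: str


def parse_source_id(sid: str) -> SourceId:
    """Parse 'FDSN:net.sta.[loc:]chan' into its four codes."""
    prefix = "FDSN:"
    if not sid.startswith(prefix):
        raise ValueError(f"missing {prefix!r} namespace: {sid!r}")
    parts = sid[len(prefix):].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two dots: {sid!r}")
    net, sta, last = parts
    # The location, if present, decorates the channel as 'loc:chan'.
    loc, _, chan = last.rpartition(":")
    return SourceId(net, sta, loc, chan)


print(parse_source_id("FDSN:IU.ANMO.00:BHZ"))
print(parse_source_id("FDSN:IU.ANMO.BHZ"))
```

Because dashes are part of a code rather than separators, a station such as G-123-343 passes through this split unchanged.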

chad-earthscope commented 7 years ago

Allowed dashes for station and location codes in DRAFT 20170708.

chad-earthscope commented 6 years ago

My point mainly is that it is a little weird to have 3 different separators ( . : and _ ) in what is a pretty short string. If we adopt Lion's underscores, then just use that for the existing : separator. Or use : instead of underscore. They are just delimiters, they don't need to be all different.

The "." is a separator and the ":" is a separator, but the "-" (or "_") is not a separator; it is part of the code. At least that was my understanding of the proposal.

You could argue that the loc:chan separation should be done with a "."; the idea was that it's treated differently because the location can be empty. There is a simplicity to knowing that, in the FDSN: namespace, there are always two dots, so you can always parse the three required codes (net, sta, chan). The chan may then be decorated with a location (which is how it's done in StationXML). Of course there could always be three dots, with the loc sometimes empty; I just haven't seen any other identifier systems that have back-to-back separators. But it's not impossible. The worst case is sometimes two dots, sometimes three dots, so I tried to avoid that.

Hmm, after writing some code to see how these identifiers could be handled, I find dealing with the sometimes-there, sometimes-not colon (:) separator more awkward than I had hoped. One practical issue is that matching the identifiers with globbing wildcards, which I would expect to be quite useful, is non-trivial when the goal is to match all location codes, including an absent one. For example, "FDSN:net.sta.*:chan" does not match "FDSN:net.sta.chan". It would certainly be simpler to always have four fields separated by the same delimiter.
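The globbing problem can be reproduced with Python's standard `fnmatch` module. This is a minimal sketch: the network, station, and channel values are illustrative, the helper `matching` is hypothetical, and the four-field dotted form is only the alternative floated in this comment, not an adopted format.

```python
from fnmatch import fnmatchcase
from typing import List


def matching(pattern: str, identifiers: List[str]) -> List[str]:
    """Return the identifiers that match a shell-style glob pattern."""
    return [sid for sid in identifiers if fnmatchcase(sid, pattern)]


# Optional 'loc:' decoration: the colon is absent when there is no location.
colon_ids = ["FDSN:IU.ANMO.00:BHZ", "FDSN:IU.ANMO.BHZ"]
print(matching("FDSN:IU.ANMO.*:BHZ", colon_ids))
# ['FDSN:IU.ANMO.00:BHZ']  (the identifier without a location is missed)

# Always four dot-separated fields: an empty location leaves back-to-back dots.
dot_ids = ["FDSN:IU.ANMO.00.BHZ", "FDSN:IU.ANMO..BHZ"]
print(matching("FDSN:IU.ANMO.*.BHZ", dot_ids))
# ['FDSN:IU.ANMO.00.BHZ', 'FDSN:IU.ANMO..BHZ']
```

In the colon form, a single pattern cannot cover both cases because the literal ":" must appear in the identifier; in the four-field form, "*" simply matches the empty location.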