tclements closed this 4 years ago
I can change this very quickly for a single-channel request. Is that what you're looking for?
Yes, would be good to have consistent file naming, with network, station, location, and channel included.
It's not an inconsistency on my part. It's a fundamental difference between IRIS timeseries and FDSN dataselect.
IRIS' timeseries service allows one channel per request and no wildcards. That's why I'm able to stream to file with an automatic channel name: the channel is known, and unique, when the request begins.
Not so for FDSN.
FDSN dataselect allows multiple channels and wildcard requests. If a request contains multiple channel strings, it's impossible to define an accurate naming convention without processing the download.
I've thought about changing this before, but FDSN filenames will only be accurate with that naming style when one requests a single channel at a time with no wildcards.
That defeats the purpose of FDSN.
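To make the distinction concrete, here is a minimal sketch of a check that decides whether a request string names exactly one channel, which is the only case where an IRIS-style name can be chosen before the download starts. The helper `is_single_channel` is hypothetical and not part of SeisIO; it assumes IDs of the form NET.STA.LOC.CHA with `*` and `?` as the wildcard characters.

```julia
# Hypothetical helper, not part of SeisIO: true only when `id` names exactly
# one channel, i.e. four dot-separated fields (NET.STA.LOC.CHA), no
# comma-separated list, and no `*` or `?` wildcards.
function is_single_channel(id::AbstractString)
    occursin(r"[*?,]", id) && return false
    fields = split(id, '.')
    length(fields) == 4 || return false
    net, sta, loc, cha = fields
    # Location may be empty (as in "UW.LON..BHZ"); the other fields may not.
    return !isempty(net) && !isempty(sta) && !isempty(cha)
end

is_single_channel("UW.LON..BHZ")              # true
is_single_channel("UW.LON..BH?")              # false (wildcard)
is_single_channel("UW.LON..BHZ,UW.LON..BHN")  # false (two channels)
```

Anything that fails this check can expand to an unknown number of channels, so its contents are only known after the download has been processed.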
A single-channel FDSN request adds overhead associated with opening and closing multiple HTTP requests. I'm not really sure why your group uses them, honestly. Unless requests are very long (e.g., Kurama's example of a 1-year request), the overhead is significant. HTTP requests are already slow relative to other data transfer protocols, and both FDSN and SeisIO are built around the concept of accessing multiple channels at once...
Moreover, if I automatically name files with FDSN, my choices are either to introduce a true internal inconsistency, to create file names so turgid that humans can't read them, or to create file names that contain no identifying information.
I can demonstrate this using your example:
"2019.1.00.00.00.__.____..___.mseed". No one can tell what's in it.
So, I'm willing to modify how file naming works in FDSNget, but the options aren't great:
Thank you, I did not know about the FDSN overhead for a single channel. Could FDSN throw a warning if a user tries to get data for a single channel? For users, I think it should be explicit to use IRIS for single channel requests and FDSN for multi-station requests.
I'll add that to the documentation as an explicit recommendation, but that choice only exists for data archived at IRIS. IRIS staff tell me that many channels listed in their metadata aren't actually there -- the issue your group was having with SCSN stations was caused by that.
Oh, the overhead isn't from FDSN itself; I mean that there's overhead associated with requesting N channels one at a time when you only need one request. For a short request, concatenating requests with `chans = join(all_channels, ",")` and then passing `chans` to `get_data` will be faster than N total `get_data` calls, one for each of the N strings in `all_channels`.
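A minimal sketch of that pattern follows. Only the first channel ID comes from this thread; the other two are assumed for illustration, and the `get_data` calls are left commented out because they need SeisIO and a network connection.

```julia
# Example channel IDs; the second and third are assumed for illustration.
all_channels = ["UW.LON..BHZ", "UW.LON..BHN", "UW.LON..BHE"]

# One comma-separated string, so one HTTP request covers every channel.
chans = join(all_channels, ",")

# With SeisIO loaded, the single-request form would then look like this
# (same keywords as the example elsewhere in this thread):
# S = get_data("FDSN", chans, src="IRIS", s="2019-01-01", t=600)
#
# versus N separate requests, each with its own HTTP overhead:
# for c in all_channels
#     get_data("FDSN", c, src="IRIS", s="2019-01-01", t=600)
# end
```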
Parallelization changes that, though. Requests like Kurama's benchmark are almost certainly faster with N parallel `get_data` calls because the overhead from making N calls is more than compensated for by the speed improvement of doing them in parallel. So that's not why he gets a low exponent in Fig. 4. When I incorporate his code into SeisIO core, I'll fork each string in the data request to one request on one CPU within `get_data` itself; reducing the number of `get_data` calls to one will eliminate much of the overhead and increase his exponent.
The solution I recommended above is now live. Pass keyword `autoname=true` to generate IRIS-style file names in single-channel FDSN requests.
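For readers who want to see what an IRIS-style name looks like in code, here is a rough sketch that rebuilds the name from a start time and a single channel ID. The helper `iris_style_name` is hypothetical and is not how SeisIO does it internally; the unpadded day of year and the fixed `R` field are assumptions taken from the example file name quoted in this thread.

```julia
using Dates

# Hypothetical helper, not SeisIO's implementation: build an IRIS-style name
# of the form YYYY.JJJ.HH.MM.SS.NET.STA.LOC.CHA.R.mseed from a start time and
# one channel ID. The day of year is left unpadded and the "R" field is fixed,
# both assumed from the example file name in this thread.
function iris_style_name(t0::DateTime, id::AbstractString)
    hms = Dates.format(t0, "HH.MM.SS")
    return string(year(t0), ".", dayofyear(t0), ".", hms, ".", id, ".R.mseed")
end

iris_style_name(DateTime(2019, 1, 1), "UW.LON..BHZ")
# returns "2019.1.00.00.00.UW.LON..BHZ.R.mseed"
```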
When `get_data(.., w=true)` is in write mode, if the source is "FDSN", the output file name is `YYYY.JJJ.HH.MM.SS.000000.FDSNWS.IRIS.mseed`, whereas if the source is "IRIS", the output file name is `YYYY.JJJ.HH.MM.SS.(NET).STA.LOC.CHAN.R.mseed`, e.g.
S = get_data("FDSN", "UW.LON..BHZ", src="IRIS",s="2019-01-01",t=600,w=true)
writes a file named `2019.1.00.00.00.000000.FDSNWS.IRIS.mseed`
and
S = get_data("IRIS", "UW.LON..BHZ",s="2019-01-01",t=600,w=true)
writes a file named `2019.1.00.00.00.UW.LON..BHZ.R.mseed`.
This should be a simple fix to `FDSNget!`.