kkappler opened this issue 2 years ago
This happened again on April 23 at 0900 Pacific time.
The error is:
```
Traceback (most recent call last):
    streams = dataset_config.get_data_via_fdsn_client(data_source="NCEDC")
  File "/home/kkappler/software/irismt/aurora/aurora/sandbox/io_helpers/fdsn_dataset_config.py", line 78, in get_data_via_fdsn_client
    self.endtime,
  File "/home/kkappler/anaconda2/envs/py37/lib/python3.7/site-packages/obspy/clients/fdsn/client.py", line 830, in get_waveforms
    raise ValueError(msg)
ValueError: The current client does not have a dataselect service.
```
I have attached the hz data from PKD for the time interval that we use for the tests ... ex, ey, hx, hy are already archived at IRIS. (Attachment: hz_pkd.csv)
@timronan Can you or Laura look at adding this hz data to the IRIS archive? Then we can set up the tests to use IRIS (or try NCEDC and catch exception use IRIS).
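For the fallback idea, here is a minimal sketch of try-NCEDC-then-IRIS. The station/channel/time parameters below are illustrative placeholders, not the actual values used by the tests:

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Illustrative request parameters; the real tests use the PKD/SAO test interval.
kwargs = dict(
    network="BK",
    station="PKD",
    location="*",
    channel="*",
    starttime=UTCDateTime("2004-09-28T00:00:00"),
    endtime=UTCDateTime("2004-09-28T02:00:00"),
)

try:
    # Try NCEDC first; this is where the PKD data normally lives.
    streams = Client(base_url="NCEDC").get_waveforms(**kwargs)
except Exception:
    # Fall back to IRIS if NCEDC is down or misbehaving.
    streams = Client(base_url="IRIS").get_waveforms(**kwargs)
```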
In tests/parkfield/, calling `python make_parkfield_mth5.py`
creates the mth5 file locally, with both the data and the metadata (from NCEDC).
This file could actually be used as a source of data and metadata that we could push to IRIS; see issue 99 in mth5: https://github.com/kujaku11/mth5/issues/99
Here's a new one, Sept 2, 2022:

```python
from obspy.clients.fdsn import Client
Client(base_url="NCEDC")
```

```
Traceback (most recent call last):
  File "/home/kkappler/software/pycharm-community-2019.1.1/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<string>", line 1, in <module>
  File "/home/kkappler/anaconda2/envs/py38/lib/python3.8/site-packages/obspy/clients/fdsn/client.py", line 276, in __init__
    self._discover_services()
  File "/home/kkappler/anaconda2/envs/py38/lib/python3.8/site-packages/obspy/clients/fdsn/client.py", line 1531, in _discover_services
    wadl_parser = WADLParser(wadl)
  File "/home/kkappler/anaconda2/envs/py38/lib/python3.8/site-packages/obspy/clients/fdsn/wadl_parser.py", line 28, in __init__
    doc = etree.parse(io.BytesIO(wadl_string)).getroot()
  File "src/lxml/etree.pyx", line 3536, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1893, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1800, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Space required after the Public Identifier, line 1, column 50
```
The lxml error is due to NCEDC changing their URLs; see obspy issue 3134: https://github.com/obspy/obspy/issues/3134
Here is another one, from Dec 2022. Symptoms: the failure occurs in

```python
run_ts_obj.from_obspy_stream(streams_dict[station_id], run_metadata)
```

at the end of the method, when calling `self.validate_metadata()`. The error message is:

```
mt_metadata.base.metadata.run.add_channel - ERROR: component cannot be empty
```
Note that `mth5.timeseries.run_ts.RunTS` calls `self.validate_metadata()` twice. The first time through it passes, but not the second. The first call is in the `set_dataset` method of `RunTS`, where there is a check of the condition:

```python
self.run_metadata.id not in self.station_metadata.runs.keys()
```

This is False, because `self.run_metadata.id` is `'0'` and `self.station_metadata.runs.keys()` is `['0']`, so

```python
self.station_metadata.runs[0].update(self.run_metadata)
```

is skipped.
After `set_data()`, a check is made:

```python
if run_metadata is not None:
    self.run_metadata.update(run_metadata)
```

This metadata update is what triggers the failure, because after it runs, `self.run_metadata.id` is `'001'` while `self.station_metadata.runs.keys()` is still `['0']`. That is, `run_metadata.id` changed, but the `station_metadata.runs` keys did not. Because of this inconsistency, the next time `self.validate_metadata()` executes, the condition

```python
self.run_metadata.id not in self.station_metadata.runs.keys()
```

returns True, which triggers

```python
self.station_metadata.runs[0].update(self.run_metadata)
```
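To see the flip in isolation, here is a minimal sketch with plain dicts standing in for the mt_metadata objects (hypothetical stand-ins, not the real classes):

```python
# Stand-ins: run metadata carries an id; station runs are keyed by run id.
run_metadata = {"id": "0"}
station_runs = {"0": {"id": "0"}}

# First validate_metadata() pass: ids agree, the update branch is skipped.
assert run_metadata["id"] in station_runs  # condition "not in" is False

# from_obspy_stream then updates run_metadata, changing its id ...
run_metadata["id"] = "001"

# ... but station_runs is still keyed by the old id, so the second pass
# takes the update branch, which is where add_channel ends up raising.
assert run_metadata["id"] not in station_runs  # condition "not in" is now True
```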
I followed the trail for a while, and the error occurs when an auxiliary channel is encountered... @kujaku11 do we want to force component on auxiliary channels? Also, we might need to track down why there is an aux channel here at all.
```python
if channel_obj.component is None:
    if not isinstance(channel_obj, Auxiliary):  # adding this condition seems to fix the 3.8/3.9 issue
        msg = "component cannot be empty"
        self.logger.error(msg)
        raise ValueError(msg)
```
Regarding the second flavor of failure, ... this might be related to the obspy version.
Note that obspy v1.2.2 still contains python2 code.
A long-awaited python3-only version of obspy (v1.3) was released in 2022 and updated to v1.3.1 in October 2022; it requires python >= 3.7.
So we should probably require the same.
Only a month after v1.3.1 was released, out popped v1.4 in November 2022. This version requires python >= 3.8. The value of maintaining python 3.7 compatibility is unclear.
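If we do drop 3.7, the corresponding pins would look something like this. This is an illustrative setup() fragment, not aurora's actual packaging config:

```python
# Illustrative setup.py fragment; aurora's real packaging config may differ.
from setuptools import setup

setup(
    name="aurora",
    python_requires=">=3.8",  # match obspy v1.4's floor
    install_requires=[
        "obspy>=1.4",  # python3-only; also resolves the py38 issue described below
    ],
)
```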
In any case, to fix the py37 issue, one need only replace the kwarg `data_source="NCEDC"` with `data_source='https://service.ncedc.org/'` in make_parkfield_mth5. This argument is passed as `base_url` to the obspy `Client`.
To reproduce the error:

```python
from obspy.clients.fdsn import Client
client = Client(base_url="NCEDC", force_redirect=True)
```

but replacing with

```python
from obspy.clients.fdsn import Client
client = Client(base_url="https://service.ncedc.org/", force_redirect=True)
```

works.
This is discussed in a comment by alexhutko on that obspy issue. It has to do with hardcoded URL lookup tables, and the fact that NCEDC is only available via https, not http. This may get fixed in obspy, but if we want to support py37 we can just use the explicit URL (for now).
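For reference, the lookup table in question is obspy's `URL_MAPPINGS` dict, which can be inspected, and, as a stopgap, patched at runtime. Patching the mapping is a workaround sketch on our side, not an obspy-endorsed API:

```python
from obspy.clients.fdsn.header import URL_MAPPINGS

# Inspect what the "NCEDC" shortcut resolves to in this obspy version.
print(URL_MAPPINGS["NCEDC"])

# One possible workaround: point the shortcut at the https service
# before constructing any Client. (A runtime patch, use with care.)
URL_MAPPINGS["NCEDC"] = "https://service.ncedc.org"
```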
To fix the py38 issue, one need only use obspy v1.4.
Now that these tests are working again, there are a couple of things that can be done to simplify the parkfield tests:

- `ensure_data_exists()` can be placed in /test_utils/parkfield/make_parkfield_mth5.py, and all the try/except stuff that is replicated in several methods can be consolidated in that one spot.
- I pushed an h5 of the combined PKD and SAO data to mth5_test_data, at mth5_test_data/mth5/parkfield/pkd_sao_test_00.h5. It should be possible to extract the metadata and the data-streams from this file and archive these somewhere at IRIS.
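A sketch of what that extraction might look like. The mth5 accessors here are from memory and should be checked against the mth5 API; the station/run names are assumptions:

```python
from mth5.mth5 import MTH5

# Open the combined PKD/SAO file read-only.
m = MTH5()
m.open_mth5("pkd_sao_test_00.h5", mode="r")

# List what the file actually contains before hardcoding names.
print(m.channel_summary.to_dataframe())

# Pull one run back out as a RunTS (time series + metadata together),
# which could then be converted for pushing to IRIS. Run id "001" is a
# placeholder; newer mth5 versions may also require a survey argument.
run_group = m.get_run("PKD", "001")
run_ts = run_group.to_runts()

m.close_mth5()
```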
When this is done, I suggest that the making of the PKD data, when using IRIS, be done using make_mth5, instead of the NCEDC kluge we have implemented to work around their non-FDSN-compliant nomenclature.
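A sketch of the make_mth5 path, assuming the data lands at IRIS. The MakeMTH5 request-dataframe convention is paraphrased from memory, and the column names, method signature, and channel codes below are all assumptions to verify against mth5:

```python
import pandas as pd
from mth5.clients.make_mth5 import MakeMTH5

# Hypothetical request dataframe; columns follow the MakeMTH5 convention
# (network, station, location, channel, start, end) as I recall it.
# Network/channel codes and times are placeholders.
request_df = pd.DataFrame(
    {
        "network": ["BK"],
        "station": ["PKD"],
        "location": ["--"],
        "channel": ["*"],
        "start": ["2004-09-28T00:00:00"],
        "end": ["2004-09-28T02:00:00"],
    }
)

maker = MakeMTH5(client="IRIS")
mth5_path = maker.make_mth5_from_fdsn_client(request_df)
```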
Parkfield tests fail on the GitHub Actions runner because the data and metadata cannot be retrieved when NCEDC is suffering an outage.
First observed on 17 Mar 2022.
Since NCEDC is not a stakeholder at this point, we cannot expect them to be concerned about this issue.
We could: