Open kkappler opened 6 months ago
The issue is in mt_metadata and I started a branch there.
Once the instrument type is being correctly identified, the next problem is that the time metadata is not correct.
This may be a problem at earthscope.
At least for one of the cases I encounter: around line 109 in xml_channel_to_mt_channel, as keys are looped
xml_key
"end_date", which maps to mt_key
"time_period.end" the value is None (the station is presumably continuously monitoring), but since the value is None
, we do not update the channel_metadata.
xml_key
"start_date", which maps to mt_key
"time_period.start" the value is 2023-05-13T00:00:00.000000Z.
The result is that the data end_time remains the default 1980-01-01 and the start time is in 2023.
Later, around line 118 of xml_inventoty_to_mt_experiment.py, this conflicted channel metadata is passed to the mt_run
.
Later this creates an exception in mth5/groups/run.py in the add_channel
method.
Note that the data could be passed to add channel and we maybe could avoid the error, but the metadata would still be wrong
The problem happens around line 404 when the data size are estimated:
estimate_size = (
int(
(
channel_metadata.time_period._end_dt
- channel_metadata.time_period._start_dt
)
* channel_metadata.sample_rate
),
The result of this is a negative value for estimate_size
, which is not a valid shape for the dataset that gets created.
So the root problem is probably that the end time is 1980-01-01, but it is strange that we have never encountered this before.
The solution maybe to assign a default time_period end to a date far in the future rather than far in the past.
After modifying mt_metadata xml_channel_mt_channel.py, the non uniform descriptions of the Bartington is OK, at least for network=IU, station=SSPA network=N4, station=E62A
IU-SSPA also builds and calibrates (although the DC-levels are removed).
N4-E62A does not build -- the error is in the stream parsing. IN this example I requested data from April 8, 2024, for all three components and get these streams:
N4.E62A.40.LF1 | 2024-04-08T00:00:00.000000Z - 2024-04-08T10:59:17.000000Z | 1.0 Hz, 39558 samples
N4.E62A.40.LF1 | 2024-04-08T10:59:23.000000Z - 2024-04-08T18:51:59.000000Z | 1.0 Hz, 28357 samples
N4.E62A.40.LF1 | 2024-04-08T18:59:43.000000Z - 2024-04-09T00:00:00.000000Z | 1.0 Hz, 18018 samples
N4.E62A.40.LF2 | 2024-04-08T00:00:00.000000Z - 2024-04-08T10:47:48.000000Z | 1.0 Hz, 38869 samples
N4.E62A.40.LF2 | 2024-04-08T10:59:23.000000Z - 2024-04-08T18:51:26.000000Z | 1.0 Hz, 28324 samples
N4.E62A.40.LF2 | 2024-04-08T18:59:43.000000Z - 2024-04-09T00:00:00.000000Z | 1.0 Hz, 18018 samples
N4.E62A.40.LFZ | 2024-04-08T00:00:00.000001Z - 2024-04-08T10:55:56.000001Z | 1.0 Hz, 39357 samples
N4.E62A.40.LFZ | 2024-04-08T10:59:23.000000Z - 2024-04-08T18:53:33.000000Z | 1.0 Hz, 28451 samples
N4.E62A.40.LFZ | 2024-04-08T18:59:43.000000Z - 2024-04-09T00:00:00.000000Z | 1.0 Hz, 18018 samples
These start and end times show that there are two ~8s gaps in LF1 near 1100h, which may be a problem, but a more serious issue is that the start time for LF3 is a microsecond later than it is for LF1, LF2.
A general fix for this seems like it may be tricky.
Obviously, the solution in this case is to round the 2024-04-08T00:00:00.000001Z start time to 2024-04-08T00:00:00.000000Z, but how to do this programatically?
I suppose we could try to bucket the times that are within a few microseconds of each other ... but that feels hacky. We could , for example,
start_stamps = np.array([pd.Timestamp(x) for x in start_times])
np.diff(start_stamps)
Out[13]:
array([Timedelta('0 days 00:00:00.000001'),
Timedelta('0 days 10:59:22.999999'), Timedelta('0 days 08:00:20')],
dtype=object)
This will show that there are two time stamps very close together (not expected), and I suppose if the difference between teh time stamps is less than a sample interval, we can drop one of them, but which one to drop? The one that is a "rounder number" also feels pretty hacky ... but that might be the way to do it for now.
SO, to do that, we would iterate over adjacent pairs of start and end times, and drop one if they were very close together.
BUT things can get worse still ... Note that the gaps do not line up, as if the digitizer was not synchronously sampling the data. We have never encountered this case before.
LF1's first stream is 10h59m17s in duration, but LF2's is 10h47m58s, and LF3 10h55m56s.
To handle this we may need to introduce nan, or drop the non-synchronized samples entirely, neither of which is a nice option. Also, in either case, we would probably need to modify the stream objects, either by packing nans into them, or by dropping the samples that are not common to all channels.
For the record here is the link to the station XML
TODO: These incoherent streams should be investigated to determine if the inconsistencies in the timestamps are coming directly from the Earthscope source, or if this is happening on obspy.
Network: IU Stations: SSPA, WCI, HRV.
There may be multiple issues here. The first problem is that the instrument "type" in the EarthScope metadata is not being recognized by mt_metadata. This is compunded somewhat by inhomogeneous values for sensor type:.
Around line 96 of xml_channel_to_mt_channel.py
mt_channel = self._sensor_to_mt(xml_channel.sensor, mt_channel)
xml_channel
is an object of type:obspy.core.inventory.channel.Channel
The
_sensor_to_mt
method is expecting to find something valid inxml_channel.sensor.type.
We could probably work around this with a
_triage_sensor_type()
function that looks at the other information. In the SSPA case,xml_channel.sensor.description
has value:'Bartington 3-Axis Fluxgate Sensor'
, so we should be able to deduce this -- however, for WCI, HRV, the sensor.type is "Bartington".Basically, when sensor type is None or is not recognized, we should assign it if we can. This has the possibility of getting pretty messy, since we have no control on how Earthscope (or another client) chooses to assign these labels, but we could add, on a case by case basis handlers for this.
Maybe
TODO: add example request dataframe and code example