Open pavlis opened 6 months ago
oh... that sounds pretty bad. We definitely want to fix it asap.
I just got the docs added. The python api doc is not automatic. It needs to be added to the index, which I think was overlooked when resample was added.
The patch yesterday resolves the timing error bug, but it has a minor blemish that will impact performance. I ran a minor variant of the workflow that initiated this issue. The timing problem went away, but it seems every TimeSeries passed through the processing ends up with an entry like the following in the elog collection:
{
  "_id": {
    "$oid": "671a2b3cfba663610cde28fe"
  },
  "logdata": [
    {
      "job_id": 0,
      "algorithm": "resample",
      "badness": "ErrorSeverity.Complaint",
      "error_message": "sampling_rate inconsistent with 1/dt; updating to 1/dt",
      "process_id": 144912
    }
  ],
  "data_tag": "serial_preprocessed",
  "wf_TimeSeries_id": {
    "$oid": "671a2a551910a104e6c014b7"
  }
}
This is being created by this section of resample:
mspass_object.dt = self.dt
# Check for "sampling_rate" attribute
if mspass_object.is_defined("sampling_rate"):
    sampling_rate = mspass_object["sampling_rate"]
    # Check if sampling_rate is consistent with 1/dt
    if abs(sampling_rate - 1.0 / mspass_object.dt) > 1e-6:
        # Record inconsistency in error log (elog)
        message = "sampling_rate inconsistent with 1/dt; updating to 1/dt"
        mspass_object.elog.log_error("resample",
                                     message,
                                     ErrorSeverity.Complaint)
        # Update sampling_rate to 1/dt
        mspass_object["sampling_rate"] = 1.0 / mspass_object.dt
else:
    # Set sampling_rate to 1/dt
    mspass_object["sampling_rate"] = 1.0 / mspass_object.dt
The problem with this section is that it is doing something it shouldn't in testing the current value of "sampling_rate". In this function the "sampling_rate" attribute should simply be set to self.samprate, as that is what the operator is doing. There is a similar construct in TimeSeries2Trace that is appropriate, but this one is not. This operator is changing the sampling rate (equivalently the sample interval dt), so the operator's value will always be definitive. I think this construct appears multiple times in the resample module. As I said, it is harmless in the sense that it does nothing incorrect, but it will impact performance and lead to database bloat when every datum gets an elog entry.
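For illustration, the fix described above could look something like this sketch. The function name is hypothetical, samprate stands in for self.samprate, and the object is only assumed to support dict-style Metadata access plus a dt attribute, as in the snippet quoted above; this is not the actual patch.

```python
# Hypothetical sketch of the corrected resample section: the operator
# defines the output rate, so "sampling_rate" is set unconditionally
# from the operator's own target rate, with no consistency check and
# no elog entry.

def apply_resample_metadata(mspass_object, samprate):
    """Set dt and sampling_rate from the operator's target rate.

    mspass_object is assumed to support dict-style access and a dt
    attribute; samprate stands in for self.samprate.
    """
    mspass_object.dt = 1.0 / samprate
    # The operator's rate is definitive; just overwrite.
    mspass_object["sampling_rate"] = samprate
```

Because nothing is conditional here, no datum can accumulate a Complaint entry on every pass through the operator.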
ahh... sorry, I overlooked this. Yes, we need to remove the elog from the resample. @Cloud1e Could you please get that fixed? Thank you!
OK. I fixed this problem in resample.py. But in TimeSeries2Trace, it seems that the variable ts doesn't have a "samprate" attribute. Can I use 1/dt to update the "sampling_rate" of ts and dresult?
For TimeSeries2Trace, we should add an elog entry there if sampling_rate doesn't match 1/dt, and then update it. That was the original bug. Gary is only saying that for resample we don't need to check the original sampling_rate, since we should always update it to whatever rate we are resampling to.
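To make that intent concrete, here is a sketch of the check for TimeSeries2Trace. The tolerance 1e-6 and the message mirror the resample snippet quoted earlier; the function name and the plain list standing in for the object's elog are illustrative, not the real MsPASS API.

```python
def sync_sampling_rate(ts, log, tol=1e-6):
    """Make ts["sampling_rate"] consistent with 1/ts.dt.

    ts is assumed to support dict-style access plus a dt attribute;
    log is a simple list standing in for the object's elog.  If a
    stored sampling_rate disagrees with 1/dt beyond tol, the
    inconsistency is recorded before the value is corrected.
    """
    true_rate = 1.0 / ts.dt
    if "sampling_rate" in ts and abs(ts["sampling_rate"] - true_rate) > tol:
        # Record the inconsistency, then fall through to the update.
        log.append(("TimeSeries2Trace",
                    "sampling_rate inconsistent with 1/dt; updating to 1/dt"))
    ts["sampling_rate"] = true_rate
```

Unlike in resample, the log entry is appropriate here: TimeSeries2Trace is not changing the rate, so a mismatch means the metadata arrived inconsistent.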
After many hours over the last 24 I have established that there is a bug in the resample function. Note also there is something wrong with our configuration, as none of the docstrings found in algorithms/resample.py are in the Sphinx documentation. That is just an annoyance, but the bug is serious. It definitely messes up timing. In the draft Session1 notebook for the upcoming short course I had a line to downsample the data from 40 to 20 sps. I was mystified why I wasn't getting QC plots windowed around P that looked right. The reason was that the window was always at the wrong time, because the resample operator did something I don't quite understand. We are going to have to do some controlled tests to figure this out. The resample operator wasn't essential for this class anyway, so it is easy to remove it without causing an issue. It may increase some run times, since we will be handling twice as many samples, but it should not be a problem.