Filtering - Githubissues

joe045 commented 1 year ago

Hei again @jerabaul29,

There are some outliers of Hs for the radar and ug1 (about 20m Hs) that I would like to remove. The easiest is to remove them directly from the Hs time series, but I guess it is more "correct" to remove them from the original elevation data?

I have had a look at the maximum of each probe, and they all saturate between 15.4 and 15.8 m, the radar most often (slide 1). I have tried removing all values above 15.2 m (the maximum range of the ultrasonic probes) and using the 1.5 IQR rule (slides 2, 3 and 4). Both methods actually give worse results (slide 5). I think this is because the data is also filtered by setting fmin = 0.05 and fmax = 0.5, so when I remove some outliers, some values that were previously excluded by the frequency range get included.
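A minimal sketch of the two rules described above (a fixed range limit and the 1.5 IQR rule), applied to a synthetic record. The 10 Hz sampling rate, the synthetic signal and the function name `flag_outliers` are assumptions made for illustration; they are not taken from the actual processing code.

```python
import numpy as np


def flag_outliers(elevation, max_range=15.2, iqr_factor=1.5):
    """Return two boolean masks: samples above max_range, and samples outside the IQR fences."""
    x = np.asarray(elevation, dtype=float)
    # Rule 1: anything above the nominal maximum range is treated as saturated.
    above_range = x > max_range
    # Rule 2: classic IQR fences, [Q1 - k * IQR, Q3 + k * IQR].
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    outside_iqr = (x < q1 - iqr_factor * iqr) | (x > q3 + iqr_factor * iqr)
    return above_range, outside_iqr


# Synthetic 30-minute record at an assumed 10 Hz: 1 m amplitude wave around a 10 m range.
rng = np.random.default_rng(0)
t = np.arange(0, 1800, 0.1)
record = 10.0 + np.sin(2 * np.pi * 0.1 * t) + 0.05 * rng.standard_normal(t.size)
spike_idx = [100, 5000, 12000]
record[spike_idx] = 15.6  # three artificial saturated readings

above_range, outside_iqr = flag_outliers(record)
print("above range:", above_range.sum(), "outside IQR fences:", outside_iqr.sum())
```

Both masks only flag samples; what to do with the flagged samples (drop, interpolate, or leave them to the frequency-band limit) is a separate choice.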

Do you have a better suggestion for how to remove the outliers?

Some smoothing is performed with a savgol filter when calculating the power density spectrum (slide 6): signal.savgol_filter(Pxx_den, window_length=9, polyorder=2). However, removing it does not change the Hs.

outliers.pdf

jerabaul29 commented 1 year ago

Hi @joe045 :)

Sorry for the small delay, a bit hectic here as usual :) .

A few thoughts from your figures:

- Very nice that you looked into the "saturated max distance" data. I think that using "max value * 0.95" or something like this as you did (i.e. setting a limit that is a bit lower than the true absolute max) is good: the values are the result of an ADC conversion, so there is always noise, and there is no guarantee that the "saturated max output" always translates into the "absolute max reading". Putting in a bit of a safety factor is good :) .
- It may also be important to look at the "saturated low distance", i.e. do the same analysis (min value, and 1.05 * min value) for the short distance saturation. I am not completely sure "how" the sensors saturate if they completely lose the echo or get completely confused: I would expect them to send an "out of range, max value" output, but they may also send a "short range" output. It seems unlikely from the boxplot, but it would be nice to double check :) . It looks like this is taken into account in the last slides too, but having a "start slide" about it could be good :)
- Could you also plot a histogram of the distribution of the readings, for each probe? You could look at a day or two. That would be useful to see how much saturation there is in an "easy to see" way :)
- I think that removing outliers from the swh plots is fine too: there are quite a few moving parts in the processing, so it is hard to catch 100% of the outliers in the raw data, and in my opinion it is fine to remove swh outliers as long as you state that you do it and how. You can remove them for example with a sliding nsigma filter, something inspired by https://github.com/jerabaul29/OpenMetBuoy-v2021a/blob/f6b6149adb14609fc1c5cc5e9ae2c430672df778/legacy_firmware/utils/utils.py#L34-L58 ; the nice thing is that this is a well defined, "scientific", easy to explain method :)
- I agree that there is quite a lot of filtering happening by getting the swh from m0 and limiting the frequency range used for integration; as you say, this takes away a lot of the "occasional outliers", as these contribute mostly to the high frequency part of the spectrum. This is interesting to document, and is a good argument for using the swh calculation based on m0 integrated over a limited frequency range :)
- I am not completely sure of the effect of smoothing the spectrum on the computed moments and associated swh, tp etc. quantities. I would spontaneously advise against (or at least advise caution about) i) smoothing spectra before ii) computing moments, as the smoothing may not be energy conserving / follow Parseval's theorem, depending on how it is done. But the effect may be minor. I think that, in all cases, the safest is to i) remove "super obvious outliers" and interpolate over them instead, ii) compute the Welch spectrum, iii) compute the moments mi of the Welch spectrum using a limited frequency range, iv) derive the scalar wave properties from the moments.
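The suggestions in the list above can be sketched roughly as follows: interpolate over obvious outliers, compute the Welch spectrum, integrate m0 over a limited frequency range, take Hs = 4 * sqrt(m0), and optionally flag remaining Hs outliers with a sliding nsigma filter. This is only an illustration under stated assumptions: the 10 Hz sampling rate, `nperseg=2048`, the synthetic data and all function names are hypothetical, and the nsigma filter is a simple variant written for this example, not a copy of the linked `utils.py`.

```python
import numpy as np
from scipy import signal


def interpolate_outliers(x, bad):
    """Replace flagged samples by linear interpolation between good neighbours."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return x


def hs_from_welch(elevation, fs, fmin=0.05, fmax=0.5, nperseg=2048):
    """Hs = 4 * sqrt(m0), with m0 integrated over [fmin, fmax] only."""
    f, pxx = signal.welch(elevation, fs=fs, nperseg=nperseg)
    band = (f >= fmin) & (f <= fmax)
    m0 = np.sum(pxx[band]) * (f[1] - f[0])  # rectangle-rule integral of the PSD
    return 4.0 * np.sqrt(m0)


def sliding_nsigma_mask(values, half_window=5, nsigma=3.0):
    """Flag points further than nsigma local std from the local median (point itself excluded)."""
    v = np.asarray(values, dtype=float)
    bad = np.zeros(v.size, dtype=bool)
    for i in range(v.size):
        lo, hi = max(0, i - half_window), min(v.size, i + half_window + 1)
        neighbours = np.delete(v[lo:hi], i - lo)
        centre, spread = np.median(neighbours), np.std(neighbours)
        bad[i] = abs(v[i] - centre) > nsigma * max(spread, 1e-12)
    return bad


# Synthetic 30-minute record at an assumed 10 Hz: 1 m amplitude, 0.1 Hz wave.
fs = 10.0
t = np.arange(0, 1800, 1 / fs)
rng = np.random.default_rng(1)
elevation = 10.0 + np.sin(2 * np.pi * 0.1 * t) + 0.05 * rng.standard_normal(t.size)
elevation[::900] = 15.6  # a few artificial saturated readings

bad = elevation > 15.2
cleaned = interpolate_outliers(elevation, bad)
hs_raw = hs_from_welch(elevation, fs)
hs_clean = hs_from_welch(cleaned, fs)
print(f"Hs raw: {hs_raw:.2f} m, Hs cleaned: {hs_clean:.2f} m")

# Sliding nsigma filter on a made-up series of 30-minute Hs values.
hs_series = np.array([2.0, 2.1, 1.9, 2.2, 20.0, 2.1, 2.0, 1.8, 2.2, 2.0])
hs_flags = sliding_nsigma_mask(hs_series)
print("flagged Hs indices:", np.flatnonzero(hs_flags))
```

In this synthetic case the raw and cleaned Hs stay close to each other, which is consistent with the point that isolated spikes put most of their energy outside the 0.05 to 0.5 Hz band. Real records with clustered or long-lasting saturation may behave differently.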

A small extra note: I am a bit surprised by how different UG1 is compared with UG2 / radar. Is this doing the (correct) motion compensation? If so, this is really a demonstration that motion compensating a sensor at the very front of a sailing ship's bowsprit is really hard, which a posteriori may not be so surprising :) .

joe045 commented 1 year ago

Thanks for taking the time to give good answers Jean!

I have looked at the daily min, max and mean values for the three probes (see slide 1). The user manual states that the accuracy of the ultrasonic and radar probes is ±0.25% of the measured range. Could the reason for the larger outliers for ug1 and the radar be that their mean range is generally larger than for ug2? And why is the mean range for the radar larger than for ug2 when they are mounted at the same location? Is it also related to the ADC conversion?

The histogram is a good way to visualize it! I guess it is better to show 30 minutes than a full day, so that it is easier to spot the outliers (slides 2 and 3)?
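A small sketch of how such a histogram can also be summarised numerically, assuming a 30-minute record at 10 Hz and a 15.2 m range limit. The synthetic data and the helper name `saturation_summary` are made up for illustration.

```python
import numpy as np


def saturation_summary(ranges, max_range=15.2, bins=50):
    """Histogram the readings and return the fraction at or above max_range."""
    r = np.asarray(ranges, dtype=float)
    counts, edges = np.histogram(r, bins=bins)
    saturated_fraction = float(np.mean(r >= max_range))
    return counts, edges, saturated_fraction


# Synthetic 30-minute record at an assumed 10 Hz, with 1 sample in 100 saturated.
rng = np.random.default_rng(2)
t = np.arange(18000) / 10.0
readings = 10.0 + np.sin(2 * np.pi * 0.1 * t) + 0.05 * rng.standard_normal(t.size)
readings[::100] = 15.6

counts, edges, saturated_fraction = saturation_summary(readings)
print(f"saturated fraction: {saturated_fraction:.3f}")
# Text view of the non-empty bins: the saturated readings show up as an isolated bin.
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    if count:
        print(f"{left:6.2f} to {right:6.2f} m: {count}")
```

The counts and edges can be passed to any plotting library; the isolated bin near the range limit is what makes the saturation easy to see.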

These figures are raw elevation data, not compensated for the motion of the ship. I added some figures comparing significant wave height for ug1, ug2, radar and solely IMU data if you are more curious :) In these figures no outliers are removed, except for the frequency limitation.

Outliers_new.pdf

jerabaul29 commented 1 year ago

Happy this discussion helps :) .

Regarding slide 1:

- The UG probes only slightly exceed the datasheet value; I think this comes from how we measure them: there is some analog electronics involved (voltage divider), some digital electronics (ADC), and all of this has noise here and there. So I think that the UG look good.
- Regarding the radar: referring to https://github.com/jerabaul29/ultrasound_radar_bow_wave_sensor_Statsraad_Lehmkuhl_2021_2022/blob/main/doc/sensors/AGP/radar_AGP/PRL.pdf : I think we have a PRL-050, so that should be 50m range, but maybe we actually got a PRL-100 with 30m range, which would explain the 30 m point... I think that when the radar gets confused, it can either report the "absolute max" value (30m?), or some distance that is not correct (any of these 20ish m values). The ship is built in metal, so there may be plenty of non trivial reflections of radar waves on the water, the hull, etc., and maybe the radar sometimes picks up some of these weird paths / echoes rather than the surface of the water.
- Regarding UG1 vs UG2, there may be some small differences between the sensors / electronics (i.e. the resistors used for the voltage divider are only +- 1% at best) that may explain it? Also, they are at different heights and locations, with different possible challenges in getting a good signal...

Agree, the histograms look really good, the outliers are very clearly visible :) .

> These figures are raw elevation data, not compensated for the motion of the ship. I added some figures comparing significant wave height for ug1, ug2, radar and solely IMU data if you are more curious :) In these figures no outliers are removed, except for the frequency limitation.

Ok, interesting. If you can also add figures for UG compensated with IMU, that would be great :) . In theory, if we manage to build the system and post process the data well, that should give the best results :) . The "both nice and tricky" thing is that with these wave motions, all the measurements (IMU, UG) are always "not so complicated to relate to the local swell using a well tuned simple transfer function", so even if "IMU+UG" should be best, it is quite possible that in practice, with a bit of tuning, "IMU" or "UG" alone is good enough / even better.

joe045 commented 1 year ago

Fabian's thesis states that the PRL-050 is used, and in the manual PRL-050 refers to 50 ft ≈ 15.2 m. But I should have mentioned that the figure above is for the new, inverted radar fluctuation (to match the ug2 fluctuation). The mean is calculated over each 30-minute file:

    radar_wrong_fluctuation = old_radar - radar_mean
    radar_right_fluctuation = -radar_wrong_fluctuation
    new_radar = radar_right_fluctuation + radar_mean

The old radar fluctuation does not exceed 15.57 m and has minimum values around zero, which is probably closer to what you expected? Slides 1 (orange is the old radar and green is the new radar), 2 (new radar) and 3 (old radar) show that the old outliers around zero are shifted to 20 m (twice the mean). But the old radar fluctuation actually has several more values exceeding 15.2 m than the new radar. The old radar also has some unrealistic negative values down to -1 m.
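The three-line inversion above is algebraically a reflection about the file mean, new_radar = 2 * radar_mean - old_radar, which is why readings near zero end up near twice the mean. A minimal sketch with made-up numbers (the 10 m mean range and the array contents are illustrative assumptions, not measured values):

```python
import numpy as np


def invert_about_mean(old_radar):
    """Flip the fluctuation sign around the record mean: new = 2 * mean - old."""
    old = np.asarray(old_radar, dtype=float)
    radar_mean = old.mean()
    return 2.0 * radar_mean - old


# Made-up record: readings alternating around 10 m, plus one spurious reading at 0 m.
old_radar = np.full(1000, 10.0)
old_radar[::2] += 0.5
old_radar[1::2] -= 0.5
old_radar[0] = 0.0

new_radar = invert_about_mean(old_radar)
print("record mean:", old_radar.mean())
print("old outlier:", old_radar[0], "-> new outlier:", new_radar[0])
```

The mean of the record is unchanged by the reflection, so only the sign convention of the fluctuation differs between the old and new series.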

I will therefore continue with the new radar with an elevation limit. Excluding ug1 values above 15.2m actually gives worse results (slide 4), so I will only exclude values above 15.8m, so that only the radar is affected.

The figures above labelled UG are the combination of UG+IMU (I should have been clearer about that). But I had a look at solely UG data (slide 5), and it looks like UG+IMU gives the best results (I will do some statistics on it). For ug1, the difference is quite large, so the IMU compensation is needed as expected!

Otliers_newest.pdf

jerabaul29 commented 1 year ago

Thanks for the update about the radar "conventions", that makes sense. I think it is simpler for the user to show things in the "natural" way, i.e. actual radar range; it is a bit confusing otherwise to read about radar readings that are apparently "out of range" due to the offset even though they are not truly out of range ^^ :) .

Ok, sounds good - agree it is very expected that UG + IMU should be the best, this is what we hope for / how things should be if the processing is done correctly :) . I think that being very detailed / explicit / not having any implicit convention makes it easier to understand, so yes it may be a good idea to write UG / radar + IMU :) .

Sounds excellent, then I think all makes sense and the results look really good actually :) . Let me know if you want to have a chat some day in case there is more to discuss (but seems that everything is clear now right? :) ).

joe045 commented 1 year ago

Forgot to answer, but yes, everything is clear now! Thanks again :)

joe045 / wave_sensors_one_ocean_expedition_2023

Filtering #3