intelligent-environments-lab / utx000

Analysis related to the many studies under the UTx000 banner, a project under the Whole Communities, Whole Health initiative.

Beacon Data - Removing Extreme Values #22

Closed HagenFritz closed 3 years ago

HagenFritz commented 3 years ago

Some of the values from the beacons are clearly erroneous, reading well outside of what is expected. A quick glance shows that these events tend to be short-lived, indicating that the sensor isn't consistently reading high but rather had some issue (power, fouling, etc.) that caused a brief spike.

The Problem

There are currently two things we need to look into.

Where to look

I have started to explore the distributions of data points from each sensor in the beacon exploration notebook. The histograms provide some initial insight into the problem.
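As a rough illustration, something like the sketch below reproduces that kind of per-sensor distribution check outside the notebook. The file name and column names (`co2`, `pm2p5`, `tvoc`) are placeholders, not the actual beacon fields.

```python
# Minimal sketch of a per-sensor distribution check, similar in spirit to the
# beacon exploration notebook. "beacon_data.csv" and the column names are
# placeholders, not the real file/fields.
import pandas as pd
import matplotlib.pyplot as plt

beacon = pd.read_csv("beacon_data.csv", index_col="timestamp", parse_dates=True)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, col in zip(axes, ["co2", "pm2p5", "tvoc"]):
    beacon[col].hist(ax=ax, bins=50)  # histogram of raw readings per sensor
    ax.set_title(col)
plt.tight_layout()
plt.show()
```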

The Current Solution

The data are pre-processed in the [make_dataset](https://github.com/intelligent-environments-lab/utx000/blob/master/src/data/make_dataset.py) file. The current processing computes the z-score of each individual value and removes values whose z-score magnitude is greater than 2.5.
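For reference, that filter amounts to something like the sketch below; this is an illustration of the approach, not the actual code in make_dataset.py.

```python
# Illustrative z-score filter: drop values whose z-score magnitude exceeds a
# threshold (2.5 in the current processing). Not the actual make_dataset.py code.
import pandas as pd

def remove_outliers_zscore(series: pd.Series, threshold: float = 2.5) -> pd.Series:
    """Return the series with points beyond `threshold` standard deviations removed."""
    z = (series - series.mean()) / series.std()
    return series[z.abs() <= threshold]
```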

This process works for the most part, but some values that shouldn't be retained still are. These values become apparent when calculating metrics like the percent change over a given timeframe.

The Solution

We need some other way to smooth out the data. The five-minute averaged values are kept for the beacon, but perhaps we can apply some sort of filter on top of them. I am thinking (a rough sketch of these options follows the list):

  1. rolling average
  2. rolling median
  3. weighted average based on the standard deviation of the dataset (surely something like this already exists?)
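A quick sketch of what the first two options, and one guess at the third, might look like with pandas rolling windows. The window size and the inverse-distance weighting are illustrative choices, not settled parameters.

```python
# Rough comparison of the proposed smoothing options on a single sensor column.
# Window size and the weighting scheme in option 3 are illustrative guesses.
import pandas as pd

def smooth(series: pd.Series, window: int = 5) -> pd.DataFrame:
    roll = series.rolling(window=window, center=True, min_periods=1)
    return pd.DataFrame({
        "raw": series,
        "rolling_mean": roll.mean(),      # option 1: rolling average
        "rolling_median": roll.median(),  # option 2: rolling median
    })

def std_weighted_mean(series: pd.Series, window: int = 5) -> pd.Series:
    """Option 3 (one interpretation): down-weight points that sit far from the
    dataset mean, measured in standard deviations, before averaging."""
    z = ((series - series.mean()) / series.std()).abs()
    weights = 1.0 / (1.0 + z)  # far-from-mean points contribute less
    num = (series * weights).rolling(window, center=True, min_periods=1).sum()
    den = weights.rolling(window, center=True, min_periods=1).sum()
    return num / den
```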
HagenFritz commented 3 years ago

Duplicate issue - removing this one and updating the other issue