raglew / OutlierDetection

Outlier Detection

Detecting and managing outliers in a dataset.

Plotly is used to plot a graph of the data; see the Python 3 file 'OfflinePlotlyTrace.py'. You will need to install Plotly first (pip install plotly).

[Figure: graph of the data]

[Figure: graph of the data with possible outliers indicated]

The Python 3 file DataStats.py can be used to print out stats:

  - Lowest value = 1.7
  - Highest value = 11.95
  - Average = 2.82
  - Median = 2.15
  - Q1 = 1.92
  - Q3 = 3.04
  - IQR = 1.12
  - Multiplier = 2.2
  - Min value = 0
  - Max value = 5.5

NB. Values less than the min value or greater than the max value are traditionally defined as outliers.
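The min and max values above look like the usual IQR fences, Q1 - Multiplier * IQR and Q3 + Multiplier * IQR. The following is a minimal sketch of that calculation, not the contents of DataStats.py: the function names `tukey_fences` and `find_outliers` are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption that may differ from the one used in the repository.

```python
import statistics


def tukey_fences(q1, q3, multiplier=2.2):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def find_outliers(values, multiplier=2.2):
    """Return the values lying outside the fences computed from the data."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # The "inclusive" method is an assumption; other methods give slightly
    # different quartiles on small samples.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = tukey_fences(q1, q3, multiplier)
    return [v for v in values if v < lower or v > upper]


# Fences recomputed from the quartiles reported above.
lower, upper = tukey_fences(1.92, 3.04, multiplier=2.2)
print(round(lower, 3), round(upper, 3))  # -0.544 5.504
```

The upper fence of about 5.504 agrees with the reported max value of 5.5. The lower fence comes out at about -0.544, while the reported min value is 0; one possible explanation (an assumption) is that the script floors the lower bound at zero. With these fences the highest value, 11.95, is flagged, and the lowest value, 1.7, is not.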

Python3 file OutlierDetection.py can be used to plot stats. See below.

[Figure: graph of the data with outliers indicated]

The results differ from what was expected. Are the values indicated as outliers really outliers? The detection method needs to be examined further; perhaps another approach is needed.

Two approaches spring to mind.

  1. Use percentiles instead of quartiles.
  2. Use the range across data series captured from several data collections.
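
The first approach could look something like the sketch below, which flags values outside a chosen pair of percentiles. This is an illustration under assumptions rather than code from the repository: the function names and the 5th/95th percentile defaults are hypothetical choices.

```python
import statistics


def percentile_bounds(values, lower_pct=5, upper_pct=95):
    """Return the values at the lower_pct and upper_pct percentiles."""
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def percentile_outliers(values, lower_pct=5, upper_pct=95):
    """Return the values falling outside the chosen percentile bounds."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if v < low or v > high]
```

One thing to keep in mind with this approach: percentile cut-offs flag roughly the same share of the data every time, whether or not those values are genuinely unusual, so the chosen percentiles act as a tuning knob rather than a test.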