jddingemanse / awtiCodeDev

Development of code for and by Arba Minch University Water Technology Institute

Outlier tests (quality check for daily data) #31

Open BeyeneSe opened 1 year ago

BeyeneSe commented 1 year ago

Outlier testing

Outliers are the extreme values within a dataset. Outlier detection, the process of identifying these extreme values, has many applications across water engineering. The most widely used approaches to detect outliers are descriptive statistics and clustering. Descriptive statistics quantitatively describe a dataset using summary statistics. This includes calculations such as the mean, variance, maximum and minimum, as well as graphical representations such as boxplots, histograms and scatter plots. Clustering techniques, by contrast, group data points so that similar points end up in the same group.

jddingemanse commented 1 year ago

Thanks for starting this issue with this information on outliers. This is general information - to turn it into code, it needs to be made specific, and choices need to be made. Also, just saying 'boxplots are needed' does not yet help people on exactly how boxplots are made, and how boxplots can be used to find outliers. Do you think it is possible to develop code that satisfies the needs of AWTI staff regarding outliers?
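
As one possible starting point for the discussion, here is a minimal sketch of how the boxplot rule can be turned into an outlier check. A boxplot draws its whiskers from the first and third quartiles (Q1, Q3) and the interquartile range (IQR = Q3 - Q1); values below Q1 - k*IQR or above Q3 + k*IQR are drawn as separate points and treated as possible outliers. The function name `iqr_outliers`, the factor `k=1.5` (the usual boxplot convention) and the sample values are assumptions for illustration only, not something decided in this issue. Daily rainfall is strongly skewed with many zeros, so the threshold that suits AWTI data still needs to be chosen.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper: the name and the default k=1.5 are assumptions.
    Missing values (None) are skipped when computing the quartiles.
    Returns (lower_fence, upper_fence, [(index, value), ...]).
    """
    clean = [v for v in values if v is not None]
    # Quartiles with linear interpolation between data points
    q1, _, q3 = quantiles(clean, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    flagged = [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and (v < lower or v > upper)
    ]
    return lower, upper, flagged


# Made-up daily values (mm), only to show the call; not real station data
daily_rain_mm = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 120.0]
lower, upper, outliers = iqr_outliers(daily_rain_mm)
print(lower, upper, outliers)
```

For the made-up series above, the fences are -8.0 and 24.0, so only the value 120.0 (index 8) is flagged. A flagged value is only a candidate for checking against the station record; whether it is removed or kept is a separate choice.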