section-engineering-education / engineering-education

“Section's Engineering Education (EngEd) Program is dedicated to offering a unique quality community experience for computer science university students.”
Apache License 2.0
363 stars, 890 forks

[Machine Learning] Anomaly Detection Model on Time Series data using Isolation Forest #6773

Closed: collinskirui223 closed this issue 2 years ago

collinskirui223 commented 2 years ago

Proposal Submission

Proposed title of article

[Machine Learning] Anomaly Detection Model on Time Series data using Isolation Forest

Proposed article introduction

A time series is a series of data points indexed in time order, most commonly a sequence taken at successive, equally spaced points in time. Time series forecasting uses a model to predict future values from previously observed values. A time series dataset may contain anomalies, or outliers, that lead to inconsistent results. Anomalies are observations or data points that deviate from a dataset's normal behavior.

Anomalies in data are also called deviations, outliers, noise, novelties, and exceptions. Anomaly detection is the process of finding these outliers in a given dataset: rare events, items, or observations that are suspicious because they differ significantly from standard behaviors or patterns.

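Isolation Forest is an unsupervised machine learning algorithm for anomaly detection. It identifies anomalies by isolating them: it builds an ensemble of binary trees from random splits, and points that are separated from the rest after only a few splits are scored as anomalies. Each tree has the same structure as a decision tree, but no labels are used to choose the splits.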
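To illustrate the isolation idea, here is a minimal from-scratch sketch in plain Python. It is a toy version, not scikit-learn's `IsolationForest` and not necessarily what the article will use: it works on one-dimensional data, skips the subsampling a real implementation performs, and uses raw average depth instead of a normalized anomaly score. The function names and the `readings` data are made up for the example.

```python
import random


def isolation_depth(values, target, rng, depth=0, max_depth=12):
    """Count the random splits needed to isolate `target` within `values`."""
    if len(values) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the target.
    if target < split:
        side = [v for v in values if v < split]
    else:
        side = [v for v in values if v >= split]
    return isolation_depth(side, target, rng, depth + 1, max_depth)


def mean_isolation_depth(values, target, n_trees=200, seed=0):
    """Average the isolation depth over many random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical sensor readings with one obvious outlier (35.0).
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 20.3, 20.2, 35.0, 19.7, 20.5]
depths = {v: mean_isolation_depth(readings, v) for v in readings}

# The point that is isolated in the fewest splits is the most anomalous.
most_anomalous = min(depths, key=depths.get)
print(most_anomalous, round(depths[most_anomalous], 2))
```

Points inside the dense cluster need several splits before they end up alone, while the outlier is usually cut off by the first split, so its average depth is the smallest.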

Key takeaways

  1. Preprocessing a time series dataset.
  2. Time series analysis and decomposition.
  3. Plotting the trend, cyclic, and seasonal components.
  4. Implementing anomaly detection using Isolation Forest.
  5. Visualizing anomalies using Plotly Express.

Article quality

This tutorial will explain time series analysis and decomposition in detail. We will discuss the decomposable components: trend, cyclic, and seasonal. The tutorial will also discuss two types of anomalies, contextual and global. We will deal with contextual anomalies in detail and implement an Isolation Forest model to detect them. Finally, we will plot a graph that shows all the anomalies in the dataset. This will help to avoid inconsistent results in model training and forecasting.
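As a rough sketch of what decomposition and a contextual anomaly look like, the example below applies a classical additive decomposition (centered moving average for the trend, per-position averages for the seasonal component) in plain Python. It assumes an odd seasonal period and synthetic daily data with a weekly pattern. All names and values are invented for illustration, and it flags the largest residual directly instead of using Isolation Forest.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (odd period)."""
    half = period // 2
    n = len(series)
    # Trend: centered moving average, undefined near the edges.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half : i + half + 1])
    detrended = [y - t if t is not None else None for y, t in zip(series, trend)]
    # Seasonal: average detrended value at each position in the cycle.
    seasonal_means = [
        mean(d for i, d in enumerate(detrended) if d is not None and i % period == k)
        for k in range(period)
    ]
    offset = mean(seasonal_means)  # center the seasonal component on zero
    seasonal = [seasonal_means[i % period] - offset for i in range(n)]
    residual = [
        y - t - s if t is not None else None
        for y, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic data: upward trend plus a weekly pattern with a dip on day 6.
weekly_pattern = [0, 1, 2, 3, 2, 1, -9]
series = [100 + 0.5 * i + weekly_pattern[i % 7] for i in range(28)]
series[13] += 8  # contextual anomaly: the usual dip is almost missing

trend, seasonal, residual = decompose_additive(series, period=7)

# Flag the point whose residual is largest in absolute value.
flagged = max(
    (i for i, r in enumerate(residual) if r is not None),
    key=lambda i: abs(residual[i]),
)
print(flagged, series[flagged], max(series))
```

The flagged value sits well inside the overall range of the series, so a global threshold would miss it. It only stands out once the trend and the weekly pattern are removed, which is what makes it a contextual anomaly.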

References

Please list links to any published content/research that you intend to use to support/guide this article.

Conclusion

Finally, remove the Pre-Submission advice section and all our blockquoted notes as you fill in the form before you submit. We look forward to reviewing your topic suggestion.

Templates to use as guides

github-actions[bot] commented 2 years ago

👋 @collinskirui223 Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it.

WanjaMIKE commented 2 years ago

Seems like a helpful topic. Let's ensure that it adds value to the community. Thanks