rain1024 / gVim-Pathogen

My Coding Life
http://rain1024.github.io/gVim-Pathogen

Machine Learning #22

Open rain1024 opened 10 years ago

rain1024 commented 10 years ago

Machine learning focuses on prediction, based on known properties learned from the training data.

  • In addition to retail, in finance banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market.
  • In manufacturing, learning models are used for optimization, control, and troubleshooting.
  • In medicine, learning programs are used for medical diagnosis.
  • In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of service.
  • In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers.
  • The World Wide Web is huge; it is constantly growing, and searching for relevant information cannot be done manually.

Problems

1/3 Classification

Identifying to which set of categories a new observation belongs.

(Figure: plot_multilabel_1 — multilabel classification example plot)

Examples

Spam detection, Image recognition, Hand written digit recognition

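As a minimal sketch with made-up toy data, a k-nearest-neighbors classifier in scikit-learn assigns a new observation to one of the categories learned from labeled examples:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups labeled 0 and 1.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# A new point near the second group is assigned category 1.
print(clf.predict([[5, 5.5]]))
```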

2/3 Regression

Predicting a continuous value for a new example.

(Figure: plot_isotonic_regression_1 — isotonic regression example plot)

Examples

Drug response, Stock prices, Predict Housing Price

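A minimal sketch with made-up data: ordinary least squares in scikit-learn fits a line to the training points and then predicts a continuous value for a new example:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on the line y = 2x + 1.
X = [[0], [1], [2], [3]]
y = [1, 3, 5, 7]

reg = LinearRegression()
reg.fit(X, y)

# The fitted model recovers the slope and predicts for a new input.
print(reg.predict([[4]]))
```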

3/3 Clustering

Automatic grouping of similar objects into sets.

(Figure: plot_dbscan_1 — DBSCAN clustering example plot)

Examples

Customer segmentation, Grouping experiment outcomes, Document tagging

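A minimal sketch with made-up data, using k-means for brevity: clustering groups similar points into sets without any labels being given.

```python
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two well-separated groups of points.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Points within each group receive the same cluster label.
print(labels)
```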

Techniques

1/3 Dimensionality reduction

Reducing the number of random variables to consider.

Examples

Visualization, Increased efficiency

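A minimal sketch with synthetic data: PCA projects high-dimensional samples onto the few directions of highest variance, e.g. for visualization.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 20 samples with 5 random features.
rng = np.random.RandomState(0)
X = rng.rand(20, 5)

# Keep only the 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```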

2/3 Model selection

Comparing, validating and choosing parameters and models.

Examples

Improved accuracy via parameter tuning

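A minimal sketch of parameter tuning: a grid search over the number of neighbors of a k-NN classifier, scored by cross-validated accuracy (using the standard iris dataset for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k and keep the one with the best
# cross-validated accuracy.
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```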

3/3 Preprocessing

Feature extraction and normalization.

Examples

Transforming input data such as text for use with machine learning algorithms.

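A minimal sketch of feature extraction from text, with made-up documents: a bag-of-words vectorizer turns each document into a row of word counts that a learning algorithm can consume.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical documents; each becomes a row of word counts.
docs = ["machine learning is fun", "learning from data"]

vec = CountVectorizer()
X = vec.fit_transform(docs)

# 2 documents x 6 distinct words.
print(X.shape)
```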

A Tour of Machine Learning Algorithms

(Figure: ml_map — the scikit-learn algorithm cheat sheet. Source: http://scikit-learn.org/stable/_static/ml_map.png)

rain1024 commented 10 years ago

Probability and Estimation

rain1024 commented 10 years ago

Linear Regression

<regression>

Residual sum of squares (RSS)
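For a linear model with weight vector w, writing x_i for the i-th input and y_i for its target over n training pairs, the residual sum of squares is:

```latex
\mathrm{RSS}(w) = \sum_{i=1}^{n} \left( y_i - x_i^{T} w \right)^2
```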

Ridge regression

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape [n_samples, n_targets]).

Coefficient: w = argmin_w ||Xw - y||^2 + alpha * ||w||^2

Code

from sklearn.linear_model import Ridge

clf = Ridge(alpha=1.0)  # alpha controls the strength of the l2 penalty
clf.fit(X, y)


rain1024 commented 10 years ago

Nearest neighbor

<classification>
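A minimal sketch of the idea in plain Python (1-nearest-neighbor under Euclidean distance; the helper name and toy data are made up):

```python
import math

def nearest_neighbor_predict(X_train, y_train, x):
    # Return the label of the training point closest to x
    # (1-nearest-neighbor under Euclidean distance).
    distances = [math.dist(p, x) for p in X_train]
    return y_train[distances.index(min(distances))]

X_train = [[0, 0], [1, 1], [9, 9]]
y_train = ["a", "a", "b"]
print(nearest_neighbor_predict(X_train, y_train, [8, 8]))
```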

rain1024 commented 10 years ago

Support Vector Machine

<classification>
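A minimal sketch with made-up, linearly separable toy data: a linear-kernel SVM in scikit-learn finds a maximum-margin separating hyperplane between the two classes.

```python
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable classes.
X = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# A new point on the far side of the margin gets class 1.
print(clf.predict([[5, 5]]))
```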

rain1024 commented 10 years ago

Normalization
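A minimal sketch with made-up data: standardization rescales each feature to zero mean and unit variance, so features on very different scales contribute comparably.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column now has zero mean and unit variance.
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```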

rain1024 commented 10 years ago

Model and feature selection

rain1024 commented 10 years ago

Graphical Model

rain1024 commented 10 years ago

Structure Learning of Mixed Graphical Model

rain1024 commented 10 years ago

Data Sets

rain1024 commented 10 years ago

Coding

scikit-learn

Runners-up: weka, R, octave

Installation

pip install scipy
pip install numpy
pip install -U scikit-learn
rain1024 commented 10 years ago

Design and Analysis of Machine Learning Experiments

Intent

  • How can we assess the expected error of a learning algorithm on a problem? That is, for example, having used a classification algorithm to train a classifier on a dataset drawn from some application, can we say with enough confidence that later on, when it is used in real life, its expected error rate will be less than, for example, 2 percent?
  • Given two learning algorithms, how can we say one has less error than the other one, for a given application? The algorithms compared can be different, for example, parametric versus nonparametric, or they can use different hyperparameter settings. For example, given a multilayer perceptron with four hidden units and another one with eight hidden units, we would like to be able to say which one has less expected error. Or with the k-nearest neighbor classifier, we would like to find the best value of k.

Criteria

The relative importance of these factors changes depending on the application.

  • For example, if the training is to be done once in the factory, then training time and space complexity are not important; if adaptability during use is required, then they do become important.
  • Most of the learning algorithms use 0/1 loss and take error as the single criterion to be minimized; recently, cost-sensitive variants of these algorithms have also been proposed to take other cost criteria into account.

Model Validation
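A minimal sketch of estimating the expected error discussed above: k-fold cross-validation trains on all but one fold and scores on the held-out fold, giving several independent estimates (illustrated here on the standard iris dataset with a k-NN classifier).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: 5 train/test splits, 5 accuracy estimates.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean())
```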

rain1024 commented 10 years ago

Regularization
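A minimal sketch of the effect of l2 regularization, with synthetic data: compared with ordinary least squares, ridge regression shrinks the fitted coefficient vector toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical regression data with 5 informative features.
rng = np.random.RandomState(0)
X = rng.rand(30, 5)
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.1 * rng.randn(30)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# The l2 penalty shrinks the coefficient vector toward zero.
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```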