infiBoy / BotAnalytics


Phase 1 - Graphs #1

Open infiBoy opened 7 years ago

infiBoy commented 7 years ago

Reading tasks. Major source: http://dmml.asu.edu/smm/SMM.pdf

Specific parts:

  1. If unfamiliar with graph or data science basics, first learn the essentials (Chapter 2 + Chapter 5); otherwise, read as needed.

  2. Network measures (I, 3)

  3. Community analysis (II, 6)

  4. Influence measuring (III, 8)

  5. Behavioral analytics (III, 10)

  6. Information diffusion (II, 7)

Write a summary for each module, with specific emphasis on the algorithms that seem the most practical.

Hands-on practice (when you finish each task, make a pull request):
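
As a starting point for the hands-on part, here is a minimal sketch of the kinds of measures the reading list covers (network measures, community analysis, influence). It assumes the networkx library and uses its built-in karate-club graph as placeholder data; PageRank stands in here as one possible influence proxy, not necessarily the book's method.

```python
# Minimal sketch: network measures, community analysis, and an influence proxy.
# Assumes networkx is installed; the karate-club graph is placeholder data.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()  # replace with the project's bot/user graph

# Network measures (I, 3): centrality of each node
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Community analysis (II, 6): modularity-based communities
communities = greedy_modularity_communities(G)

# Influence measuring (III, 8): PageRank as a simple influence proxy
pagerank = nx.pagerank(G)

print("top node by degree:", max(degree, key=degree.get))
print("number of communities:", len(communities))
print("top node by PageRank:", max(pagerank, key=pagerank.get))
```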

ghost commented 7 years ago

Specific parts:

Among the good practices that have emerged over time to improve project quality, the most widely used methodologies are SEMMA and CRISP-DM, the latter being the most used since the 2010s.

For the algorithms, there are two families of data analysis methods, the descriptive and the predictive, whose sub-methods sometimes overlap:

Descriptive methods: They make it possible to simplify, organize, and help understand the information contained in a large data set.

Techniques derived from statistics can be used. Factorial analyses such as principal component analysis, independent component analysis, multidimensional scaling, and multiple correspondence analysis are the most common.

We can also employ newer methods coming out of artificial intelligence, such as machine learning.
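
To make the descriptive side concrete, here is a minimal sketch of one of the techniques mentioned above, principal component analysis. It assumes numpy and scikit-learn are available and uses synthetic data purely for illustration.

```python
# Minimal sketch: PCA reduces many variables to a few components
# that summarize most of the variation in the data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # toy dataset: 200 observations, 10 variables

pca = PCA(n_components=2)        # keep the two main axes of variation
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # share of variance carried by each component
```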

Predictive methods: Their primary purpose is to predict or explain one or more observable and actually measured phenomena.

They rely on one or more variables defined as the targets of the analysis. In predictive data exploration, there are two types of operation: classification, which deals with qualitative variables, and regression (prediction), which deals with continuous variables. These methods make it possible to separate individuals into several classes, in a supervised or unsupervised way. A good model is a fast model with the lowest error rate. Several indicators are used to evaluate model quality, among them the ROC and lift curves, the Gini index, and the mean squared error, which compare the predictions against reality.
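
The quality indicators mentioned above can be computed as in the following sketch, which assumes scikit-learn and uses synthetic data: ROC AUC and the Gini index for a classification target, and mean squared error for a regression target.

```python
# Minimal sketch: evaluating a classifier (ROC AUC, Gini) and a regressor (MSE).
# Data is synthetic; real bot-analytics features would replace it.
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import roc_auc_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Classification: qualitative target, evaluated with ROC AUC and Gini
Xc, yc = make_classification(n_samples=500, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc_tr, yc_tr)
auc = roc_auc_score(yc_te, clf.predict_proba(Xc_te)[:, 1])
gini = 2 * auc - 1  # common definition of the Gini index from the AUC
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")

# Regression: continuous target, evaluated with mean squared error
Xr, yr = make_regression(n_samples=500, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
print(f"MSE = {mean_squared_error(yr_te, reg.predict(Xr_te)):.1f}")
```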

Computer tools: In 2009, SPSS, RapidMiner, SAS, Excel, R, KXEN, Weka, Matlab, KNIME, Microsoft SQL Server, Oracle DM, STATISTICA, and CORICO were the most widely used tools. In 2010, R was the most widely used. Today, these tools are also used in the cloud, for example Oracle Data Mining running on Amazon IaaS.