wangzhenhui1991 / Notes

Paper part 4: Cluster analysis based on Hadoop #7

Open wangzhenhui1991 opened 7 years ago

wangzhenhui1991 commented 7 years ago

Large scale cluster analysis with Hadoop and Mahout.pdf

DataSet

Finally, the data to be processed consists of two data sets. One is compiled from the music database Last.FM. The data set was created in 2007 and consists of approximately 20 000 unique artists tagged with 100 000 unique tags (the total tag count is roughly 7.1 million). [8] The second data set is from Tumblr and consists of a snapshot of the activity across the site for a continuous period of time. The data set is large in both dimensionality, approximately 40 million unique tags, and cardinality, approxima

wangzhenhui1991 commented 7 years ago

On the performance of Hadoop versus MATLAB

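On this point: MATLAB's matrix computations use Intel's own Math Kernel Library (MKL), which is optimized at the assembly level. C is fast at loops, but this library is far faster than other BLAS/LAPACK libraries.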

wangzhenhui1991 commented 7 years ago

A detailed analysis of how Mahout implements the K-Means algorithm: Mahout Learning: K-Means Clustering
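As a rough illustration of how one K-Means iteration is commonly split into map and reduce steps, here is a minimal single-process sketch in Python. It is not Mahout's code. The function names `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are hypothetical, and Euclidean distance is an assumption.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step: average all points assigned to one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map + shuffle + reduce round and return the new centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A cluster that received no points keeps its previous centroid.
    return [
        kmeans_reduce(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]
```

In a real Hadoop job the grouping by key would be done by the framework's shuffle, and the iteration would be repeated by a driver until the centroids stop moving.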

wangzhenhui1991 commented 7 years ago

FCM (fuzzy c-means) on MapReduce: http://ai2-s2-pdfs.s3.amazonaws.com/936d/1fc30c82db64ea06a80a2c17b635299b7a48.pdf
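For orientation, the standard fuzzy c-means update can also be phrased as a map step and a reduce step. The sketch below assumes the textbook FCM formulas (fuzzifier `m`, Euclidean distance) and does not reproduce the linked paper's exact formulation. All function names are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in every cluster (sums to 1)."""
    d = [dist(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with one or more centers: share membership among them.
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in d) for d_i in d]


def fcm_map(point, centers, m=2.0):
    """Map step: for each cluster emit (cluster index, (u^m * point, u^m))."""
    for i, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield i, (tuple(w * x for x in point), w)


def fcm_reduce(values, fallback):
    """Reduce step: weighted mean of the points; keep the old center if weight is 0."""
    total_w = sum(w for _, w in values)
    if total_w == 0.0:
        return fallback
    sums = [sum(col) for col in zip(*(vec for vec, _ in values))]
    return tuple(s / total_w for s in sums)


def fcm_iteration(points, centers, m=2.0):
    """One map + shuffle + reduce round returning the updated centers."""
    groups = {i: [] for i in range(len(centers))}  # stands in for the shuffle
    for p in points:
        for i, v in fcm_map(p, centers, m):
            groups[i].append(v)
    return [fcm_reduce(groups[i], centers[i]) for i in range(len(centers))]
```

Unlike K-Means, every point contributes to every cluster, so the map step emits one record per cluster for each point. In a distributed job, a combiner that pre-sums the weighted vectors would likely be needed to keep shuffle traffic manageable.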