memect / hao

好东西传送门

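@HOPEver1991 I'm building a movie-related application that includes actor information, but the actor data is very sparse: an actor only appears in a handful of films over an entire career. Are there any good machine learning approaches for handling this kind of data? #160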

Closed haoawesome closed 9 years ago

haoawesome commented 9 years ago

Private message

haoawesome commented 9 years ago

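OK. Can we assume you already have a dataset (for example, IMDB)? If so, could you say which kind of problem you want to solve? Here are a few examples.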

haoawesome commented 9 years ago

Sparse data keywords

http://stackoverflow.com/questions/13497945/machine-learning-algorithm-for-completing-sparse-matrix-data

Some literature related to sparse data and movies

https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf Robust De-anonymization of Large Sparse Datasets Arvind Narayanan and Vitaly Shmatikov

http://arxiv.org/pdf/1301.2303.pdf Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments (UAI2001)

http://www.cs.cmu.edu/~ggordon/CMU-ML-08-109.pdf Relational Learning via Collective Matrix Factorization Ajit P. Singh Geoffrey J. Gordon

https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf Matrix Factorization Techniques for Recommender Systems

haoawesome commented 9 years ago

https://github.com/ksenish/svd-predict-movie-ratings Movie Recommendation Engine based on Singular Value Decomposition

As for dimensionality reduction, you can of course try PCA; some people also use SVD, for example https://github.com/ksenish/svd-predict-movie-ratings
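To make the SVD suggestion concrete, here is a minimal sketch of reducing a sparse actor-by-movie matrix to a few dense dimensions. The actor names, the toy matrix, and the helper names `truncated_svd_embed` and `cosine` are all made up for illustration; they do not come from the linked repository. It uses a dense `numpy.linalg.svd` for simplicity; for a genuinely large sparse matrix, a sparse solver such as `scipy.sparse.linalg.svds` would probably be a better fit.

```python
import numpy as np

# Hypothetical toy data: rows are actors, columns are movies.
# A 1 means the actor appeared in that movie; most entries are 0 (sparse).
actors = ["actor_a", "actor_b", "actor_c", "actor_d"]
appearances = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
], dtype=float)


def truncated_svd_embed(matrix, k):
    """Return k-dimensional row embeddings from a rank-k truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values and scale the left vectors by them.
    return u[:, :k] * s[:k]


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


embeddings = truncated_svd_embed(appearances, k=2)

# Actors who share a movie should end up closer than actors who do not.
sim_ab = cosine(embeddings[0], embeddings[1])  # actor_a vs actor_b (share movie 0)
sim_ac = cosine(embeddings[0], embeddings[2])  # actor_a vs actor_c (no shared movie)
print(f"a~b: {sim_ab:.3f}, a~c: {sim_ac:.3f}")
```

The low-dimensional rows can then be fed to a downstream model in place of the original sparse indicator columns. How many dimensions to keep is a tuning choice that depends on the data.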

haoawesome commented 9 years ago

Both SVD and PCA are used for dimensionality reduction. Here is a good comparison of the two: http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca
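The relationship can also be checked numerically. The sketch below runs on randomly generated data (an assumption made purely for illustration) and compares PCA computed from the covariance matrix with an SVD of the mean-centered data. The two routes should agree up to the sign of each component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 samples with 4 correlated features.
data = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

# PCA is defined on mean-centered data.
centered = data - data.mean(axis=0)
n = len(data)

# Route 1 (PCA): eigendecomposition of the sample covariance matrix.
cov = centered.T @ centered / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pca_scores = centered @ eigvecs            # projections onto principal axes

# Route 2 (SVD): decompose the centered data matrix directly.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
svd_variances = s**2 / (n - 1)             # squared singular values give the variances
svd_scores = u * s                         # same projections, up to sign

print(np.allclose(eigvals, svd_variances))
print(np.allclose(np.abs(pca_scores), np.abs(svd_scores), atol=1e-6))
```

In short, the right singular vectors of the centered data are the principal axes, and the squared singular values divided by `n - 1` are the explained variances.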