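@HOPEver1991 I'm building a movie-related application that includes actor information, but the actor data is very sparse: an actor appears in only a handful of films over an entire career. Are there good ways to handle this kind of data in machine learning?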

memect / hao

好东西传送门


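@HOPEver1991 I'm building a movie-related application that includes actor information, but the actor data is very sparse: an actor appears in only a handful of films over an entire career. Are there good ways to handle this kind of data in machine learning? #160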

Closed haoawesome closed 9 years ago

haoawesome commented 9 years ago

Private message

haoawesome commented 9 years ago

OK. Assuming you already have a dataset (for example, IMDB), could you say which kind of problem you want to solve? Here are a few examples:

Personalized movie recommendation: http://www.aaai.org/Papers/Workshops/2006/WS-06-10/WS06-10-005.pdf
Predicting movie ratings: http://staff.science.uva.nl/~tsagias/wp-content/uploads/2012/01/ecir2012-imdb
Analyzing relationships among people in the film industry: http://dx.plos.org/10.1371/journal.pone.0066443
Movie classification: http://www.cse.ohio-state.edu/~kulis/pubs/multilevel-kdd.pdf
Automatic summarization of movie reviews: http://users.cis.fiu.edu/~lli003/Sum/CIKM/2006/p43-zhuang.pdf

haoawesome commented 9 years ago

Keywords for sparse data

Matrix Factorization
PCA, SVD

http://stackoverflow.com/questions/13497945/machine-learning-algorithm-for-completing-sparse-matrix-data
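To make the matrix factorization keyword concrete, here is a minimal sketch, assuming a tiny made-up table of (row, column, rating) triples. The names `factorize` and `predict`, the hyperparameters, and the data are all illustrative assumptions and do not come from any of the links in this thread. The idea is to fit two low-rank factor matrices using only the observed entries, then use their product to estimate the missing ones.

```python
import random

def factorize(observed, n_rows, n_cols, rank=2, lr=0.02, reg=0.02,
              epochs=2000, seed=0):
    """Fit R ~ P * Q^T by SGD, using only the observed (i, j, value) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in observed:
            pred = sum(P[i][k] * Q[j][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Estimated value for cell (i, j), observed or not."""
    return sum(a * b for a, b in zip(P[i], Q[j]))

# Hypothetical toy data: 4 rows x 4 columns, 11 of 16 cells observed.
observed = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
    (3, 1, 1.0), (3, 2, 5.0), (3, 3, 4.0),
]
P, Q = factorize(observed, n_rows=4, n_cols=4)

# Root-mean-square error on the cells that were actually observed.
rmse = (sum((r - predict(P, Q, i, j)) ** 2 for i, j, r in observed)
        / len(observed)) ** 0.5
print("training RMSE:", round(rmse, 3))
print("estimate for unobserved cell (1, 1):", round(predict(P, Q, 1, 1), 2))
```

For real data you would hold out some observed cells to check the estimates, since a low training error alone says nothing about the missing entries.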

Some literature related to sparse data and movies

https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf Robust De-anonymization of Large Sparse Datasets Arvind Narayanan and Vitaly Shmatikov

http://arxiv.org/pdf/1301.2303.pdf Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments (UAI 2001)

http://www.cs.cmu.edu/~ggordon/CMU-ML-08-109.pdf Relational Learning via Collective Matrix Factorization Ajit P. Singh Geoffrey J. Gordon

https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS

haoawesome commented 9 years ago

https://github.com/ksenish/svd-predict-movie-ratings Movie Recommendation Engine based on Singular Value Decomposition

As for dimensionality reduction, you can of course try PCA; some people also use SVD, for example https://github.com/ksenish/svd-predict-movie-ratings
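As a rough illustration of SVD-based dimensionality reduction on sparse actor data, the sketch below stores a hypothetical actor-by-movie matrix as one dict of nonzero entries per actor and finds the top two singular triplets by power iteration with deflation. The function name `truncated_svd`, the toy cast lists, and the choice of power iteration are assumptions made for this example, not the method used in the linked repository; in practice a library routine for sparse truncated SVD would normally be used instead.

```python
import math
import random

def truncated_svd(rows, n_cols, k=2, iters=200, seed=0):
    """Top-k singular triplets (sigma, u, v) of a sparse matrix.

    rows: list of dicts {column_index: value}, one dict per matrix row.
    Uses power iteration on A^T A, touching only the nonzero entries.
    """
    rng = random.Random(seed)
    triplets = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        for _ in range(iters):
            # Deflation: remove components along vectors already found.
            for _sigma, _u, prev_v in triplets:
                dot = sum(a * b for a, b in zip(v, prev_v))
                v = [a - dot * b for a, b in zip(v, prev_v)]
            # u = A v
            u = [sum(val * v[c] for c, val in row.items()) for row in rows]
            # w = A^T u
            w = [0.0] * n_cols
            for u_i, row in zip(u, rows):
                for c, val in row.items():
                    w[c] += val * u_i
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        u = [sum(val * v[c] for c, val in row.items()) for row in rows]
        sigma = math.sqrt(sum(x * x for x in u))
        triplets.append((sigma, [x / sigma for x in u], v))
    return triplets

# Hypothetical data: 5 actors x 6 movies, 1.0 means "appeared in".
rows = [
    {0: 1.0, 1: 1.0},
    {0: 1.0, 1: 1.0, 2: 1.0},
    {1: 1.0, 2: 1.0},
    {3: 1.0, 4: 1.0},
    {4: 1.0, 5: 1.0},
]
triplets = truncated_svd(rows, n_cols=6, k=2)

# Each actor becomes a dense 2-dimensional vector (sigma * u).
embedding = [[s * u[i] for s, u, _v in triplets] for i in range(len(rows))]
for i, vec in enumerate(embedding):
    print("actor", i, [round(x, 3) for x in vec])
```

Actors who share co-stars or movies end up close together in the reduced space, which gives downstream models a dense feature vector in place of a mostly empty row.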

haoawesome commented 9 years ago

Both SVD and PCA are used for dimensionality reduction; here is a good comparison: http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca
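The relationship can be stated compactly. The notation here is introduced for this note and is an assumption: $X$ is an $n \times p$ data matrix whose columns have been centered (mean subtracted), $X = U \Sigma V^{\top}$ is its SVD, and $C$ is the sample covariance matrix.

$$
X = U \Sigma V^{\top}, \qquad
C = \frac{1}{n-1} X^{\top} X = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top}, \qquad
X V = U \Sigma
$$

So the columns of $V$ are the principal directions, the eigenvalues of $C$ are $\lambda_i = \sigma_i^{2} / (n-1)$, and the principal component scores are $U \Sigma$. The two methods coincide only when the columns are centered first; applying SVD to the raw matrix, as is common for sparse rating data, is a related but different decomposition.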