markfuge / scalable-collaborative-filtering

Collaborative Filtering

Which data file works? (http://grouplens.org/datasets/movielens/) #2

Closed andyyuan78 closed 9 years ago

andyyuan78 commented 9 years ago

I have tried several, including ml-20m.zip, ml-1m.zip, and ml-latest.zip, and none of them works.

I am sure I put them in the right place and changed the code from .csv to .dat.

But it still doesn't work.

markfuge commented 9 years ago

The data format on the download site seems to be different from what I originally used. According to http://files.grouplens.org/datasets/movielens/ml-100k-README.txt, the data files are rebuilt from the compressed archive as follows:

gunzip ml-data.tar.gz
tar xvf ml-data.tar
mku.sh

From their README, it looks like ratings.dat in my version corresponds to u.data, while movies.dat corresponds to u.item.

u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of
              user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC
u.item     -- Information about the items (movies); this is a tab separated
              list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data set.
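If the code expects a ratings.dat file, one possible workaround is to convert u.data into that layout. The sketch below is only an illustration, not the repository's own loader: the function name `convert_u_data` is hypothetical, the `::` output separator is an assumption taken from the ml-10m README rather than from this code base, and the sample rows are made up.

```python
import io


def convert_u_data(src, dst, out_sep="::"):
    """Copy tab-separated u.data rows from src to dst, joined with out_sep.

    src and dst are text file objects. out_sep="::" is an assumption about
    what the ratings.dat reader expects; change it if the loader differs.
    Returns the number of rows written.
    """
    count = 0
    for line in src:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            # u.data rows are: user id, item id, rating, timestamp
            raise ValueError(f"expected 4 tab-separated fields: {line!r}")
        dst.write(out_sep.join(fields) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample rows in the u.data layout described above.
    sample = io.StringIO("1\t10\t4\t880000000\n2\t20\t5\t880000001\n")
    out = io.StringIO()
    convert_u_data(sample, out)
    print(out.getvalue(), end="")
```

With real files, the same function could be called with `open("u.data")` as `src` and `open("ratings.dat", "w")` as `dst`.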

Let me know if that addresses the issue. If so, I will update the files and README appropriately and close the issue.

markfuge commented 9 years ago

Here is the original README for the data format this code was written for: http://files.grouplens.org/datasets/movielens/ml-10m-README.html. The original zip files, in various sizes, are located at http://files.grouplens.org/datasets/movielens/. I'll update the README to link to the correct data source. Downloading http://files.grouplens.org/datasets/movielens/ml-10m.zip appeared to work on my machine, though I imagine the other dataset sizes would work too (I haven't tested this).
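Since the other archive sizes are untested, a quick way to check a download before running the code is to look at which ratings file it contains and how its first line is delimited. This is a rough sketch under stated assumptions: the helper names `sniff_separator` and `describe_ratings_files` are hypothetical, and the candidate separators (`::`, tab, comma) are guesses at what the different MovieLens releases use, not something this repository defines.

```python
import zipfile

# Separators to try, in order. This list is an assumption, not an official spec.
CANDIDATE_SEPARATORS = ("::", "\t", ",")


def sniff_separator(line):
    """Return the first candidate separator found in line, or None."""
    for sep in CANDIDATE_SEPARATORS:
        if sep in line:
            return sep
    return None


def describe_ratings_files(zip_path):
    """Map each ratings-like member of the archive to its sniffed separator.

    A member counts as ratings-like if its base name starts with "ratings."
    or is exactly "u.data". zip_path may be a path or a binary file object.
    """
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]
            if base.startswith("ratings.") or base == "u.data":
                with archive.open(name) as handle:
                    first = handle.readline().decode("utf-8", errors="replace")
                found[name] = sniff_separator(first)
    return found
```

Calling `describe_ratings_files("ml-10m.zip")` on a downloaded archive returns a dictionary keyed by member name; a result other than `::` would suggest the file needs converting or the loader's separator needs changing.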

markfuge commented 9 years ago

Fixed in a60e56d323bd3493b41998d7eb99cc9285f1f68f

andyyuan78 commented 9 years ago

It works now.