andyyuan78 closed this issue 9 years ago.
The data format on the download site seems to differ from the one I originally used. According to the README at http://files.grouplens.org/datasets/movielens/ml-100k-README.txt, you can rebuild the data files from the compressed archive with:
gunzip ml-data.tar.gz
tar xvf ml-data.tar
mku.sh
From their README, it looks like ratings.dat in my version corresponds to u.data, while movies.dat corresponds to u.item:
u.data -- The full u data set, 100000 ratings by 943 users on 1682 items.
Each user has rated at least 20 movies. Users and items are
numbered consecutively from 1. The data is randomly
ordered. This is a tab separated list of
user id | item id | rating | timestamp.
The time stamps are unix seconds since 1/1/1970 UTC
u.item -- Information about the items (movies); this is a tab separated
list of
movie id | movie title | release date | video release date |
IMDb URL | unknown | Action | Adventure | Animation |
Children's | Comedy | Crime | Documentary | Drama | Fantasy |
Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
Thriller | War | Western |
The last 19 fields are the genres, a 1 indicates the movie
is of that genre, a 0 indicates it is not; movies can be in
several genres at once.
The movie ids are the ones used in the u.data data set.
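A minimal Python sketch of reading the two ml-100k files described above into records shaped like the ratings.dat/movies.dat entries. It assumes u.data is tab-separated, as the README says. For u.item, the distributed file actually separates fields with `|` (despite the README's "tab separated" wording) and is Latin-1 encoded, so the sketch splits on `|`; check this against your copy. The function names `parse_u_data` and `parse_u_item` are hypothetical, not part of this repository, and the sample lines in the demo are illustrative.

```python
from typing import Iterable, List, Tuple

# Genre columns of u.item, in the order listed in the ml-100k README.
GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def parse_u_data(lines: Iterable[str]) -> List[Tuple[int, int, int, int]]:
    """Parse u.data lines: user id, item id, rating, timestamp (tab-separated)."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, ts = line.split("\t")
        ratings.append((int(user), int(item), int(rating), int(ts)))
    return ratings


def parse_u_item(lines: Iterable[str]) -> List[Tuple[int, str, List[str]]]:
    """Parse u.item lines into (movie id, title, [genre names]).

    Assumption: fields are '|'-separated, as in the distributed file.
    The last 19 fields are 0/1 genre flags.
    """
    movies = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("|")
        movie_id, title = int(fields[0]), fields[1]
        flags = fields[-len(GENRES):]
        genres = [g for g, flag in zip(GENRES, flags) if flag == "1"]
        movies.append((movie_id, title, genres))
    return movies


if __name__ == "__main__":
    # Illustrative lines in the documented layout.
    sample_data = ["196\t242\t3\t881250949\n"]
    sample_item = [
        "1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)"
        "|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0\n"
    ]
    print(parse_u_data(sample_data))  # [(196, 242, 3, 881250949)]
    print(parse_u_item(sample_item))  # [(1, 'Toy Story (1995)', ['Animation', "Children's", 'Comedy'])]
```

When reading from disk, open u.item with `encoding="latin-1"` and pass the file object directly, since both functions accept any iterable of lines.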
Let me know if that addresses the issue. If so, I will update the files and README appropriately and close the issue.
Here is the original README for the data format this code was written for: http://files.grouplens.org/datasets/movielens/ml-10m-README.html. The original zip files, in various sizes, are at http://files.grouplens.org/datasets/movielens/. I'll update the README to link to the correct data source. Downloading http://files.grouplens.org/datasets/movielens/ml-10m.zip appeared to work on my machine, and I imagine the other dataset sizes would work too (I haven't tested this).
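For comparison, the ml-10m README documents `ratings.dat` lines as `UserID::MovieID::Rating::Timestamp` and `movies.dat` lines as `MovieID::Title::Genres`, with multiple genres separated by `|`. A minimal sketch of parsing those two layouts follows; the function names are hypothetical, and the functions take already-decoded lines, so the file encoding is left to the caller. Because ml-10m ratings use half-star increments, the rating is kept as a float.

```python
from typing import Iterable, List, Tuple


def parse_ratings_dat(lines: Iterable[str]) -> List[Tuple[int, int, float, int]]:
    """Parse UserID::MovieID::Rating::Timestamp lines (ml-10m ratings.dat)."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, ts = line.split("::")
        # Half-star ratings (e.g. 3.5) are allowed, so parse as float.
        ratings.append((int(user), int(movie), float(rating), int(ts)))
    return ratings


def parse_movies_dat(lines: Iterable[str]) -> List[Tuple[int, str, List[str]]]:
    """Parse MovieID::Title::Genres lines; genres are '|'-separated."""
    movies = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split off the id from the left and the genres from the right,
        # so the title in the middle is kept intact.
        movie_id, rest = line.split("::", 1)
        title, genres = rest.rsplit("::", 1)
        movies.append((int(movie_id), title, genres.split("|")))
    return movies
```

For example, `parse_movies_dat(["1::Toy Story (1995)::Adventure|Animation|Children|Comedy|Fantasy"])` yields one tuple whose genre list has five entries.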
Fixed in a60e56d323bd3493b41998d7eb99cc9285f1f68f
It works now.
I had tried several others, such as ml-20m.zip, ml-1m.zip, and ml-latest.zip, but none of them worked.
I am sure I put them in the right place and changed the code from .csv to .dat, but it still didn't work.
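For ml-20m and ml-latest specifically, one possible reason (an assumption, not confirmed in this thread) is that those archives ship `ratings.csv` and `movies.csv` instead of `.dat` files. They are comma-separated, begin with a header row (`userId,movieId,rating,timestamp` and `movieId,title,genres`), and titles containing commas are quoted, so splitting on `::` or on a bare comma fails. A minimal sketch using Python's `csv` module for that layout; the function names are hypothetical and not part of this repository.

```python
import csv
from typing import Iterable, List, Tuple


def parse_ratings_csv(lines: Iterable[str]) -> List[Tuple[int, int, float, int]]:
    """Parse ratings.csv rows with header userId,movieId,rating,timestamp."""
    reader = csv.DictReader(lines)  # consumes the header row for field names
    return [
        (int(row["userId"]), int(row["movieId"]), float(row["rating"]), int(row["timestamp"]))
        for row in reader
    ]


def parse_movies_csv(lines: Iterable[str]) -> List[Tuple[int, str, List[str]]]:
    """Parse movies.csv rows with header movieId,title,genres ('|'-separated genres)."""
    reader = csv.DictReader(lines)  # handles quoted titles that contain commas
    return [
        (int(row["movieId"]), row["title"], row["genres"].split("|"))
        for row in reader
    ]
```

When reading from disk, open the file with `newline=""`, as the `csv` module documentation recommends, and pass the file object to these functions.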