Finally, the data to be processed consists of two data sets. One compiled from
the music database Last.FM. The data set was created in 2007 and consists of
approximately 20 000 unique artists tagged with 100 000 unique tags (the total
tag count is roughly 7.1 million). [8]
The second data set is from Tumblr and consists of a snapshot of the activity
across the site for a continuous period of time. The data set is large in both dimensionality,
approximately 40 million unique tags, and cardinality, approxima
Large scale cluster analysis with Hadoop and Mahout.pdf
DataSet
Finally, the data to be processed consists of two data sets. One compiled from the music database Last.FM. The data set was created in 2007 and consists of approximately 20 000 unique artists tagged with 100 000 unique tags (the total tag count is roughly 7.1 million). [8] The second data set is from Tumblr and consists of a snapshot of the activity across the site for a continuous period of time. The data set is large in both dimensionality, approximately 40 million unique tags, and cardinality, approxima