liu431 / Big-Data-Project

CAPP 30123 Class Project

Visualization to make #41

Open sanittawan opened 5 years ago

sanittawan commented 5 years ago

Data description

  1. Relational Diagram of the files (@dhruvalb)
  2. MPI run time experiment

Exploratory Analysis

  1. Top 15 tags (@tonofshell) - this file OR this file (I'm confused. Are they the same?)
  2. Users Activities (@tonofshell) - which users are most active - this file
  3. Questions with most answers per year (@tonofshell) - this file
  4. Users with gold answer badges locations (@tonofshell) - this file
  5. 2-grams of tags that appear together (network of tags) (@tonofshell) - this file

Main Analysis

  1. Time series plots of each language
    • Please ask @liu431 for the output

sanittawan commented 5 years ago

@tonofshell I updated the output of Users Activities. To get it, click the link and download the file. Each row is (user ID, number of activities).

I am going to try to do some descriptive statistics on the data. I'm super impressed with Spark. It took 2 minutes and 40 seconds to run this analysis on Posts.csv with 3 workers. It's AWESOME.
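
For anyone who wants to check the (user ID, number of activities) format on a small sample without a cluster, here is a minimal sketch in plain Python. It is not the Spark job used above. The column name `OwnerUserId`, the helper `count_user_activities`, and the sample rows are all assumptions made up for illustration.

```python
import csv
import io
from collections import Counter


def count_user_activities(rows, user_col="OwnerUserId"):
    """Return (user ID, number of activities) pairs, most active first.

    `user_col` is a hypothetical column name; adjust it to the real header.
    """
    counts = Counter()
    for row in rows:
        user_id = row.get(user_col)
        if user_id:  # skip rows that have no user ID
            counts[user_id] += 1
    return counts.most_common()


# Tiny made-up sample standing in for Posts.csv.
sample = io.StringIO("Id,OwnerUserId\n1,10\n2,11\n3,10\n4,\n5,10\n")
activity = count_user_activities(csv.DictReader(sample))
print(activity)
```

To use it on a real file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample. This only makes sense for files that fit on one machine.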