tailaijin / Data_Mining-Final_Project_Music


@ Tailai Jin Cut the database down to a reasonable size. The track table has more than 0.2 billion records. #2

Open tailaijin opened 8 years ago

tailaijin commented 8 years ago

The current idea is to filter by time, area, or genre to get a smaller dataset.
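A minimal sketch of the time filter, assuming the `release` and `release_country` tables used later in this thread. It runs against a made-up in-memory SQLite database rather than the real MBZDB, and the `release_small` table and `subset_by_year` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Toy stand-in for the real database. Table and column names follow the
# queries in this thread; the rows themselves are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE release (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE release_country (release INTEGER, date_year INTEGER);
INSERT INTO release VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO release_country VALUES (1, 1985), (2, 2005), (3, NULL), (4, 2019);
""")


def subset_by_year(conn, first_year, last_year):
    """Copy releases dated within [first_year, last_year] into release_small.

    Rows with a NULL year never satisfy BETWEEN, so they are left out too.
    Returns the number of rows kept.
    """
    conn.execute("DROP TABLE IF EXISTS release_small")
    conn.execute(
        "CREATE TABLE release_small (id INTEGER, name TEXT, date_year INTEGER)"
    )
    conn.execute(
        """
        INSERT INTO release_small
        SELECT re.id, re.name, rc.date_year
        FROM release AS re
        JOIN release_country AS rc ON rc.release = re.id
        WHERE rc.date_year BETWEEN ? AND ?
        """,
        (first_year, last_year),
    )
    return conn.execute("SELECT COUNT(*) FROM release_small").fetchone()[0]


kept = subset_by_year(conn, 1950, 2016)
print(kept)
```

The same pattern could be repeated with an area or genre condition in the `WHERE` clause; the year bounds here are only an example.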

tailaijin commented 8 years ago

After adding some indexes, MBZDB is much more efficient. I dropped all the tables that contained no records. I also wrote some queries to gain insight into the data.
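A rough sketch of both cleanup steps, run on an in-memory SQLite toy database rather than the real MBZDB. The index name, the `unused_table` table, and the `drop_empty_tables` helper are hypothetical. The table listing uses SQLite's `sqlite_master` catalog, so another database engine would need its own catalog query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE release (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE release_country (release INTEGER, date_year INTEGER);
CREATE TABLE unused_table (id INTEGER);  -- hypothetical table with no rows
INSERT INTO release VALUES (1, 'A');
INSERT INTO release_country VALUES (1, 1985);
""")

# Index the join column used by the release / release_country query.
conn.execute(
    "CREATE INDEX idx_release_country_release ON release_country (release)"
)


def drop_empty_tables(conn):
    """Drop every user table that holds no rows and return the dropped names."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    dropped = []
    for name in names:
        # Fetching a single row is enough to tell whether the table is empty.
        has_row = conn.execute(f'SELECT 1 FROM "{name}" LIMIT 1').fetchone()
        if has_row is None:
            conn.execute(f'DROP TABLE "{name}"')
            dropped.append(name)
    return dropped


dropped = drop_empty_tables(conn)
print(dropped)
```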

`

4. Validate some distributions

A NULL year, or a year of 2017/2019, must be wrong # Delete these rows in further analysis

SELECT COUNT(re.name), rc.date_year FROM release AS re, release_country AS rc WHERE rc.release = re.id GROUP BY (rc.date_year);

Filter out unreasonable dates

SELECT COUNT(name), begin_date_year FROM artist WHERE begin_date_year > 1950 AND begin_date_year < 2017 AND (end_date_year > 1990 OR end_date_year IS NULL) GROUP BY begin_date_year;

Various types of place

SELECT COUNT(*), place_type.name FROM place, place_type WHERE place.type = place_type.id GROUP BY place_type.id, place_type.name;

The US has 30 areas named Washington County...

SELECT COUNT(gid), name FROM area GROUP BY name ORDER BY COUNT(gid) DESC;

SELECT * FROM place;

SELECT * FROM place_type;`
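As a hedged illustration of the year check in the queries above, the sketch below counts rows per `date_year` and separates the suspect years (NULL or later than 2016) from the rest. It uses an in-memory SQLite table with invented rows. The `MAX_YEAR` cutoff is an assumption derived from the "2017/2019 must be wrong" note, not a value taken from the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE release_country (release INTEGER, date_year INTEGER);
INSERT INTO release_country VALUES
    (1, 1985), (2, 1985), (3, NULL), (4, 2019), (5, 2005);
""")

MAX_YEAR = 2016  # assumption: any later year is treated as a data-entry error

# Distribution of releases per year, including the NULL bucket.
rows = conn.execute(
    "SELECT date_year, COUNT(*) FROM release_country "
    "GROUP BY date_year ORDER BY date_year"
).fetchall()

# Split the distribution into years to drop and years to keep.
suspect = [(year, n) for year, n in rows if year is None or year > MAX_YEAR]
valid = [(year, n) for year, n in rows if year is not None and year <= MAX_YEAR]

print("suspect:", suspect)
print("valid:", valid)
```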