tailaijin / Data_Mining-Final_Project_Music


@Meng Qi @Jiayi yang Grammy & AMA Official Data (Scraping technologies are required): #4

Open tailaijin opened 7 years ago

tailaijin commented 7 years ago

https://www.grammy.com/nominees/search?artist=&field_nominee_work_value=&year=2
http://www.theamas.com/winners-database/?winnerKeyword=&winnerYear=1974&winnerCateg

@Meng Qi @Jiayi yang

tailaijin commented 7 years ago

Each row represents one relationship between a piece of music and an artist. The dataset contains these columns: PK_id | Music | Award | Artist | Artist_type | Year |
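To make the column list concrete, here is a minimal sketch of one way to represent a row and write it out as CSV. The class name `AwardRecord`, the helper `records_to_csv`, the column types, and the `Artist_type` values ("main", "featured") are all assumptions for illustration; the thread does not define them, and the sample rows are placeholders, not real award data.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class AwardRecord:
    """One music-artist relationship (hypothetical class; fields mirror the listed columns)."""
    PK_id: int
    Music: str
    Award: str
    Artist: str
    Artist_type: str  # assumed to distinguish e.g. main vs. featured artists
    Year: int


def records_to_csv(records):
    """Serialize records to CSV text, keeping the column order listed above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(AwardRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only: two artists credited on the same piece of music
# produce two rows, one per relationship.
sample = [
    AwardRecord(1, "Example Song", "Example Award", "Artist A", "main", 2010),
    AwardRecord(2, "Example Song", "Example Award", "Artist B", "featured", 2010),
]
csv_text = records_to_csv(sample)
print(csv_text)
```

With one row per relationship, a song credited to several artists simply appears several times with different PK_id values.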

irisqi commented 7 years ago

import pandas as pd
import requests
from bs4 import BeautifulSoup

lst = []      # text of every table cell, across all years
year = []
reward = []
name = []
winner = []

# Fetch the Grammy nominee search page for each year from 2010 through 2015
for i in range(2010, 2016):
    url = 'https://www.grammy.com/nominees/search?artist=&field_nominee_work_value=&year=' + str(i) + '&genre=All'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    # print(soup.prettify())
    award = soup.find_all('td')
    for row in award:
        lst.append(row.text)

# Every 4 consecutive cells form one table row: year, award, name, winner
for i in range(len(lst) // 4):
    year.append(lst[4*i])
    reward.append(lst[4*i+1])
    name.append(lst[4*i+2])
    winner.append(lst[4*i+3])

data = {'year': year, 'reward': reward, 'name': name, 'winner': winner}
grammy = pd.DataFrame(data, columns=['year', 'reward', 'name', 'winner'])
print(grammy)
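The script above reads the table cells as one flat list and treats every four consecutive cells as one row. A minimal sketch of that grouping step on its own, using only the standard library, might look like the following. The helper name `chunk_cells` and the sample cell texts are hypothetical placeholders, and dropping an incomplete trailing row is an assumption about the desired behavior, not something the thread specifies.

```python
def chunk_cells(cells, width=4):
    """Group a flat list of cell texts into rows of `width` cells.

    Trailing cells that do not fill a whole row are dropped.
    """
    usable = len(cells) - len(cells) % width
    return [tuple(cells[i:i + width]) for i in range(0, usable, width)]


# Placeholder cell texts standing in for the scraped <td> values.
cells = [
    "2010", "Award X", "Work X", "Artist X",
    "2011", "Award Y", "Work Y", "Artist Y",
    "stray",  # incomplete row, ignored by chunk_cells
]
rows = chunk_cells(cells)

# Same column names the script uses for its DataFrame.
columns = ["year", "reward", "name", "winner"]
records = [dict(zip(columns, row)) for row in rows]
print(records)
```

Deriving the row count from the list length avoids hard-coding how many rows the pages return. The resulting `records` list can be passed to `pd.DataFrame(records, columns=columns)` to get the same table layout.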

Jiayi-Yang commented 7 years ago

I submitted it to Google Drive... CSV files can't be uploaded here (:з」∠)