robmarano / 2018_Summer_Adv_Python_Project

2018 Summer Advanced Python Group Project
GNU General Public License v3.0

Agree upon a final set of keywords for which to mine data #13

Closed yvr2 closed 6 years ago

vladmakutoff commented 6 years ago

It was agreed that we would mine Twitter data for the following three keywords: lit, ratchet, adulting. Moreover, after the data mining step, the results will be provided in a CSV file. Based on the previous data samples, the CSV files will have the following columns:

index,state,keyword,time
0,NY,lit,Thu Jul 05 11:20:57 +0000 2018
1,NY,lit,Thu Jul 05 11:20:58 +0000 2018
2,NY,lit,Thu Jul 05 11:20:59 +0000 2018
3,NY,lit,Thu Jul 05 11:20:59 +0000 2018
4,NY,lit,Thu Jul 05 11:21:00 +0000 2018
5,NY,lit,Thu Jul 05 11:21:00 +0000 2018
6,NY,lit,Thu Jul 05 11:21:00 +0000 2018
7,NY,lit,Thu Jul 05 11:21:00 +0000 2018
8,NY,lit,Thu Jul 05 11:21:01 +0000 2018
9,NY,lit,Thu Jul 05 11:21:01 +0000 2018
10,NY,lit,Thu Jul 05 11:21:01 +0000 2018
11,NY,lit,Thu Jul 05 11:21:01 +0000 2018
12,NY,lit,Thu Jul 05 11:21:02 +0000 2018
13,NY,lit,Thu Jul 05 11:21:02 +0000 2018
14,NY,ratchet,Thu Jul 05 11:21:02 +0000 2018
15,NY,ratchet,Thu Jul 05 11:21:01 +0000 2018
16,NY,ratchet,Thu Jul 05 11:21:03 +0000 2018
17,NY,ratchet,Thu Jul 05 11:21:03 +0000 2018
18,NJ,ratchet,Thu Jul 05 11:21:04 +0000 2018

………………………
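The time column in the sample appears to follow the layout Twitter uses for tweet creation times (weekday, month, day, time, UTC offset, year). As a minimal sketch, assuming every row uses that layout, a single value can be parsed with the standard library alone. The names TWITTER_TIME_FORMAT and parse_tweet_time are made up for this illustration and are not part of the project.

```python
from datetime import datetime, timezone

# Assumed layout of the "time" column, e.g. "Thu Jul 05 11:20:57 +0000 2018".
# %z consumes the "+0000" UTC offset, so the result is timezone-aware.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweet_time(text):
    """Parse one timestamp string from the CSV into an aware datetime."""
    return datetime.strptime(text, TWITTER_TIME_FORMAT)


stamp = parse_tweet_time("Thu Jul 05 11:20:57 +0000 2018")
print(stamp.isoformat())
```

Note that %a and %b depend on the locale, so this sketch assumes English day and month names.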

Given this file, a pandas DataFrame can be created and the keywords counted per hour and state using the following code:

import csv
import pandas as pd
import numpy as np
import datetime

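# Load the mined tweets; the first CSV column is the row index.
df = pd.read_csv('C:/Users/vm555/Downloads/data.csv', header=0, sep=',', index_col=0, encoding='utf8')
# Keep only rows with a known state; copy so the assignments below do not write to a view of df.
df1 = df[df['state'].notnull()].copy()
# Parse the timestamps (%z matches the +0000 UTC offset) and round them to the nearest hour.
df1['time'] = pd.to_datetime(df1['time'], format='%a %b %d %H:%M:%S %z %Y')
df1['time'] = df1['time'].dt.round('60min')
# Count tweets per (hour, state, keyword); reset_index() leaves the count in a column named 0.
df2 = df1.groupby(['time', 'state', 'keyword']).size().reset_index()
# Largest hourly per-state count for each keyword.
df2.groupby('keyword')[[0]].max()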

……

Change the data file path to match the file's location on the target computer.
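For anyone who wants to try the counting step without the data file, here is a rough, self-contained sketch that runs the same steps on a few rows copied from the sample above. The helper name hourly_counts, the SAMPLE string, and the "count" column name are assumptions made for this illustration; the snippet in this issue leaves the count column named 0.

```python
import io

import pandas as pd

# A handful of rows copied from the sample in this issue (not the full data set).
SAMPLE = """index,state,keyword,time
0,NY,lit,Thu Jul 05 11:20:57 +0000 2018
1,NY,lit,Thu Jul 05 11:20:58 +0000 2018
14,NY,ratchet,Thu Jul 05 11:21:02 +0000 2018
15,NY,ratchet,Thu Jul 05 11:21:01 +0000 2018
18,NJ,ratchet,Thu Jul 05 11:21:04 +0000 2018
"""


def hourly_counts(csv_source):
    """Count tweets per (hour, state, keyword) from a path or file-like object."""
    df = pd.read_csv(csv_source, header=0, index_col=0)
    # Drop rows without a state; copy so later assignments do not touch a view.
    df = df[df["state"].notnull()].copy()
    # Parse the timestamps as UTC, then round each one to the nearest hour.
    df["time"] = pd.to_datetime(
        df["time"], format="%a %b %d %H:%M:%S %z %Y", utc=True
    )
    df["time"] = df["time"].dt.round("60min")
    # One row per (hour, state, keyword) with the number of matching tweets.
    return (
        df.groupby(["time", "state", "keyword"])
        .size()
        .reset_index(name="count")
    )


counts = hourly_counts(io.StringIO(SAMPLE))
# Largest hourly per-state count seen for each keyword.
peak = counts.groupby("keyword")["count"].max()
print(counts)
print(peak)
```

Passing a file path instead of the StringIO object should work the same way, since read_csv accepts both.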

vladmakutoff commented 6 years ago

See the attached txt file (data.txt). Save it as a CSV file, and change the file path in the read_csv call in the code above to point to it.
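If renaming by hand is inconvenient, the "save it as csv" step could be scripted roughly as below. This is only a sketch: save_as_csv is a hypothetical helper, and it assumes the attachment is already comma-separated text, so copying it under a .csv suffix is enough. Note that pd.read_csv does not look at the file extension, so reading the .txt file directly should also work.

```python
import shutil
from pathlib import Path


def save_as_csv(txt_path):
    """Copy a comma-separated .txt file next to itself with a .csv suffix.

    Returns the path of the new file; the original file is left untouched.
    """
    src = Path(txt_path)
    dest = src.with_suffix(".csv")
    shutil.copyfile(src, dest)
    return dest
```

For example, save_as_csv("data.txt") would create data.csv in the same folder, and the returned path can then be passed to read_csv.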