It was agreed that we would mine Twitter data for the following three words:
lit, ratchet, adulting.
Moreover, after the data-mining step, the results will be provided in a CSV file. Given the previous data samples, the CSV files will have the following columns:
index,state,keyword,time
0,NY,lit,Thu Jul 05 11:20:57 +0000 2018
1,NY,lit,Thu Jul 05 11:20:58 +0000 2018
2,NY,lit,Thu Jul 05 11:20:59 +0000 2018
3,NY,lit,Thu Jul 05 11:20:59 +0000 2018
4,NY,lit,Thu Jul 05 11:21:00 +0000 2018
5,NY,lit,Thu Jul 05 11:21:00 +0000 2018
6,NY,lit,Thu Jul 05 11:21:00 +0000 2018
7,NY,lit,Thu Jul 05 11:21:00 +0000 2018
8,NY,lit,Thu Jul 05 11:21:01 +0000 2018
9,NY,lit,Thu Jul 05 11:21:01 +0000 2018
10,NY,lit,Thu Jul 05 11:21:01 +0000 2018
11,NY,lit,Thu Jul 05 11:21:01 +0000 2018
12,NY,lit,Thu Jul 05 11:21:02 +0000 2018
13,NY,lit,Thu Jul 05 11:21:02 +0000 2018
14,NY,ratchet,Thu Jul 05 11:21:02 +0000 2018
15,NY,ratchet,Thu Jul 05 11:21:01 +0000 2018
16,NY,ratchet,Thu Jul 05 11:21:03 +0000 2018
17,NY,ratchet,Thu Jul 05 11:21:03 +0000 2018
18,NJ,ratchet,Thu Jul 05 11:21:04 +0000 2018
………………………
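For reference, a file in this format can be produced with Python's standard csv module. This is only a sketch: the rows and the local file name data.csv are illustrative stand-ins for the actual output of the mining step.

```python
import csv

# Illustrative rows standing in for mined tweets: (state, keyword, created-at).
rows = [
    ('NY', 'lit', 'Thu Jul 05 11:20:57 +0000 2018'),
    ('NY', 'ratchet', 'Thu Jul 05 11:21:02 +0000 2018'),
]

# Write a header plus one numbered row per tweet, matching the columns above.
with open('data.csv', 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    writer.writerow(['index', 'state', 'keyword', 'time'])
    for i, (state, keyword, time) in enumerate(rows):
        writer.writerow([i, state, keyword, time])
```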
Given this file, a DataFrame can be created and the keywords counted using the following code:
import pandas as pd

df = pd.read_csv('C:/Users/vm555/Downloads/data.csv', header=0, sep=',', index_col=0, encoding='utf8')
# Drop rows with no state; copy so the assignments below do not write to a view.
df1 = df[df['state'].notnull()].copy()
# Twitter's created-at format, e.g. 'Thu Jul 05 11:20:57 +0000 2018'; %z parses the UTC offset.
df1['time'] = pd.to_datetime(df1['time'], format='%a %b %d %H:%M:%S %z %Y')
# Bucket timestamps to the nearest hour.
df1['time'] = df1['time'].dt.round('60min')
# Count tweets per (hour, state, keyword); the count column is named 0.
df2 = df1.groupby(['time', 'state', 'keyword']).size().reset_index()
# Peak hourly count per keyword.
df3 = df2.groupby('keyword')[[0]].max()
……
Obviously, the data-file path must be changed to match its location on the target computer.
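As a quick self-contained check of the pipeline, the same steps can be run against a few rows of the sample above read from memory via io.StringIO instead of a file path:

```python
import io
import pandas as pd

# A handful of rows from the sample data, read from memory instead of a file.
sample = io.StringIO(
    "index,state,keyword,time\n"
    "0,NY,lit,Thu Jul 05 11:20:57 +0000 2018\n"
    "1,NY,lit,Thu Jul 05 11:20:58 +0000 2018\n"
    "14,NY,ratchet,Thu Jul 05 11:21:02 +0000 2018\n"
    "18,NJ,ratchet,Thu Jul 05 11:21:04 +0000 2018\n"
)

df = pd.read_csv(sample, header=0, sep=',', index_col=0)
df1 = df[df['state'].notnull()].copy()
df1['time'] = pd.to_datetime(df1['time'], format='%a %b %d %H:%M:%S %z %Y')
df1['time'] = df1['time'].dt.round('60min')          # all rows fall in the 11:00 bucket
df2 = df1.groupby(['time', 'state', 'keyword']).size().reset_index()
df3 = df2.groupby('keyword')[[0]].max()
print(df3)  # peak hourly count per keyword: lit appears twice in NY, ratchet once per state
```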