mikeizbicki / cmc-csci046

CMC's Data Structures and Algorithms Course Materials

Program running but no data in 'outputs' files #255

Closed agulati18 closed 3 years ago

agulati18 commented 3 years ago

Hi Mike,

My program seems to be running, and I am getting two zip files, one for country and one for lang, in my outputs folder. But when I open either of these files in vim, it is empty. I haven't seen any error messages, and I believe my map.py file is doing what it should, so I'm not sure how to describe the problem any better. This is what my map.py file looks like:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--input_path',required=True)
parser.add_argument('--output_folder',default='outputs')
args = parser.parse_args()

# imports
import os
import zipfile
import datetime
import json
from collections import Counter,defaultdict

# load keywords
hashtags = [
    '#코로나바이러스',  # korean
    '#コロナウイルス',  # japanese
    '#冠状病毒',        # chinese
    '#covid2019',
    '#covid-2019',
    '#covid19',
    '#covid-19',
    '#coronavirus',
    '#corona',
    '#virus',
    '#flu',
    '#sick',
    '#cough',
    '#sneeze',
    '#hospital',
    '#nurse',
    '#doctor',
    ]

# initialize counters
counter_lang = defaultdict(lambda: Counter())
counter_country = defaultdict(lambda: Counter())

# open the zipfile
with zipfile.ZipFile(args.input_path) as archive:

    # loop over every file within the zip file
    for i,filename in enumerate(archive.namelist()):
        print(datetime.datetime.now(),args.input_path,filename)

        # open the inner file
        with archive.open(filename) as f:

            # loop over each line in the inner file
            for line in f:

                # load the tweet as a python dictionary
                tweet = json.loads(line)

                # convert text to lower case
                text = tweet['text'].lower()

                # country code, or 'NA' when the tweet has no place
                try:
                    lower = tweet['place']['country_code'].lower()
                except TypeError:
                    lower = 'NA'

                # search hashtags
                for hashtag in hashtags:
                    lang = tweet['lang']
                    if hashtag in text:
                        counter_lang[hashtag][lang] += 1
                        counter_country[hashtag][lower] += 1
                    counter_lang['_all'][lang] += 1
                    counter_country['_all'][lower] += 1

# open the outputfile
try:
    os.makedirs(args.output_folder)
except FileExistsError:
    pass
output_path_base = os.path.join(args.output_folder,os.path.basename(args.input_path))

output_path_lang = output_path_base+'.lang'
print('saving',output_path_lang)
with open(output_path_lang,'w') as f:
    f.write(json.dumps(counter_lang))

output_path_country = output_path_base+'.country'
print('saving',output_path_country)
with open(output_path_country,'w') as f:
    f.write(json.dumps(counter_country))

I would really appreciate your help in explaining what's going on here and why, even after my program runs, there is no data in any of my output files.

mikeizbicki commented 3 years ago

I think this will be better answered in lab/office hours. It will likely take some back and forth to get to the fundamental issues.
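
Before that conversation, one possible way to narrow things down is to check whether the output files are truly zero bytes or just contain very little data. The sketch below is not part of the course materials. `inspect_output` is a hypothetical helper name, and it assumes the files were written with `json.dumps` as in the map.py above. It also shows that the nested counters serialize to `{}` even when nothing was counted, so a file of zero bytes would suggest that the write step did not complete, or that something other than these `f.write` calls created the file.

```python
import json
import os
from collections import Counter, defaultdict


def inspect_output(path):
    """Return (size_in_bytes, sorted_top_level_keys) for one output file.

    Hypothetical helper for debugging; the name is an assumption.
    """
    size = os.path.getsize(path)
    if size == 0:
        # Zero bytes: no JSON was ever written to this file.
        return size, []
    with open(path) as f:
        data = json.load(f)
    return size, sorted(data)


# With no tweets counted, the counters still serialize to a
# non-empty string.
empty = defaultdict(lambda: Counter())
print(json.dumps(empty))   # prints {}

# With one counted hashtag, the nested structure is preserved.
filled = defaultdict(lambda: Counter())
filled['#flu']['en'] += 1
print(json.dumps(filled))  # prints {"#flu": {"en": 1}}
```

Calling `inspect_output` on each file in the outputs folder separates the two cases: a size of 0 points at how the script was run, while a non-zero size with few keys points at the counting loop.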