ryanjgallagher / focalevents

Tools for collecting social media data around focal events
MIT License

source parameter not included in all tweet data? #4

Closed asmithh closed 3 years ago

asmithh commented 3 years ago

I'm not sure if this is because I'm looking at ancient tweets (2006 onward) or just have bad luck, but I've been getting this error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/a404/ivermectin/focalevents/twitter/search.py", line 790, in <module>
    args.update_interval)
  File "/Users/a404/ivermectin/focalevents/twitter/search.py", line 714, in main
    search.search()
  File "/Users/a404/ivermectin/focalevents/twitter/search.py", line 596, in search
    self.manage_writing(response_json)
  File "/Users/a404/ivermectin/focalevents/twitter/listener.py", line 261, in manage_writing
    raise err
  File "/Users/a404/ivermectin/focalevents/twitter/listener.py", line 250, in manage_writing
    self.write(tweets, includes)
  File "/Users/a404/ivermectin/focalevents/twitter/listener.py", line 285, in write
    all_inserts = get_all_inserts(tweets, includes, self.event, self.query_type)
  File "/Users/a404/ivermectin/focalevents/twitter/helper.py", line 120, in get_all_inserts
    tweet_insert = get_tweet_insert(tweet, event, query_type, direct=True)
  File "/Users/a404/ivermectin/focalevents/twitter/helper.py", line 409, in get_tweet_insert
    'source': tweet['source'],
KeyError: 'source'

It seems to be (temporarily??) fixed by setting tweet['source'] = 'None' when 'source' isn't in the tweet, before the tweet_insert dictionary is built, but that may not be ideal if we actually care about the source.

anyway, I may just be cursed. lmk if this is the case!!!

https://github.com/ryanjgallagher/focalevents/blob/ef2d132c57a2d38d3d2af7e8bd7b7d4949a1056d/twitter/helper.py#L409
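For reference, a minimal sketch of the workaround described above, assuming each tweet is a plain dict parsed from the API response. The two payloads and the `source_or_none` helper are made up for illustration and are not part of focalevents. It uses a real `None` rather than the string `'None'`, so a missing value can't be confused with a client that happens to be named "None".

```python
# Made-up payloads for illustration; real API responses carry many more fields.
tweet_without_source = {"id": "1", "text": "an old tweet"}
tweet_with_source = {"id": "2", "text": "a newer tweet", "source": "Twitter Web App"}


def source_or_none(tweet):
    """Return the tweet's source, or None when the field is absent.

    Hypothetical helper, not a function in focalevents.
    """
    # tweet['source'] raises KeyError on a missing key; dict.get returns None.
    return tweet.get("source")


missing = source_or_none(tweet_without_source)
present = source_or_none(tweet_with_source)
```

Whether `None` is acceptable downstream depends on how the insert is consumed, which this sketch does not cover.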

ryanjgallagher commented 3 years ago

I wouldn't be surprised if it turns out the source field was inconsistent early in Twitter's operation.

I think you have the right idea for the fix. Above the tweet_insert dictionary definition you can see I have a bunch of try/excepts for this kind of thing. Would you be able to put in a pull request that adds one for source? So something like

try:
    source = tweet['source']
except KeyError:
    source = None

# so on...

tweet_insert = {
    'source': source,
    # so on...
}

Ideally there should probably be a more robust way of handling missing fields without having to catch them all individually...
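One possible way to do that, sketched under assumptions: keep a list of the fields that may be absent and pull them all with `dict.get` in one pass. The `OPTIONAL_FIELDS` tuple and the `extract_optional` helper are hypothetical names, and the fields listed are only illustrative. This is not how the repository actually handles it.

```python
# Hypothetical list of fields that may be absent from a tweet payload.
OPTIONAL_FIELDS = ("source", "lang", "geo")


def extract_optional(tweet, fields=OPTIONAL_FIELDS, default=None):
    """Return {field: value} for each field, using `default` when it is missing."""
    return {field: tweet.get(field, default) for field in fields}


# Made-up payload with no optional fields at all.
tweet = {"id": "1", "text": "example"}

tweet_insert = {
    "id": tweet["id"],          # required fields still fail loudly if absent
    "text": tweet["text"],
    **extract_optional(tweet),  # optional fields fall back to None
}
```

Required fields keep using plain indexing, so a genuinely malformed tweet still raises instead of being silently written with empty values.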

ryanjgallagher commented 3 years ago

I had to fix something else anyway, so I fixed this in 424736d046b6feec66154efeacaccfd534b6ddb2. You should be good to go now. Let me know if you run into other issues, though! Thanks for catching this.

asmithh commented 3 years ago

thank you!!!