kurtawirth / botscan

R tool for scanning Twitter for bot activity on a conversation level.
GNU General Public License v3.0

Remove duplicate searches for the same handle to improve speed #15

Closed: ryantmoore closed this issue 5 years ago

ryantmoore commented 6 years ago

Currently, we scrape tweets, store all their handles, and search each handle in sequence, so a user who appears in several tweets is searched once per tweet. We should store only the distinct handles, search that subset, and then (likely) merge the results back into the tweet data.
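A minimal base-R sketch of the proposal, not botscan's actual code. The `tweets` data frame, the `screen_name` column, and `lookup_bot_score()` are hypothetical stand-ins for the scraped data and the per-handle bot search; the call counter is only there to make the saving visible.

```r
# Hypothetical tweet data: one row per tweet, so a handle can repeat.
tweets <- data.frame(
  screen_name = c("alice", "bob", "alice", "carol"),
  text = c("first", "second", "third", "fourth"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the slow per-handle bot search.
lookup_calls <- 0
lookup_bot_score <- function(handle) {
  lookup_calls <<- lookup_calls + 1  # count how many searches are made
  nchar(handle) / 10                 # placeholder score, not a real bot score
}

# 1. Keep only the distinct handles.
distinct_handles <- unique(tweets$screen_name)

# 2. Search each distinct handle exactly once.
scores <- data.frame(
  screen_name = distinct_handles,
  bot_score = vapply(distinct_handles, lookup_bot_score, numeric(1),
                     USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# 3. Merge the per-handle results back onto the per-tweet data.
tweets_scored <- merge(tweets, scores, by = "screen_name", all.x = TRUE)
```

With four tweets from three handles, this makes three lookups instead of four, and both of the repeated handle's rows receive the same score.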

ryantmoore commented 5 years ago

@kurtawirth Tested this by searching #MAGA with timeout = 30, which returned 43 records, including one duplicate. For the repeated user ID, the bot-identification elements that should be constant were constant, and the element that should differ (tweet text) differed.
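The check described above could be automated roughly as follows. This is a sketch under assumptions: the `results` data frame and its `user_id`, `bot_score`, and `text` columns are hypothetical, and the real output may name its fields differently.

```r
# Hypothetical merged output: one row per tweet, bot fields joined per user.
results <- data.frame(
  user_id = c("1", "2", "1", "3"),
  bot_score = c(0.8, 0.1, 0.8, 0.4),
  text = c("tweet a", "tweet b", "tweet c", "tweet d"),
  stringsAsFactors = FALSE
)

# Users who appear in more than one row.
repeated_ids <- unique(results$user_id[duplicated(results$user_id)])
repeated <- results[results$user_id %in% repeated_ids, ]

# Bot-identification fields should hold a single value per repeated user.
score_constant <- tapply(repeated$bot_score, repeated$user_id,
                         function(x) length(unique(x)) == 1)

# Tweet text should differ across a repeated user's rows.
text_distinct <- tapply(repeated$text, repeated$user_id,
                        function(x) anyDuplicated(x) == 0)

stopifnot(all(score_constant), all(text_distinct))
```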