CooRTweet: Coordinated Networks Detection on Social Media | Detects a variety of coordinated actions on social media and outputs the network of coordinated users along with related information.
This PR makes two small changes that substantially reduce memory usage:
1) The indices are now set separately for each column in the "reshape" function (a minor tweak).
2) The coordination detection function now converts the id columns (object_id, id_user, content_id) to factors, runs the computations, and then converts them back to character vectors. This has a large impact on memory utilization: previously, every string was duplicated in memory for each possible combination, which exhausted memory when the number of possible pairs was very large. With the factor conversion this no longer happens, although the function becomes marginally slower.
With this change I was able to run it on a dataset with ~10 million tweets and a time window of 120 seconds. The largest group had about 16,000 members.
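
For reviewers, the factor round trip from point 2 can be sketched in base R as follows. This is only an illustration and not the package's actual code: the toy data and the helper `pairs_for_object` are hypothetical, and only the column names come from the description above.

```r
# Hypothetical toy data; only the column names follow the PR description.
posts <- data.frame(
  object_id  = c("url_a", "url_a", "url_a", "url_b", "url_b"),
  id_user    = c("u1", "u2", "u3", "u1", "u4"),
  content_id = c("c1", "c2", "c3", "c4", "c5"),
  stringsAsFactors = FALSE
)
id_cols <- c("object_id", "id_user", "content_id")

# Step 1: convert the id columns to factors, so that the pairwise
# expansion below carries integer codes instead of strings.
posts[id_cols] <- lapply(posts[id_cols], factor)

# Step 2: stand-in for the coordination detection (hypothetical helper):
# build every pair of users that shared the same object.
pairs_for_object <- function(df) {
  if (nrow(df) < 2) return(NULL)
  idx <- utils::combn(nrow(df), 2)  # 2 x n_pairs matrix of row indices
  data.frame(
    object_id = df$object_id[idx[1, ]],
    id_user   = df$id_user[idx[1, ]],
    id_user_y = df$id_user[idx[2, ]]
  )
}
pairs <- do.call(rbind, lapply(split(posts, posts$object_id), pairs_for_object))

# Step 3: convert the factor columns back to character vectors for output.
pair_cols <- c("object_id", "id_user", "id_user_y")
pairs[pair_cols] <- lapply(pairs[pair_cols], as.character)
```

The number of pairs grows quadratically with group size, so the expansion in step 2 is where the representation of the id columns matters most.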