nicolarighetti / CooRTweet

CooRTweet: Coordinated Networks Detection on Social Media | Detects a variety of coordinated actions on social media and outputs the network of coordinated users along with related information.
https://CRAN.R-project.org/package=CooRTweet

performance optimizations #24

Closed mrwunderbar666 closed 1 year ago

mrwunderbar666 commented 1 year ago

This PR makes two small changes that substantially reduce memory usage:

1) In the "reshape" function, the indices are now set separately for each column. This is a minor tweak.

2) The coordination detection function now converts the ID columns (object_id, id_user, content_id) to factors, runs the computations, and then converts them back to character vectors. This has a large effect on memory use. Before the change, every ID string was duplicated in memory for each possible combination, which blew up memory when the number of possible pairs was very large. With the factor conversion this no longer happens, at the cost of making the function marginally slower.
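The encode, compute, decode pattern described in the second change can be sketched roughly as follows. This is a language-neutral illustration in Python, not the package's R code: the helper names `encode` and `coordinated_pairs` and the sample data are hypothetical, the time-window check is left out, and the sketch shows only the shape of the idea rather than a memory measurement.

```python
from collections import defaultdict
from itertools import combinations


def encode(values):
    """Map each distinct string to a small integer code (similar to an R factor)."""
    levels = sorted(set(values))
    code_of = {level: i for i, level in enumerate(levels)}
    return [code_of[v] for v in values], levels


def coordinated_pairs(object_ids, user_ids):
    """Return pairs of users that shared the same object, decoded back to strings."""
    user_codes, user_levels = encode(user_ids)

    # Group the integer user codes by the object they shared.
    users_by_object = defaultdict(set)
    for obj, code in zip(object_ids, user_codes):
        users_by_object[obj].add(code)

    # The pair table holds only integers, however large it grows.
    pairs = set()
    for codes in users_by_object.values():
        pairs.update(combinations(sorted(codes), 2))

    # Decode to strings once, at the very end.
    return sorted((user_levels[a], user_levels[b]) for a, b in pairs)


# Hypothetical sample data: three users share url_1, two of them also share url_2.
object_ids = ["url_1", "url_1", "url_1", "url_2", "url_2"]
user_ids = ["alice", "bob", "carol", "alice", "bob"]
result = coordinated_pairs(object_ids, user_ids)
print(result)
```

The design point is that the expensive step, enumerating every pair, only ever touches compact integer codes, and the lookup back to the original strings happens once on the final result.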

With this change I was able to run it on a dataset of ~10 million tweets with a time window of 120 seconds. The largest group had about 16,000 members.
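To give a sense of scale, here is a quick back-of-the-envelope count of the pairs in a group of that size. It assumes that every unordered pair of members within a group is a candidate, which is an assumption about the pairing step and not something stated above; under it, a group of about 16,000 yields roughly 128 million pairs.

```python
from math import comb

group_size = 16_000  # approximate size of the largest group reported above
pair_count = comb(group_size, 2)  # n * (n - 1) / 2 unordered pairs
print(pair_count)
```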