Check user-level calculation

ryantmoore commented 5 years ago

As of 9a94bb3, the user-level proportion defined in botscan.R at line 99 divides the number of bots identified in the full list of usernames by the number of unique usernames. I propose that this should be the number of unique bots identified divided by the number of unique usernames.
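A toy illustration of the difference, using made-up data rather than anything from the package. It assumes one row per tweet, so an account that tweets several times appears several times. The data frame, its values, and the names `prop_rows` and `prop_users` are hypothetical.

```r
# Hypothetical toy data: one row per tweet, so prolific accounts repeat.
tweets <- data.frame(
  screen_name   = c("a", "a", "a", "b", "c"),
  cap.universal = c(0.9, 0.9, 0.9, 0.1, 0.2),
  stringsAsFactors = FALSE
)
threshold <- 0.5

# Denominator in both versions: number of unique usernames.
n_users <- length(unique(tweets$screen_name))

# Numerator counted over all rows: account "a" is counted three times.
prop_rows <- sum(tweets$cap.universal > threshold) / n_users

# Numerator counted over unique accounts only.
bot_names  <- unique(tweets$screen_name[tweets$cap.universal > threshold])
prop_users <- length(bot_names) / n_users
```

Here the row-based numerator gives a proportion of 1, even though only one of the three accounts exceeds the threshold.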

Proposed fix:

Move line 86, which is used only in the calculation above,

nbots <- sum(df_userbots$cap.universal > threshold)

into the if(user_level){} block and restructure it as

nbots <- sum(
    (overallscreennames %in% botscreennames) &
    (overallscreennames_score > threshold)
)
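To show how this expression behaves, here is a minimal sketch on made-up vectors. It assumes that `overallscreennames` holds unique usernames, that `overallscreennames_score` holds one score per username in the same order, and that `botscreennames` holds the usernames that were scored. Those are my assumptions for illustration, not necessarily how the objects are built in botscan.R.

```r
# Hypothetical inputs; the real objects are built elsewhere in botscan.R.
threshold <- 0.5
overallscreennames       <- c("a", "b", "c", "d")  # assumed unique usernames
overallscreennames_score <- c(0.9, 0.1, 0.7, 0.8)  # assumed one score per username
botscreennames           <- c("a", "c")            # assumed scored usernames

# A username counts once, and only if it is in botscreennames
# and its score exceeds the threshold.
nbots <- sum(
    (overallscreennames %in% botscreennames) &
    (overallscreennames_score > threshold)
)

# User-level proportion: unique bots over unique usernames.
prop <- nbots / length(overallscreennames)
```

Because both logical vectors have one element per unique username, no account can contribute more than once to `nbots`.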

kurtawirth commented 5 years ago

This has been completed as of commit #a12c18044401775da6a7337c4438070b4fc37d25. Closing issue.

ryantmoore commented 5 years ago

From lines 115-116, can you verify that

1. (tweets$screen_name %in% bots$user.screen_name) will identify unique instances of tweets$screen_name? That is, that there are no duplicates in tweets$screen_name.
2. the data frames tweets and df_userbots have the same observations in the same order?

If either is not true, I think we need to fix the calculation again.
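Both conditions can be checked directly. This is a rough sketch on made-up stand-ins for the two data frames. The `user.screen_name` column on `df_userbots` is an assumption on my part, and `has_dupes`, `same_order`, and `n_matched_rows` are hypothetical names.

```r
# Hypothetical stand-ins for the tweets and df_userbots data frames.
tweets <- data.frame(
  screen_name = c("a", "b", "a", "c"),
  stringsAsFactors = FALSE
)
df_userbots <- data.frame(
  user.screen_name = c("a", "b", "c"),  # assumed column name
  cap.universal    = c(0.9, 0.1, 0.2),
  stringsAsFactors = FALSE
)

# Check 1: anyDuplicated() returns 0 when every username appears once.
has_dupes <- anyDuplicated(tweets$screen_name) > 0

# Check 2: same usernames, same length, same order.
same_order <- identical(tweets$screen_name, df_userbots$user.screen_name)

# %in% returns one logical per element of its left-hand side and does not
# deduplicate, so this counts matching rows rather than matching users.
n_matched_rows <- sum(tweets$screen_name %in% df_userbots$user.screen_name)
```

With these toy inputs `has_dupes` is TRUE and `same_order` is FALSE, which is the situation where the row-wise comparison would miscount.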

kurtawirth commented 5 years ago

With recent changes, this is handled differently and is now accurate.

ryantmoore commented 5 years ago

Fixed in commit 97a4a1926b2337cafc94e67fbc716f44f73ff5ad

kurtawirth / botscan

Check user-level calculation #19