Member activity report (star helper automation)

vcarl commented 1 month ago

We currently administer the Star Helper program manually: we query metrics resources for member activity and use spreadsheets to build a report to grade.

The metrics we track are quite simple: for a list of channels, rank the top 100 members by a combination score of

- # of messages over the entire grading period
- # of channels over the entire grading period
- # of channels per day within the grading period

This should actually be

- - # of messages over the entire grading period
+ - volume of material posted over the entire grading period
  - # of channels over the entire grading period
  - # of channels per day within the grading period

because # of messages counts a 1-word reply and a 3-paragraph treatise as equivalent. We should count words or characters. (words, and characters/word separately?? :galaxybrain:)
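As a rough illustration of measuring "volume of material" rather than message count, here is a minimal sketch. The function name `text_volume` and the definition of a word as any run of non-whitespace characters are assumptions made for this example, not something the issue specifies.

```python
def text_volume(content: str) -> dict:
    """Measure how much text a message contains.

    Assumption: a "word" is any run of non-whitespace characters, and
    characters are counted without whitespace.
    """
    words = content.split()
    word_count = len(words)
    char_count = sum(len(word) for word in words)
    # Guard against empty messages (e.g. attachment-only posts).
    chars_per_word = char_count / word_count if word_count else 0.0
    return {
        "words": word_count,
        "characters": char_count,
        "chars_per_word": chars_per_word,
    }
```

Tracking words and characters per word separately keeps both signals available, so a report could weight them differently later.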

Our current scoring thresholds, which could probably use some updating:

Messages
Threshold	0	200	400	800	1500
Score	0	1	2	3	4

Channels
Threshold	0	2	5	10	20
Score	0	0.25	1	2	3

Channels/day
Threshold	0	1	3	6	8
Score	0	0.25	0.5	1	2
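A minimal sketch of how the thresholds above might be applied. It makes three assumptions that the issue does not confirm: the first row of each table is a threshold and the second is the score awarded, a value earns the score of the highest threshold it meets or exceeds, and the "combination score" is a plain sum of the three bucket scores. The names `SCORING`, `bucket_score`, and `combined_score` are hypothetical.

```python
from bisect import bisect_right

# (thresholds, scores) pairs copied from the tables above.
SCORING = {
    "messages": ([0, 200, 400, 800, 1500], [0, 1, 2, 3, 4]),
    "channels": ([0, 2, 5, 10, 20], [0, 0.25, 1, 2, 3]),
    "channels_per_day": ([0, 1, 3, 6, 8], [0, 0.25, 0.5, 1, 2]),
}


def bucket_score(metric: str, value: float) -> float:
    """Return the score for the highest threshold that `value` reaches.

    Assumption: thresholds are inclusive lower bounds.
    """
    thresholds, scores = SCORING[metric]
    index = bisect_right(thresholds, value) - 1
    return scores[max(index, 0)]  # values below the first threshold score lowest


def combined_score(messages: float, channels: float, channels_per_day: float) -> float:
    """Assumption: the combination score is an unweighted sum."""
    return (
        bucket_score("messages", messages)
        + bucket_score("channels", channels)
        + bucket_score("channels_per_day", channels_per_day)
    )
```

Keeping the thresholds in one data structure means they can be tuned without touching the scoring logic.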

vcarl commented 1 month ago

Doing this maximally right would probably mean figuring out an appropriate time series datastore for the metrics data, but if we generate reports and delete the underlying data, that's probably sufficient for our needs here because the data is very unlikely to scale beyond megabytes. When feasible (privacy/scalability concerns relevant here), we should store data that allows for derived values to be recalculated, i.e. it seems better to me to track messages, vs updating a score on each message.
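To illustrate storing raw messages and deriving report values from them, here is a sketch using an in-memory SQLite database from the Python standard library. The table layout, the `build_report` name, and the choice to average distinct channels over a member's active days for "channels per day" are all assumptions for the example; the issue does not say how the daily figure is aggregated or which datastore will be used.

```python
import sqlite3


def build_report(rows, start, end):
    """Derive per-member metrics from raw message rows.

    rows: iterable of (author_id, channel_id, sent_at), where sent_at is an
    ISO-8601 string. The period is start <= sent_at < end.
    Returns (author_id, messages, channels, channels_per_day) tuples.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE message (author_id TEXT, channel_id TEXT, sent_at TEXT)")
    db.executemany("INSERT INTO message VALUES (?, ?, ?)", rows)
    query = """
        WITH in_period AS (
            SELECT author_id, channel_id, date(sent_at) AS day
            FROM message
            WHERE sent_at >= ? AND sent_at < ?
        ),
        daily AS (
            SELECT author_id, day, COUNT(DISTINCT channel_id) AS channels
            FROM in_period
            GROUP BY author_id, day
        ),
        per_day AS (
            -- Assumption: average over days on which the member was active.
            SELECT author_id, AVG(channels) AS channels_per_day
            FROM daily
            GROUP BY author_id
        ),
        totals AS (
            SELECT author_id,
                   COUNT(*) AS messages,
                   COUNT(DISTINCT channel_id) AS channels
            FROM in_period
            GROUP BY author_id
        )
        SELECT t.author_id, t.messages, t.channels, p.channels_per_day
        FROM totals t
        JOIN per_day p ON p.author_id = t.author_id
        ORDER BY t.author_id
    """
    result = db.execute(query, (start, end)).fetchall()
    db.close()
    return result
```

Because only raw rows are stored, changing how a metric is defined means editing the query and regenerating the report, with no stored scores to migrate.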

This is likely aided by completing #60 and making it easier to work with the database first, but if we use e.g. Cloudflare Analytics Engine, that may not be necessary.

This is going in mod-bot instead of reactibot because this relates to context over time and a mod's-eye view of the server: cultural curation, vs specialized code for Reactiflux.

vcarl commented 1 month ago

This feature, fully implemented, should allow for different groups of channels to be configured. "Channel category" is a good shortcut for selecting many channels, but arbitrary channels should be able to be grouped together for scoring purposes.

To describe it abstractly, I want this to reflect the social reality that people tend to select into distinct social groups based on what channels they participate in vs read vs never use. More concretely, this should allow us to set up qualitative participation metrics for help channels (Star Helpers), career channels, and social channels.
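One possible shape for the channel grouping described above, as a sketch only. The `ChannelGroup` type, the `groups_for` helper, and every channel and category ID below are hypothetical placeholders; the real configuration format is not defined in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelGroup:
    """A named set of channels scored together.

    A group can list whole categories (the shortcut), individual channels,
    or both.
    """
    name: str
    categories: frozenset = frozenset()
    channels: frozenset = frozenset()

    def includes(self, channel_id, category_id=None):
        if channel_id in self.channels:
            return True
        return category_id is not None and category_id in self.categories


# Hypothetical IDs, for illustration only.
GROUPS = [
    ChannelGroup("help", categories=frozenset({"help-category"})),
    ChannelGroup("career", channels=frozenset({"jobs-advice", "resume-review"})),
    ChannelGroup(
        "social",
        categories=frozenset({"community-category"}),
        channels=frozenset({"events"}),
    ),
]


def groups_for(channel_id, category_id=None, groups=GROUPS):
    """Return the names of every group a channel belongs to."""
    return [g.name for g in groups if g.includes(channel_id, category_id)]
```

A channel may match more than one group here, which leaves room for overlapping social groupings rather than forcing a strict partition.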

Part of why it'd be advantageous to store data and calculate derived values for reports is to allow some flexibility with how those reports are generated. I'm not sure e.g. what thresholds will work best to capture participation, so I want to track enough metadata to leave room to play with. It might make sense in the future, when those lessons about thresholds and modes of participation are better understood, to reduce the amount of data stored by calculating scores in real time.

vcarl commented 1 month ago

I've been thinking of this lately:

> We should count words or characters. (words, and characters/word separately?? :galaxybrain:)

I think there's a lot of value in getting the metadata right, as a means of generating signal for describing participation. Some thoughts, as a wishlist:

- Author (duh), person replied to, (if in a thread) thread owner (for Reactibot threads, author of first message)
- Channel/channel category
- Emoji count + variety
- Message length + wordcount
- Inferred language/locale
- Complex linguistic evaluations, like Flesch-Kincaid, Lexile, or Gunning fog index
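A sketch of what a per-message metadata record covering the simpler wishlist items could look like. The `MessageMetadata` fields and `extract_metadata` signature are hypothetical. Custom emoji are matched using Discord's `<:name:id>` and `<a:name:id>` message syntax, while the Unicode ranges are only a rough approximation of "emoji". Language inference and readability indices are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Discord custom emoji appear in message content as <:name:id> or <a:name:id>.
CUSTOM_EMOJI = re.compile(r"<a?:(\w+):\d+>")
# Assumption: a rough approximation of Unicode emoji, not a complete list.
UNICODE_EMOJI = re.compile("[\u2600-\u27BF\U0001F300-\U0001FAFF]")


@dataclass
class MessageMetadata:
    author_id: str
    channel_id: str
    category_id: Optional[str]
    replied_to_id: Optional[str]
    thread_owner_id: Optional[str]
    length: int          # characters in the raw message content
    word_count: int      # whitespace-separated words, emoji excluded
    emoji_count: int
    emoji_variety: int   # number of distinct emoji used


def extract_metadata(content, author_id, channel_id, category_id=None,
                     replied_to_id=None, thread_owner_id=None):
    emoji = CUSTOM_EMOJI.findall(content) + UNICODE_EMOJI.findall(content)
    # Strip emoji before counting words so they do not inflate the count.
    text_only = UNICODE_EMOJI.sub(" ", CUSTOM_EMOJI.sub(" ", content))
    return MessageMetadata(
        author_id=author_id,
        channel_id=channel_id,
        category_id=category_id,
        replied_to_id=replied_to_id,
        thread_owner_id=thread_owner_id,
        length=len(content),
        word_count=len(text_only.split()),
        emoji_count=len(emoji),
        emoji_variety=len(set(emoji)),
    )
```

Storing a record like this per message keeps the raw signals available, so scores can be recomputed when thresholds change.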

Member activity report (star helper automation) #58
