oxen-io / oxen-improvement-proposals

The Loki Improvement Proposal repository
MIT License

Session User Engagement Report #60

Open venezuela01 opened 8 months ago

venezuela01 commented 8 months ago

Session User Engagement Report

Edit: read https://github.com/oxen-io/oxen-improvement-proposals/issues/60#issuecomment-1921187399 for the latest update.

Introduction

This report analyzes user activity statistics drawn from the storage.db of an Oxen Storage Server, using data fetched at the end of October 2023. It reviews user engagement through Monthly and Biweekly Active User metrics, both for the individual server and for the network as a whole. It also examines how activity is distributed: the most active 15% of users account for about 80% of the message traffic, a pronounced Pareto distribution.

Data Preparation

Monthly Active User (MAU)

On a Single Storage Server

After excluding obsolete messages and cleaning up the data, we tallied the distinct owner IDs in the message table. Our server interacted with 2,961 active users during the month.
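
As a rough illustration of the tally described above, the sketch below counts distinct owners that still hold at least one unexpired message. The table name `messages` and the columns `owner`, `timestamp`, and `expiry` are assumptions made for this example; they are not taken from the actual storage.db schema.

```python
import sqlite3


def count_active_owners(conn, now):
    """Count distinct owners with at least one message that has not expired."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT owner) FROM messages WHERE expiry > ?", (now,)
    ).fetchone()
    return row[0]


# Tiny in-memory demo with made-up rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (owner TEXT, timestamp INTEGER, expiry INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("alice", 100, 500),
        ("alice", 200, 600),
        ("bob", 150, 550),
        ("carol", 10, 90),  # already expired at now=300, so not counted
    ],
)
demo_count = count_active_owners(conn, now=300)
print(demo_count)
```

Against a real database, the same query would be run on a read-only copy of the file rather than on the live server.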

Network-Wide Estimation

To approximate the Monthly Active Users across the entire network, we calculated the swarm space coverage ratio of our storage server.

The ratio was determined by:

Consequently, we deduced that our server covers 1/256 of the network space.

By extrapolation, we estimate the network comprises 758,016 (758k) Monthly Active Users (2,961 multiplied by 256).
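
The extrapolation is a single division by the coverage fraction. A minimal sketch, where the function name `extrapolate` is only illustrative:

```python
def extrapolate(local_count, coverage_fraction=1 / 256):
    """Scale a single-server user count up to a network-wide estimate."""
    return round(local_count / coverage_fraction)


network_mau = extrapolate(2_961)  # monthly active users seen on one server
print(network_mau)
```

The same helper applies to any other per-server count in this report, as long as the server's coverage fraction is unchanged.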

Edit: Starting from 2024-02-01, we no longer use the min-max estimation for swarm width. Instead, we use the get_service_nodes JSON-RPC call to obtain all swarm IDs of all nodes and calculate the precise swarm boundaries. The results are almost the same as those obtained through the min-max estimation approach, but the precise approach via JSON-RPC is more sensitive in detecting swarm bugs, allowing us to intervene manually when necessary.
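
The report does not spell out how the swarm boundaries are derived from the swarm IDs, so the following is only one plausible sketch. It assumes that a key belongs to the swarm whose ID is nearest on a 64-bit ring, so that each swarm covers the space up to the midpoints between it and its two neighbours. The input list is assumed to have been collected beforehand from the get_service_nodes response (for example from a per-node `swarm_id` field, which is also an assumption here).

```python
RING = 2**64  # assumed size of the swarm ID space


def swarm_coverage(swarm_ids, target):
    """Fraction of the ring covered by `target` under nearest-ID assignment."""
    ids = sorted(set(swarm_ids))
    if target not in ids:
        raise ValueError("target swarm ID is not in the list")
    if len(ids) == 1:
        return 1.0
    i = ids.index(target)
    prev_id = ids[i - 1]               # wraps around to the last ID when i == 0
    next_id = ids[(i + 1) % len(ids)]  # wraps around to the first ID at the end
    left = (target - prev_id) % RING   # distance back to the previous swarm
    right = (next_id - target) % RING  # distance forward to the next swarm
    return (left + right) / 2 / RING   # half of each gap belongs to `target`


# Four evenly spaced, made-up swarm IDs: each covers a quarter of the ring.
demo_ids = [0, 2**62, 2**63, 3 * 2**62]
demo_coverage = swarm_coverage(demo_ids, 2**62)
print(demo_coverage)
```

Summing the coverage over all swarms should give 1; a sum that deviates, or a single swarm with an unusually large share, would be the kind of anomaly worth inspecting by hand.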

Biweekly Active User Analysis

We take a closer look at biweekly active users because regular user messages have a 14-day Time-To-Live (TTL). Regular user messages were extracted by filtering messages according to their TTL value. This procedure enabled us to focus on regular user messages and to exclude configuration messages.

The refined dataset contained 283,365 messages from 1,917 users over a biweekly span.
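
A minimal sketch of the TTL filter, assuming each message record carries a `timestamp` and an `expiry` in milliseconds so that the TTL is their difference. The field names and the unit are assumptions for illustration only.

```python
FOURTEEN_DAYS_MS = 14 * 24 * 60 * 60 * 1000


def regular_messages(messages):
    """Keep only messages whose TTL (expiry - timestamp) is exactly 14 days."""
    return [m for m in messages if m["expiry"] - m["timestamp"] == FOURTEEN_DAYS_MS]


# Made-up records: two regular messages and one with a longer TTL.
demo_messages = [
    {"owner": "alice", "timestamp": 0, "expiry": FOURTEEN_DAYS_MS},
    {"owner": "bob", "timestamp": 1_000, "expiry": 1_000 + FOURTEEN_DAYS_MS},
    {"owner": "alice", "timestamp": 0, "expiry": 30 * 24 * 60 * 60 * 1000},
]
demo_regular = regular_messages(demo_messages)
demo_users = {m["owner"] for m in demo_regular}
print(len(demo_regular), len(demo_users))
```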

Network-Wide Projection

Applying the same coverage ratio as before (1/256), we project that there are approximately 490,752 Biweekly Active Users network-wide. The same scaling factor is applied in the following analysis.

Pareto Distribution of User Activity

Note on Message Ownership: In the current design of Session, when a user sends a message, a copy is also sent to their own swarm; when a user receives a message, their swarm receives the message. The term 'owning' a message encompasses both sending and receiving; this adds a layer of complexity to the analysis, which we will simplify for the moment by aggregating all such activities under general user engagement.

Distribution of Message Ownership Among Users

Our analysis of user behavior within the network has revealed a pronounced imbalance: a relatively small fraction of users are engaged with a disproportionately large share of messages. This aligns with the Pareto Distribution, which suggests that a small number of individuals often account for a large portion of the effects.

The following table illustrates the cumulative percentage of active users in comparison to the cumulative percentage of messages they are engaged with:

| Cumulative Percentage of Users (%) | Cumulative Percentage of Messages (%) |
|---|---|
| 0.26 | 10 |
| 0.73 | 20 |
| 1.4 | 30 |
| 2.2 | 40 |
| 3.5 | 50 |
| 5.7 | 60 |
| 9.3 | 70 |
| 15.2 | 80 |
| 26.6 | 90 |
| 100 | 100 |

Figure: Accumulated Percentage of Messages Owned by Most Active Users

This data indicates that over a two-week span, about 3.5% of the most active users engage with 50% of the messages, and around 15.2% of users account for about 80% of the messaging activity. A similar distribution was observed when the data was analyzed by message storage size instead of message count.
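
The cumulative figures can be computed by sorting users from most to least active and accumulating their message counts. The sketch below uses made-up counts, not the report's data, and the function name is illustrative.

```python
def users_share_for_message_share(counts, target_share):
    """Smallest fraction of users (most active first) owning target_share of messages."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for n_users, count in enumerate(ordered, start=1):
        running += count
        if running >= target_share * total:
            return n_users / len(ordered)
    return 1.0


# Ten hypothetical users owning 100 messages in total.
demo_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
half = users_share_for_message_share(demo_counts, 0.5)
three_quarters = users_share_for_message_share(demo_counts, 0.75)
print(half, three_quarters)
```

Replacing the message counts with per-user storage sizes gives the size-based variant mentioned above.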

Distribution of User Activity Across Time Buckets

To better understand how often users are active, we split two weeks' worth of messages into 14 daily groups and tracked on how many days each user was active. We then grouped users with the same number of active days and counted the users in each group.

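| Days Active (out of 14) | Number of Users |
|---|---|
| 1 | 200,704 |
| 2 | 82,688 |
| 3 | 45,568 |
| 4 | 31,488 |
| 5 | 22,528 |
| 6 | 18,688 |
| 7 | 15,616 |
| 8 | 13,824 |
| 9 | 11,264 |
| 10 | 9,472 |
| 11 | 7,168 |
| 12 | 8,704 |
| 13 | 7,936 |
| 14 | 15,104 |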
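
The bucketing behind this table can be sketched as follows, assuming each message is reduced to a (user, Unix timestamp in seconds) pair. The input shape and function name are assumptions for illustration.

```python
from collections import Counter, defaultdict

DAY_SECONDS = 86_400


def active_days_histogram(events, window_start, days=14):
    """Map 'number of active days' to 'number of users' within the window."""
    days_by_user = defaultdict(set)
    for user, ts in events:
        day = (ts - window_start) // DAY_SECONDS
        if 0 <= day < days:  # ignore events outside the window
            days_by_user[user].add(day)
    return Counter(len(active) for active in days_by_user.values())


# Made-up events: "a" is active on two days, "b" and "c" on one day each.
demo_events = [
    ("a", 0), ("a", 10), ("a", DAY_SECONDS),
    ("b", 5),
    ("c", 13 * DAY_SECONDS), ("c", 14 * DAY_SECONDS),  # the last one is outside the window
]
demo_hist = active_days_histogram(demo_events, window_start=0)
print(sorted(demo_hist.items()))
```

Multiplying each per-server count by the coverage factor (256 in this report) gives network-wide figures like those in the table.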

The results are displayed in a Pareto chart reflecting user activity frequency.

Figure: Pareto Chart of Usage Frequency

Our findings show that, out of 491k biweekly active users, 200k (approximately 40%) were active on only one of the 14 days, 83k were active on two days, and so on. Only 15k users engaged with Session on every one of the 14 days. It is crucial to distinguish this 15k figure from the Daily Active Users metric (around 126k DAU), which measures the number of unique users who interact with the app within a 24-hour interval, without any guarantee that they return the following day.

It is also noteworthy that with 758k monthly active users and 491k biweekly active users, there are approximately 267k users who have periods of inactivity exceeding 14 days. This raises a concern that they may miss messages due to the 14-day Time-To-Live (TTL) policy.

Daily Active Users (DAU)

We split the messages into 14 daily buckets and counted the distinct active users in each bucket. After scaling up by a factor of 256, we extrapolate a network-wide DAU between 111,616 and 137,472. On average, we estimate approximately 126,244 daily active users across the network.
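
A minimal sketch of this calculation, again assuming messages reduced to (user, Unix timestamp in seconds) pairs; the input shape and names are illustrative rather than taken from the original analysis.

```python
def daily_active_users(events, window_start, days=14, scale=256):
    """Return (min, max, mean) of the scaled distinct-user count per day."""
    users_per_day = [set() for _ in range(days)]
    for user, ts in events:
        day = (ts - window_start) // 86_400
        if 0 <= day < days:
            users_per_day[day].add(user)
    scaled = [len(users) * scale for users in users_per_day]
    return min(scaled), max(scaled), sum(scaled) / days


# Two-day demo with made-up events: 2 users on day 0, 1 user on day 1.
demo_events = [("a", 0), ("b", 1), ("a", 86_400)]
dau_min, dau_max, dau_mean = daily_active_users(demo_events, window_start=0, days=2)
print(dau_min, dau_max, dau_mean)
```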

DAU / MAU Ratio

The DAU/MAU ratio is a key performance indicator that measures user retention and engagement. For our network, this ratio is calculated as follows: 126,244 / 758,016 = 16.7%.

Comparative Analysis with Other Products

We compare the DAU/MAU ratio of Session with other products:

| Product | DAU | MAU | DAU/MAU Ratio | Source | Source Date |
|---|---|---|---|---|---|
| Session | 126k | 758k | 16.7% | - | 2023-10 |
| Facebook Family | 3.14B | 3.96B | 79% | Meta Earnings Presentation Q3 2023 | 2023 Q3 |
| Snapchat | 406M | 750M | 54% | Snap Inc. Investor Presentation | 2023-10 |
| Brave Browsers | 23M | 64M | 36% | BraveBat | 2023-09 |
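
The ratios in the table follow directly from the DAU and MAU columns. A short check, using the figures as quoted above:

```python
# (DAU, MAU) pairs as quoted in the comparison table.
products = {
    "Session": (126_244, 758_016),
    "Facebook Family": (3.14e9, 3.96e9),
    "Snapchat": (406e6, 750e6),
    "Brave Browsers": (23e6, 64e6),
}

# DAU/MAU ratio in percent, rounded to one decimal place.
ratios = {name: round(100 * dau / mau, 1) for name, (dau, mau) in products.items()}

for name, pct in ratios.items():
    print(f"{name}: {pct}%")
```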

Session's DAU/MAU ratio is considerably lower than the industry average, potentially signaling its immaturity. This is further supported by the current average rating of the Session Android app, which stands at approximately 3.6, lagging behind the overall average app rating of 4.0 as reported by AppBrain.

Insights and Considerations

  1. Cross-validation of User Statistics: The official number of monthly Session users is reported using a synthesis formula approach. It would be beneficial to cross-validate this method with other approaches and calibrate parameters as necessary. A discrepancy between different metrics does not necessarily mean one is incorrect. For instance, if the synthesis formula by OPTF yields a higher user count than the Oxen storage server data, it could be that many users download the Session app but struggle with account creation or finding friends to communicate with. Another possibility is that a significant percentage of users are using multiple devices, which would be counted only once when calculated using Oxen storage server data.

  2. Community Perception vs. Official Figures: Community members have persistently doubted the official figures because the number of Session users they perceive is smaller. Our Pareto distribution analysis may offer some explanation. The most active 3.5% of biweekly users (approximately 17k out of 491k) account for 50% of the messages, which corresponds closely with the 15k users who consistently engage every day, yet this figure is much lower than the total MAU. Moreover, the DAU/MAU ratio for Session is notably lower than what is typical in the industry, which may contribute to the community's perception of a smaller user base.

  3. Potential for Monetization: The concentration of activity among a small group of users suggests that we may be closer to achieving monetization than previously considered. If we target the most enthusiastic users and address their common pain points, monetization could be promising. Assuming that the willingness to pay correlates with user engagement, the most enthusiastic 17k users, representing 3.5% of biweekly or 2.3% of monthly users, might be willing to pay $5 per month. This could potentially generate close to $1 million in annual revenue.

  4. Openness to New Features: The same group of highly active users may also be more open to trying new features, such as making payments in Oxen. Their willingness to engage with new aspects of the platform could be crucial for the success of Oxen.

  5. Privacy Risks with Token Changes: The significant concentration of activity among a small subset of users raises concerns about the increased risk of deanonymization if the Oxen coin is replaced by a transparent token. If the most active users correlate highly with those most willing to pay, then introducing Session Monetization with a transparent token could either significantly inconvenience these active users or substantially elevate the risk of deanonymization for core users. These core users, who most require privacy, are comparable to the hubs or critical nodes within the network. Making privacy transactions optional would be akin to eliminating mandatory onion routing from the Session network. Should the privacy of these core users be breached, the fallout could extend to their contacts, potentially compromising the privacy of a significant proportion of the network. For further details, see Privacy Implications of Replacing the Oxen Privacy Coin in the ONS Registration Process.
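
The revenue figure in item 3 is back-of-the-envelope arithmetic. The sketch below reproduces it under the same assumptions stated there (3.5% of biweekly users paying $5 per month); the function name is illustrative.

```python
def annual_revenue(biweekly_active, paying_share, monthly_price_usd):
    """Annual revenue if `paying_share` of users pay `monthly_price_usd` each month."""
    return biweekly_active * paying_share * monthly_price_usd * 12


# 490,752 biweekly active users, 3.5% paying, $5 per month.
revenue = annual_revenue(490_752, 0.035, 5)
print(f"${revenue:,.0f} per year")
```

The result is sensitive to both the paying share and the price, so it is better read as an order of magnitude than as a forecast.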

Appendix: Synthesis Formula of MAU from OPTF

Source: https://t.me/Oxen_Community/381121

Simplified formula: MAU = A + B + C + D + E

A: iOS user numbers
A = 30-day active users × 2.8571
35% of users opt into providing data through the App Store, so we multiply the provided numbers by 2.8571 (0.35 × 2.8571 ≈ 1) to find the current MAU. It is possible, even likely, that Session users opt in at a lower rate than the average user, but we do not have data to confirm this suspicion, so we take the average.

B: Android Play Store numbers
B = MAU, here meaning unique users over 30 days as a daily rolling average
This figure is provided by the Play Store; no additional calculation is required.

C: Estimated APK MAU
C = combined downloads of the 5 APKs available on GitHub
(This does not include people downloading APKs from unofficial sources or sharing APKs privately.) This number is reset with each release, so we cannot effectively collect it more frequently.

D: Estimated Desktop MAU
D = combined downloads across 6 GitHub repositories for the latest release
We add up all of the latest desktop downloads; these restart with each release and lag the true data.

E: Estimated F-Droid users
E = (A + B + C + D) × 0.1
From historical community and user survey feedback, we estimate that 10% of Session users manage installation and updates via F-Droid. This number will likely change in the future if we are able to gather further information or find better counting methods.
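
Written out as code, the synthesis formula looks like the sketch below. The constants come from the description above; the function and parameter names are illustrative, and the demo inputs are made up.

```python
IOS_OPT_IN_MULTIPLIER = 2.8571  # 1 / 0.35, from the assumed 35% App Store opt-in rate
FDROID_SHARE = 0.1              # F-Droid users estimated at 10% of A + B + C + D


def synthesized_mau(ios_30day_opt_in, play_store_mau, apk_downloads, desktop_downloads):
    """MAU = A + B + C + D + E, following the simplified formula."""
    a = ios_30day_opt_in * IOS_OPT_IN_MULTIPLIER
    b = play_store_mau
    c = apk_downloads
    d = desktop_downloads
    e = (a + b + c + d) * FDROID_SHARE
    return a + b + c + d + e


# Made-up inputs, only to show the shape of the calculation.
demo_mau = synthesized_mau(
    ios_30day_opt_in=0, play_store_mau=1_000, apk_downloads=0, desktop_downloads=0
)
print(demo_mau)
```

Because E is a fixed 10% of the other terms, the whole formula is equivalent to multiplying A + B + C + D by 1.1.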

Paul1804 commented 8 months ago

the comrade writes very right things !

venezuela01 commented 7 months ago

See also Top 20 User Complains of Session Android Based on Google Play Reviews

venezuela01 commented 7 months ago

Quote from Google Play Console's definition of user metrics:

| Metric | Definition |
|---|---|
| Users | An individual Google Play user; a user may have multiple devices. |
| Active users | The number of users who have your app installed on at least one device and have used the device in the past 30 days. |
| New users | Users who installed your app for the first time. |
| All users | New and returning users. |
| Active devices | The number of active devices on which your app is installed. An active device is one that has been turned on at least once in the past 30 days. |
| All devices | New and returning devices. |
| Daily Active Users (DAU) | The number of users who opened your app on a given day. |
| Monthly Active Users (MAU) | The number of users who opened your app in a rolling 28-day period. |

It's important to note that Google's official definition includes some wording that could be confusing.

Active users refers to users who 'use the device', whereas Monthly Active Users are described as users who 'open the app'.

venezuela01 commented 6 months ago

Update 2023-12-21

| Source | DAU | MAU | DAU/MAU Ratio |
|---|---|---|---|
| Official number from OPTF (Synthesis Formula) | - | over 900K | - |
| Estimation using storage server | 144K | 842K | 17.1% |

venezuela01 commented 5 months ago

Update:

| Month | Average DAU (90% CI) | MAU (90% CI) | DAU/MAU Ratio | Source |
|---|---|---|---|---|
| 2024-01 | 151,040 to 160,000 | 905,472 to 937,984 | 16.7% to 17.6% | 5 Storage Servers |
| 2024-02 | 160,512 to 163,840 | 920,576 to 974,336 | 16.6% to 17.6% | 5 Storage Servers |
| 2024-03 | 144,896 to 151,808 | 920,064 to 957,696 | 15.4% to 16.3% | 5 Storage Servers |
| 2024-04 | 148,480 to 157,952 | 923,648 to 1,001,728 | 15.0% to 16.2% | 5 Storage Servers |
| 2024-05 | 134,144 to 149,504 | 847,360 to 978,432 | 14.8% to 15.8% | 5 Storage Servers |
| 2024-06 | 134,912 to 148,992 | 906,240 to 970,240 | 14.2% to 15.4% | 5 Storage Servers |

Notes:

  1. Carefully excluded noisy data points caused by the swarm bug; otherwise, they could have inflated user numbers by 50% to 500% on a single server.
  2. Carefully verified that the De Moivre–Laplace theorem and the central limit theorem hold well for the swarm user count after removing the noisy, buggy data points.
  3. Used the methodology from How to Measure Anything: Finding the Value of Intangibles in Business to derive the 90% confidence interval (90% CI).
  4. The time window for each month covers a period of 30 days ending on the last day of the month.
  5. The calculation has been adjusted since 2024-03. Previously, messages and public keys were counted without filtering by namespace, mixing 1-1 messages and closed group messages, which resulted in a 1% to 3% inflation in prior statistics. Statistics since March 2024 rectify this issue by applying a filter on namespace to exclude closed group messages and keys.
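
The notes cite a book for the 90% CI but do not say which of its methods was used. As one plausible reading, the sketch below applies a small-sample Student's t interval to five per-server estimates. The choice of method and the demo numbers are assumptions, not the author's actual procedure or data.

```python
import statistics

# Two-sided 90% critical value of Student's t with 4 degrees of freedom (n = 5).
T_CRIT_90_DF4 = 2.132


def ci90_from_five(estimates):
    """90% confidence interval for the mean of exactly five estimates."""
    if len(estimates) != 5:
        raise ValueError("this sketch hard-codes the t value for five samples")
    mean = statistics.mean(estimates)
    sem = statistics.stdev(estimates) / len(estimates) ** 0.5  # standard error of the mean
    half_width = T_CRIT_90_DF4 * sem
    return mean - half_width, mean + half_width


# Hypothetical network-wide MAU estimates extrapolated from five servers.
demo_estimates = [900_000, 920_000, 930_000, 940_000, 960_000]
ci_low, ci_high = ci90_from_five(demo_estimates)
print(round(ci_low), round(ci_high))
```

A different sample size would need a different critical value, and data still affected by the swarm bug would have to be removed first, as note 1 describes.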

DDOXEN commented 5 months ago

Great

venezuela01 commented 5 months ago

KeeJef: That data is 4 months old and the line underneath mentions DAU is 126k, also data taken from a single snode and extrapolated

@KeeJef @jagerman I have the latest data from 5 nodes, if you read my comment from last week.