Closed: @andyjagoe closed this issue 1 year ago
Hi @andyjagoe, thank you for your proposal! We will be in touch with an update once we have completed our initial review.
Thanks @ErinOCon! Don't hesitate to reach out if I can answer any questions.
Hi @andyjagoe, this grant has been approved! Can you provide an email address to discuss next steps?
Hi @ErinOCon, thanks for your message! Are you able to send me a quick message here (https://andyjagoe.com/contact-me/) and I will respond with my email?
Hi @ErinOCon, per your request on Slack, just confirming we connected and you have my correct email. Thanks again for your quick responses to get the agreement signed and in place!
many thanks, @andyjagoe!
Open Grant Proposal:
Web3 Product Analytics
Name of Project: Web3 Analytics
Proposal Category:
app-dev
Proposer:
@andyjagoe
Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: Yes
Project Description
The project is a decentralized analytics platform for web3: think Google Analytics for web3, or Dune Analytics for off-chain data.
Why is this needed?
80% of the top 10,000 websites on the internet use Google Analytics. In fact, the majority of web3 projects do too. But, as a web3 project, giving all your data to Google breaks your brand promise to users and goes against the ethos of decentralization. Big Tech sells this data to the highest bidder for ad targeting, and users have no control over their data and no visibility into what is collected.
But a web3 project without analytics is flying blind. Without analytics, you have no data-driven product insights, only guesses. Trying to build a consumer product without analytics is like trying to fly a 747 without instruments or a dashboard: it's very difficult, and often ends in a bad outcome. A competitor who has analytics will fly circles around you if you don't.
On-chain analytics are great. But they’re the very bottom of the funnel. A tiny fraction of the user journey. They miss everything that leads up to the transaction, and everything that happens afterwards. Alone, they are just not enough.
Web3 Analytics is an attempt to provide projects another option. A decentralized analytics public good that can't be shut down or censored. A solution that is default open and enables complete transparency into how a product is being used, without compromising privacy or web3 values.
How is it different?
All the analytics solutions today are centralized and default closed. This means that only the app owner can see the dashboards and what data is being collected. Users do not own their data and cannot delete it unless the app owner allows them to. Also, because data is centralized, these solutions can be censored or shut down. There is a forward-looking new analytics product called Plausible Analytics (https://plausible.io/). It is open source and lets you host your own service, but it doesn't solve the problem of centralization and it doesn't make analytics data a public good. The majority of web3 projects are using some sort of centralized service, with at least half using Google Analytics.
By contrast, the solution I'm proposing is a decentralized public good, where all data is readable by anyone and censoring it or shutting it down is difficult. Projects can get critical product insights to improve their user experience without breaking a user's trust or compromising web3 values. And it is default open, so it increases the transparency into how projects are being used. It does for off-chain data what Dune Analytics has done for on-chain data.
Current Status
I've built a proof-of-concept (alpha) of the system, which consists of 5 components:

- Front-End Instrumentation
- Decentralized Data Network
- Smart Contract Registry
- Indexer
- Dashboard Builder
Here is how the system works, with links to source code and the prototype.
The purpose of this grant is to allow me to complete a usable initial version of the project and deploy it to mainnet.
Value
The benefit to IPFS if this project is successful is a significant increase in IPFS usage. Each application that instruments with Web3 Analytics generates a continuous river of data into the IPFS ecosystem. This data is pinned by default, so more data also means more fee income for IPFS pinning services. More fee income enables more investment by participants into the IPFS ecosystem. The risk of not having an application like this on IPFS is that someone creates a similar application for a competing ecosystem and that ecosystem is the beneficiary of the increased usage and fee income instead.
The primary execution risk for this project is not technical. A proof-of-concept exists and the path to mainnet and scalability is reasonably clear. The main challenge for the project will be getting traction within the web3 community. This is normal for a project at this stage.
The good news is that the project has a relatively straightforward and compelling distribution motion. A project doesn't need to choose between Web3 Analytics and a competing analytics product. The front-end instrumentation package allows sending data to both Web3 Analytics and most other major analytics packages in parallel. This lets someone easily try Web3 Analytics while still using their existing analytics platform, and it reduces analytics switching costs and platform lock-in. Because web3 projects are open source, we can identify high-potential customers, do the integration/migration work ourselves, and issue a PR for inclusion. A project has little to lose and a lot to gain by accepting the PR. Worst case, they have a new solution that has reduced lock-in and switching costs in their analytics stack. This approach won't scale in the long term, but it is a great way to bootstrap adoption and trial in the early days of the project.
As web3 projects like Farcaster (and others like it) mature, they will realize their protocols need decentralized, privacy-preserving analytics built into the protocol or provided by a third-party decentralized network. Farcaster needs to aggregate impressions, clicks, expands, profile visits, etc. (just like Twitter does), and this can't be done separately in each Farcaster client or each client's data will be incomplete and fragmented.
I don't think it will make sense for every web3 project to roll their own decentralized analytics solution from first principles. This is why I think there's a great opportunity for Web3 Analytics and IPFS to provide the solution.
Deliverables
The goal of the grant proposal is to deliver a usable first version to mainnet. Specifically, this means taking the existing project and completing the below deliverables.
Front-End Instrumentation:
Decentralized Data Network:
Smart Contract Registry:
Indexer:
Dashboard Builder:
Development Roadmap
I will work full time on the project for three months, and the grant will subsidize my personal costs so I can complete the project and make it generally available on mainnet. Please see the section on Total Budget Requested for cost and time estimates for each component.
Front-End Instrumentation
To use Web3 Analytics, you instrument your app using Analytics (a lightweight open-source frontend analytics abstraction layer) and use the web3 analytics plugin I wrote as a decentralized data back-end. Analytics has plugins for most major analytics systems and they can be run in parallel, so taking this approach removes vendor lock-in and reduces risk.
You can see Web3 Analytics working on this demo site (source code). Click the buttons to generate events. You'll see confirmation toasts appear in the browser. Open the browser console to see the interactions with the decentralized back-end.
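The parallel-plugin pattern described above can be sketched as follows. This is a simplified, self-contained model of the fan-out behavior, not the real `analytics` package API; the plugin names and shapes here are illustrative assumptions.

```javascript
// Simplified model of the parallel-plugin pattern used by the analytics
// abstraction layer: one track() call fans out to every configured backend.
// Plugin names and shapes are illustrative, not the real package API.
function createAnalytics({ plugins }) {
  return {
    track(event, payload = {}) {
      // Each plugin receives the same event; a failure in one backend
      // should not block delivery to the others.
      for (const plugin of plugins) {
        try { plugin.track({ event, payload }); } catch (_) { /* isolate failures */ }
      }
    },
  };
}

// Two illustrative backends: an existing centralized service and the
// decentralized Web3 Analytics back-end, receiving events in parallel.
const received = { legacy: [], web3: [] };
const analytics = createAnalytics({
  plugins: [
    { name: 'legacy-analytics', track: (e) => received.legacy.push(e.event) },
    { name: 'web3-analytics', track: (e) => received.web3.push(e.event) },
  ],
});

analytics.track('buttonClicked', { button: 'signup' });
```

Because every backend sees the same event stream, a project can trial the decentralized back-end alongside its incumbent analytics provider and compare the two before switching.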
What still needs to be done:
Decentralized Data Network
Web3 Analytics uses Ceramic for decentralized data storage. Ceramic is a decentralized and permissionless data streaming network built on top of IPFS. It provides DID authentication, immutable naming, and mutable streams in a decentralized manner.
When a new user arrives at your app, we auto-generate a secp256k1 keypair for the user, which we store in the browser. This keypair is used for DID authentication to Ceramic when writing analytics data. We store the entire analytics payload of every event in Ceramic; the data is cryptographically secured and belongs to the user. While anyone can read this anonymous data, only the user may delete or modify it using the keypair.
I created a secp256k1 adapter for Ceramic because secp256k1 is the same curve used in Ethereum and Bitcoin. This matters for the next component: since our data is decentralized, we need a smart contract to keep track of it so we can find it again.
What still needs to be done:
Smart Contract Registry
Web3 Analytics has a smart contract that tracks apps and the addresses of the users that belong to them. An app that would like to use Web3 Analytics must first call the registerApp function on the Web3 Analytics smart contract. Once done, the app can register users and allow users to record data in Ceramic.
The DID generated to securely write to Ceramic can be converted to an Ethereum keypair because they both use secp256k1. Web3 Analytics checks to see if the user is registered with this app in the smart contract. If not, it silently processes a registration transaction in the background using Gas Station Network v2 (OpenGSN).
The contracts are currently running on Rinkeby but will deploy to an EVM compatible Layer 2 (e.g. Polygon) for production so each user registration only costs a fraction of a cent.
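A minimal sketch of this check-then-register flow, with the registry contract and relayer mocked in plain JavaScript. Only `registerApp` comes from the proposal; the other names are illustrative, and no real OpenGSN relaying happens here.

```javascript
// Mock of the on-chain registry: apps must register first, then users
// can be registered under an app. (Illustrative, not the real contract.)
class RegistryContract {
  constructor() { this.apps = new Map(); }
  registerApp(appId) {
    if (!this.apps.has(appId)) this.apps.set(appId, new Set());
  }
  isUserRegistered(appId, userAddress) {
    return this.apps.get(appId)?.has(userAddress) ?? false;
  }
  registerUser(appId, userAddress) {
    if (!this.apps.has(appId)) throw new Error('app not registered');
    this.apps.get(appId).add(userAddress);
  }
}

// Before recording data, check registration and silently register the
// user if needed. In the real system this transaction is relayed via
// OpenGSN, so the user never pays gas or sees a wallet prompt.
function ensureRegistered(contract, appId, userAddress) {
  if (!contract.isUserRegistered(appId, userAddress)) {
    contract.registerUser(appId, userAddress); // relayed, gasless for the user
  }
}

const registry = new RegistryContract();
registry.registerApp('demo-app');
ensureRegistered(registry, 'demo-app', '0xabc');
ensureRegistered(registry, 'demo-app', '0xabc'); // idempotent on repeat visits
```

The registry is what makes the decentralized data discoverable: given an app, anyone can enumerate its registered user addresses and from there locate the corresponding Ceramic streams.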
What still needs to be done:
Indexer
Having the raw analytics data in a decentralized data store is great. It’s an open, permissionless, user owned, community asset available to everyone. Like blockchain data.
But, like blockchain data, it's difficult to get insights from or build dashboards with in its raw format. You need a way to load it into traditional datastores for processing.
To address this, I’ve created an indexer for the data. It's an automated pipeline that uses the Airbyte open source ELT platform and pushes normalized data directly to an S3 data lake. Our source connector continuously monitors the blockchain and Ceramic for new apps, users and data for indexing.
For now, data is stored in an S3 data lake in Apache Parquet format and accessed via AWS Athena. Apache Spark also supports S3 data lakes in Parquet format and is another option for us as we scale. Once Ceramic's GraphQL interface has robust indexing and sufficient performance to support analytics queries, we will pull data directly from Ceramic instead of using S3.
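The indexer's incremental sync can be modeled as a cursor-per-user loop. This is a toy sketch of the logic, not the actual Airbyte source connector; all names and data shapes are illustrative.

```javascript
// Toy model of the indexer's incremental sync: enumerate users from the
// registry, pull each user's event stream, and flatten only the unseen
// events into normalized rows (analogous to what the source connector
// pushes into the S3 data lake on each run).
function syncOnce({ registryUsers, fetchStream, state }) {
  const rows = [];
  for (const user of registryUsers) {
    const cursor = state.cursors[user] ?? 0;
    const events = fetchStream(user).slice(cursor); // only unseen events
    for (const e of events) {
      rows.push({ user, event: e.event, ts: e.ts }); // normalized row
    }
    state.cursors[user] = cursor + events.length; // advance per-user cursor
  }
  return rows;
}

// Mock streams standing in for Ceramic data.
const streams = {
  alice: [{ event: 'pageView', ts: 1 }, { event: 'buttonClicked', ts: 2 }],
  bob: [{ event: 'pageView', ts: 3 }],
};
const state = { cursors: {} };
const firstBatch = syncOnce({
  registryUsers: ['alice', 'bob'],
  fetchStream: (u) => streams[u],
  state,
});
// A new event arrives between runs; the next sync picks up only that one.
streams.bob.push({ event: 'walletConnected', ts: 4 });
const secondBatch = syncOnce({
  registryUsers: ['alice', 'bob'],
  fetchStream: (u) => streams[u],
  state,
});
```

Tracking a cursor per user keeps each run cheap: the pipeline re-reads nothing it has already normalized, which is what makes continuous monitoring of the blockchain and Ceramic practical.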
What still needs to be done:
Dashboard Builder
The dashboard builder is a free tool that allows anyone to build a dashboard using SQL. Similar to Dune Analytics, queries and dashboards are default public and designed to be easily forked and shared.
I expect we will create many of the first dashboards as templates to make getting started easier for users. The hope is that we hand this off to the community over time, the same way Dune has.
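As an illustration, a dashboard component might be backed by a query like the one below. The table and column names (`events`, `app_id`, `user_address`, `event_time`) are hypothetical; the real schema depends on how the indexer normalizes data into the data lake.

```sql
-- Hypothetical daily-active-users query for one app (Athena/Presto syntax).
-- Table and column names are illustrative, not the actual schema.
SELECT date_trunc('day', event_time) AS day,
       count(DISTINCT user_address)  AS daily_active_users
FROM events
WHERE app_id = 'demo-app'
GROUP BY 1
ORDER BY 1;
```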
An alpha version of the dashboard is here and you can view a sample dashboard I’ve created for one of the sites I’ve instrumented. Source code is here. Screenshots of a sample dashboard and a sample query used to create a dashboard component are below.
Sample dashboard page UI:
Sample query page UI:
What still needs to be done:
Total Budget Requested
Total budget requested is $30,000:
12 months of hosting cost for Ceramic node and indexers: $250/mo * 12 = $3,000
Month 1: $9,000
Month 2: $9,000
Month 3: $9,000
Maintenance and Upgrade Plans
Applications that use the service will need to pay a small fee for registration and usage to cover gas station network expenditures. These fees could also fund maintenance and upgrades of the service.
Team
Team Members
Andy Jagoe
Team Member LinkedIn Profiles
https://www.linkedin.com/in/andyjagoe/
Team Website
https://web3analytics.network/ (prototype)
https://web3analytics.network/users/99281713380d8efc77348ef00b1f02ec/dashboard/andyjagoecom-key-metrics (sample dashboard)
Relevant Experience
I've started and exited a few software companies, including a messaging company that I grew to millions of users and sold to Skype, and a search company that I turned around and sold to eBay. I've built multiple Internet-scale products and have managed several large-scale analytics deployments, including for a site that serviced 75 million annual uniques.
My core expertise is product and engineering, and for the last several years I've been focused on web3. I believe public blockchains collapse the cost of large-scale network coordination, and I'd like to see a more fair and equitable distribution of value creation in future platforms (especially for users). For the last year, I've been doing web3 consulting for AngelList.
https://andyjagoe.com/about/
https://www.linkedin.com/in/andyjagoe/
Team code repositories
https://github.com/andyjagoe
Additional Information
Karola Kirsanow from Protocol Labs told me about the Open Grants program and thought this project might be a fit. More details on how the Web3 Analytics system works, including links to source code and a prototype can be found here.