
⚠️ DEPRECATED ⚠️ Please visit the new homepage at https://livepeer.notion.site/Livepeer-Grants-Program-f91071b5030d4c31ad4dd08e7c026526

Community Arbitrum Node #59

Closed - FTKuhnsman closed this issue 1 year ago

FTKuhnsman commented 2 years ago

Give a 3 sentence description about this proposal. I am proposing that Livepeer subsidize a community Arbitrum node that will be free for all active orchestrators and/or future broadcasters to use, instead of relying on a hosted service (like Alchemy or Infura) or each participant running their own node.

Describe the problem you are solving. Since moving to Arbitrum layer 2, all orchestrators have been dealing with issues finding a stable and functional Arbitrum node. While many of us tried using Alchemy, Infura, or the public RPCs provided by Ankr and Offchain Labs, it is extremely common to experience service outages and/or rate limiting, which often causes a drop in streams and can even cause the livepeer executable to crash. The current alternatives are opting for the paid plans offered by third-party providers or running your own node. Both options are prohibitively expensive given the price of third-party services and the cost of renting cloud compute resources.

Describe the solution you are proposing. My solution is to stand up and maintain an Arbitrum RPC endpoint dedicated to serving the Livepeer community. This service will leverage multiple Arbitrum nodes for redundancy and will not impose any rate limits or restrictions. The grant proceeds will be used to cover both operating and maintenance expenses for the next 12 months.

Describe the scope of the project including a rough timeline and milestones. Over the past month, I have already spent resources to get this service operational. Currently, I am hosting the Arbitrum RPC endpoint through my website https://livepeer.ftkuhnsman.com/.

Service Architecture:

Web server 1 - This server hosts the web service and routes all RPC traffic to the Arbitrum node pool. The web service is built on a Django (Python) backend, which sits behind a Gunicorn application server and an NGINX reverse proxy. The website/Django backend is responsible for managing and authenticating users of the service. In order to use the service, an orchestrator must create an account on the website and generate an API key (a brief tutorial on how to do this is included on the user profile page). When generating an API key, the orchestrator is required to sign an arbitrary message using livepeer_cli (or equivalent) and provide both the message and the resulting signature. This allows the web service to verify the public address of the user, which is then reconciled against the current list of active orchestrators (periodically queried from the Livepeer subgraph). If a user's message/signature is successfully verified as belonging to an active orchestrator, an API key is generated and automatically activated for immediate use. All the user has to do at this point is set their ethUrl flag to the appropriate url (https://arbitrum.ftkuhnsman.com/api//l2). This method ensures that only active orchestrators can use the service.
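For illustration, the verification step could look roughly like the sketch below, using the eth-account library. The function name, the lowercase address comparison, and the assumption that messages are signed in the standard personal_sign format are mine, not taken from the production Django code.

```python
# Illustrative sketch only (not the production Django code). Assumes the
# eth-account library and a standard personal_sign-style signature.
from eth_account import Account
from eth_account.messages import encode_defunct

def is_active_orchestrator(message: str, signature: str, active_orchestrators: set) -> bool:
    """Recover the address that signed `message` and check it against the set of
    active orchestrator addresses periodically fetched from the Livepeer subgraph."""
    signed = encode_defunct(text=message)
    recovered = Account.recover_message(signed, signature=signature)
    # Addresses in `active_orchestrators` are assumed to be stored lowercase.
    return recovered.lower() in active_orchestrators
```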

Database Server - The web service uses a PostgreSQL database hosted on a dedicated database server. By hosting the database on a remote server, I can easily deploy additional web servers as needed in the event that traffic grows beyond what one server can handle on its own (based on my initial testing I do not believe the volume of requests will reach a point that requires an additional web server, but I want to ensure the option is available).
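As a rough sketch, the Django settings for such a remote database might look like the fragment below; the hostname, database name, user, and the environment-variable secret are placeholders, not the actual production configuration.

```python
# Illustrative Django settings fragment pointing at the remote PostgreSQL server.
# Host, database name, user, and the environment-variable secret are placeholders.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "communitynode",
        "USER": "communitynode",
        "PASSWORD": os.environ["DB_PASSWORD"],   # injected at deploy time
        "HOST": "db.internal.example.com",       # dedicated database server
        "PORT": "5432",
    }
}
```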

Arbitrum Servers - I am currently using two separate servers, each running an Arbitrum node and a supporting Geth light node. The primary server is responsible for responding to all RPC requests. In the event that the primary Arbitrum process crashes or fails to respond (due to an issue with the software or the L1 node), all requests are routed to the secondary Arbitrum node. I opted to use proxyd to handle request routing for the following reasons: (1) it is fast and light on resource usage, (2) it uses automatic retries instead of dropping requests, (3) it handles automatic failover seamlessly, and (4) most importantly, it caches RPC responses, allowing fewer requests to hit the Arbitrum node(s). The proxyd instance runs on the primary Arbitrum node server.
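To make the routing behaviour concrete, here is a conceptual Python sketch of the retry, failover, and caching pattern described above. It is not proxyd itself or its configuration; the backend URLs and the naive cache policy are illustrative assumptions only.

```python
# Conceptual sketch of the retry -> failover -> cache pattern; this is NOT
# proxyd's code or configuration, just an illustration of the behaviour.
import requests

BACKENDS = ["http://arb-primary:8547", "http://arb-secondary:8547"]  # hypothetical URLs
CACHE = {}  # naive in-memory cache keyed by method + params

def forward_rpc(payload: dict, retries_per_backend: int = 2) -> dict:
    """Try the primary backend (with retries), then fail over to the secondary."""
    key = (payload.get("method"), repr(payload.get("params")))
    if key in CACHE:  # cached responses never hit the Arbitrum nodes
        return CACHE[key]
    for backend in BACKENDS:  # primary first, secondary only on failure
        for _ in range(retries_per_backend):
            try:
                resp = requests.post(backend, json=payload, timeout=5)
                resp.raise_for_status()
                result = resp.json()
                CACHE[key] = result  # a real deployment only caches immutable results
                return result
            except requests.RequestException:
                continue  # retry this backend, then move on to the next one
    raise RuntimeError("all Arbitrum backends are unavailable")
```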

This service has been operational for 1 week and is currently being beta tested by 9 orchestrators running a total of 28 orchestrator nodes. At this time, no major errors or outages have been reported.

Milestones:

Phase 1: MVP Beta Test - 5/13/2022. The beta testing period has not been formally defined, but I believe the service has reached a point where it can reasonably be considered stable.

Phase 2: Geth Full Node Implementation - 6/1/2022: Both Arbitrum nodes currently use local Geth light nodes for L1 connectivity. While this has not caused any issues so far, I believe it is prudent to stand up a Geth full node for redundancy, given past known issues with Geth light nodes not having enough peers. Each Arbitrum node will still use its local Geth light node as primary (to reduce latency), but I will use proxyd to fail over to the full node if issues arise. As part of this implementation I will also enable L1 RPC functionality; while it may not be used extensively by orchestrators, it is simple enough to do and may provide some value.
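As a sketch of how the light-node peer issue could be detected, the check below queries the standard JSON-RPC methods net_peerCount and eth_syncing; the URL, the peer threshold, and the decision logic are illustrative assumptions rather than the deployed setup.

```python
# Illustrative health check for deciding when to fail over from the local Geth
# light node to the full node; URLs and thresholds are assumptions, not live config.
import requests

def rpc_call(url: str, method: str, params=None) -> dict:
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    return requests.post(url, json=payload, timeout=5).json()

def light_node_healthy(url: str = "http://localhost:8545", min_peers: int = 3) -> bool:
    """Treat the light node as unhealthy if it has too few peers or is still syncing."""
    peers = int(rpc_call(url, "net_peerCount")["result"], 16)  # result is hex-encoded
    syncing = rpc_call(url, "eth_syncing")["result"]           # False once fully synced
    return peers >= min_peers and syncing is False
```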

Phase 3: Build / Deploy Dedicated Test Environment - 7/1/2022: In order to fully support the service, replicate and troubleshoot bugs/errors, and develop non-trivial feature enhancements, I will need to fully replicate the production environment on separate hardware.

Phase 4: Enhanced Monitoring and Error Reporting - 9/1/2022: Currently, I am struggling to troubleshoot minor errors when they occur because of the sheer volume of logging. I intend to implement a log aggregation and parsing solution that will allow me to triage issues more effectively as they occur. I will also implement automated notifications for individual node and service outages.
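A minimal sketch of what the automated outage notification could look like is shown below; the webhook target, the polling interval, and the use of eth_blockNumber as a liveness probe are assumptions for illustration, not the monitoring stack that will actually be deployed.

```python
# Minimal outage-notification sketch; the webhook URL, interval, and the use of
# eth_blockNumber as a liveness probe are illustrative assumptions only.
import time
import requests

RPC_URL = "https://arbitrum.ftkuhnsman.com/api/<api-key>/l2"   # placeholder API key
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"  # hypothetical webhook

def check_endpoint() -> None:
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    try:
        resp = requests.post(RPC_URL, json=payload, timeout=10)
        resp.raise_for_status()
        resp.json()["result"]  # raises if the response is not a valid RPC result
    except Exception as exc:
        requests.post(WEBHOOK_URL, json={"content": f"Community node check failed: {exc}"})

if __name__ == "__main__":
    while True:
        check_endpoint()
        time.sleep(60)  # poll once a minute
```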

Phase 5: Customer Support - 10/1/2022: Currently, I am the only person supporting and maintaining the service. While this has been manageable so far, I feel it is prudent to find and train another individual (specifically in an opposite time zone) so that support can be provided 24/7. I will continue to provide full technical support, but my goal is to have a secondary resource in place by 10/1/2022.

Phase 6: Adoption - 12/1/2022: While encouraging orchestrator adoption of the service will be an ongoing effort, my goal is for the community node service to be the go-to Arbitrum RPC for the Livepeer network by the end of 2022. I consider success to be serving more than 50% of the active orchestrators by that time.

Please estimate hours spent on the project based on the above

Project Economics:

I value my time at $55/hour. All time and resource estimates below are using this effective rate.

Phase 1 - Initial MVP Build / Beta Testing (3/1/2022 - 5/13/2022): Time incurred to date - 100 hours ($5500); technology cost incurred to date - $400; estimated time remaining - 20 hours ($1100)

Phase 2 - Geth Full Node Implementation - 20 hours ($1100)

Phase 3 - Build / Deploy Dedicated Test Environment - 20 hours ($1100)

Phase 4 - Enhanced Monitoring and Error Reporting - 30 hours ($1650)

Phase 5 - Customer Support resource acquisition and training - 20 hours ($1100)

Ongoing Costs To Run and Maintain The Service:

Technology Cost:

The production and test environments will each require 5 servers (currently 4; the Geth full node will require an additional one). Each server is hosted by Contabo and costs $42 per month, for a total of $210/month per environment ($2520 annualized).

Annual cost for both the production and test environments: $5040

Customer Support / Service Maintenance: Estimated 5 hours per week ($14,300 annualized)

Total Phased Capital Requested: $11,950

Total Annual Cost of Operations For Reimbursement: $19,340
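For anyone checking the figures, the requested totals can be reproduced from the itemized amounts above (assuming, per the technology-cost section, five servers per environment across two environments):

```python
# Reproducing the requested totals from the itemized figures above.
hourly_rate = 55
phase_1 = 100 * hourly_rate + 400 + 20 * hourly_rate   # time to date + tech cost + remaining
phases_2_to_5 = (20 + 20 + 30 + 20) * hourly_rate
print(phase_1 + phases_2_to_5)                          # 11950 total phased capital

servers_annual = 5 * 42 * 12 * 2                        # 5 servers x $42/mo x 12 mo x 2 environments
support_annual = 5 * 52 * hourly_rate                   # 5 hrs/week x 52 weeks
print(servers_annual + support_annual)                  # 5040 + 14300 = 19340 annual operations
```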

AuthorityNull commented 2 years ago

Finding a reliable RPC provider that doesn't charge an arm and a leg for consistent uptime has been a pain point for most of us Orchestrators since we moved to Arbitrum, and there's been a real need for a community-based solution. The community Arb node as it stands now has already proven to be incredibly valuable and, as stated in the proposal, is being actively used by many Orchestrators. This grant could greatly benefit the network :)

mikezupper commented 2 years ago

I think this is a fantastic idea. The stability of Livepeer orchestrators' L1/L2 node integration will be solved within the community. We will not need to individually spend time, energy, or money on individual solutions. Moreover, the community has very specific needs from Arbitrum and Ethereum. Infura and Alchemy are general solutions for the masses; they have to fight with bots and all kinds of nefarious users. This solution provides isolation! I love the livepeer_cli integration… it guarantees that only Livepeer users can sign up! @ericxtang feel free to follow up with more questions or even join us during the Monday water cooler Discord session to chat about it.

papabear99 commented 2 years ago

I've been using the community Arb node since it was in the early beta/test phase and it's been great. I used to have each of my nodes running RPCs from different providers to try to mitigate the unreliability of the public options, but it was still common for them all to go down at the same time; this is no longer an issue with the community node. Having a reliable node for Livepeer users is a great idea IMO.

JJassonn69 commented 2 years ago

As an active orchestrator, I too believe that a reliable, community-oriented RPC solution is a must, and this project provides us with exactly that. Full support 🙌

Franck-UltimaRatio commented 2 years ago

We have been using the community Arb node since the beginning and it works like a charm! It's a great idea to have it for the community :)

stronk-dev commented 2 years ago

Hosting a reliable orchestrator is difficult enough; not having to worry about rate limits or instability issues from your RPC provider is very much appreciated 👍

nelsorya commented 2 years ago

Hey @musicmank545, thanks for putting this proposal together. It's great to see what you have built and the validation of this problem from the community.

We are happy to fund this grant for the work already completed, the work left to be completed, and the annual ongoing maintenance, provided you are willing to build this in the open as much as possible and give regular updates to the community; the monitoring, error reporting, and the build and deployment of the test environment should all be open source. The goal is for this to be a public good that is transparent for the community.

For the ongoing maintenance costs, we would propose paying these out in quarterly milestones.

FTKuhnsman commented 2 years ago

@nelsorya I'm glad to hear that Livepeer supports this effort. Can we schedule some time to discuss the specifics? I can be reached directly on Discord (ftkuhnsman#5819).