stacksgov / grants-program

Welcome to the Stacks Foundation Grant Program. Community members interested in submitting a grant proposal may do so by opening an issue in this repository and filling out the grant application.

MiniNode - synced local copy of the tx log with granular query API and event emitters #251

Closed dcsan closed 1 year ago

dcsan commented 2 years ago

Background

Attacking Stacks scaling problems with a decentralized and simple "mini node" approach

Problems with developing dApps on Stacks include API latency and instability, rate limits, and the lack of any meaningful way to filter or query transaction data.

We propose to solve these problems for a subset of read-only queries.

We propose a mini-node that will use public APIs to keep a synced copy of the blockchain events in a local DB, and a JS/TS API to query this DB.

This does not include a full-running local blockchain node. That is the part of the iceberg below the waterline. We just provide higher-level access to the transaction data and state.

We propose using accessible technologies and keeping the code-base easy to contribute to, and instantly useful to the non-core dev teams building NFT and other projects. If the clone is fast enough, it would be trivial for each project to keep its own copy of the confirmed transaction history and build out the APIs it needs without relying on the Hiro team.

Project Overview: 'MiniNode'

Running a full local node is possible, but it is a huge amount of devops work just to build your own dApp.

For most apps, we don't need to run such a node; we just need to be able to query the data.

The mini-node will poll the 'firehose' API for new transactions:

https://stacks-node-api.mainnet.stacks.co/extended/v1/tx

It will simply store these raw TXs in a MongoDB collection.

Then we can make more granular queries into the DB, such as "TXs that contain a specific token ID" or "TXs per contract".
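As a sketch of the poll-and-store loop (all names here are hypothetical, not from an existing codebase), with an in-memory seen-set standing in for a lookup against the Mongo collection:

```typescript
// Hypothetical sketch of the mini-node polling loop. FIREHOSE_URL,
// filterNewTxs, and pollOnce are assumed names, not an existing API.
type Tx = { tx_id: string; [key: string]: unknown };

const FIREHOSE_URL =
  "https://stacks-node-api.mainnet.stacks.co/extended/v1/tx";

// Keep only transactions we have not seen before; the seen-set stands in
// for a uniqueness check against the local MongoDB collection.
function filterNewTxs(page: Tx[], seen: Set<string>): Tx[] {
  const fresh = page.filter((tx) => !seen.has(tx.tx_id));
  fresh.forEach((tx) => seen.add(tx.tx_id));
  return fresh;
}

// One poll cycle: fetch the latest page and persist only the unseen TXs.
async function pollOnce(
  seen: Set<string>,
  save: (txs: Tx[]) => Promise<void>
): Promise<void> {
  const res = await fetch(`${FIREHOSE_URL}?limit=50`);
  const { results } = (await res.json()) as { results: Tx[] };
  await save(filterNewTxs(results, seen));
}
```

In practice `pollOnce` would run on a timer, and `save` would be an insert into the Mongo collection.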

Event based "subscriptions"

Additionally, any 'new' transactions found on each poll of the firehose will go into a separate capped collection.

This allows us to use a tailable cursor to call our dApp when new data is received. We can use standard JS event emitters to provide a pub/sub or sockets API that calls back our dApp only when there are relevant events. Of course, a fallback query can also be done if the listener stopped listening.

This would massively lessen the strain on the main Stacks API nodes, as there would be no need for dApps to keep polling for changes.

Additionally, we can set up granular filters as Mongo queries, e.g. just listen for new TXs on a specific contract or wallet.
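The subscription layer could be sketched with Node's standard EventEmitter. This is a minimal sketch under assumptions: in the real mini-node the `"tx"` events would be fed by a tailable cursor on the capped collection, not by direct `emit()` calls, and the field names are hypothetical.

```typescript
// Minimal pub/sub sketch using Node's built-in EventEmitter.
import { EventEmitter } from "node:events";

type Tx = { tx_id: string; contract_id?: string; sender?: string };

const bus = new EventEmitter();

// Subscribe to TXs matching a granular filter, e.g. a specific contract
// or wallet; the handler only fires for relevant events.
function onTx(filter: (tx: Tx) => boolean, handler: (tx: Tx) => void): void {
  bus.on("tx", (tx: Tx) => {
    if (filter(tx)) handler(tx);
  });
}

// Example: a subscriber that only cares about one (hypothetical) contract.
// onTx((tx) => tx.contract_id === "SP...my-nft", (tx) => console.log(tx.tx_id));
```

A dApp would register a listener once instead of polling, which is the whole point of the capped-collection design.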

Data storage

While mongo has some shortcomings for complex relational queries, it's well suited for storing the blockchain transaction log.

Additionally, we could destructure the data somewhat, separating the incoming firehose TX log into queryable Contracts / Tokens / Users collections.

Query structure

We'll provide a number of basic REST endpoints for queries we cannot currently run against the Stacks JS API.

Analytics

We could also add some simple analytics, e.g. if people want to see how many transactions are being made on a specific contract. MongoDB provides a decent high-performance aggregation pipeline for these types of queries.
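As a hedged sketch of such an analytics query (field names like `contract_id` are assumptions): the pipeline below is the shape the server-side Mongo aggregation would take, and `countPerContract` implements the same grouping in plain TypeScript so the result is concrete.

```typescript
// TXs-per-contract analytics. The pipeline is what we'd hand to
// collection.aggregate(); countPerContract is the in-memory equivalent.
type Tx = { tx_id: string; contract_id: string };

// Assumed Mongo aggregation pipeline: group by contract, count, sort.
const txsPerContractPipeline = [
  { $group: { _id: "$contract_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// Plain-TS version of the same grouping, for illustration.
function countPerContract(txs: Tx[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const tx of txs) {
    counts.set(tx.contract_id, (counts.get(tx.contract_id) ?? 0) + 1);
  }
  return counts;
}
```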

Scope

The project is deliberately limited so that it does NOT become a sprawling megaproject.

We will provide a basic client UI to allow you to pass parameters to pre-built queries, but not a full-blown web UI for building these queries.

Budget and milestones

What grant amount are you seeking? How long will the project take in hours? If more than 20, please break down the project into milestones, with a clear output (e.g., low-fi mockup, MVP with two features) and include the estimated work hours for each milestone.

Total Grant Request: 5000 STX

M1: ~3 weeks

M2: ~3 weeks

M3: ~3 weeks

Team

DCsan will provide the architecture, DB schemas, and data structures, based on other existing projects.

For the main work there is not much Stacks-specific knowhow needed: it's polling an API, storing the documents, and making a query UI, much like any normal web API. It might be a good opportunity to bring a skilled web2 developer into the Stacks ecosystem.

There are a number of people across different Stacks NFT projects who've expressed interest and might fit the bill.

Risks

In the long-term future, this could be considered a type of "optimistic" transaction layer, where we write locally so a UI can immediately show a pending write-request and then wait for confirmation, but that's far beyond the scope of what we need for today's pain points.

Still, this kind of work on scaling the Stacks ecosystem does play into 2022 scaling topics for Stacks, like appChains.

Roadmap and future extensions

Fully Open Source

This project will be developed fully in public open-source repositories.

Community and supporting materials

The need for this project arose while developing a couple of other Stacks ecosystem projects.

We've already used this bot for the CrashPunks launch and for Project Indigo wallet verification, with more features coming in early 2022. This is the main focus for me (@dcsan), but I need the infrastructure features described here to support more realtime updates, so I thought about a more general, open-sourced solution that would be useful to others.

Submitted by

David 'DC' Collier

stx-grant-bot[bot] commented 2 years ago

Thanks for submitting a grant application! Due to the holiday break, grant reviews are currently on hiatus and will resume on January 5th. In the meantime, we encourage you to continue to share and get feedback on your application from the community.

dcsan commented 2 years ago

Just to play devil's advocate, some other approaches:

Just deal with standing up Stacks nodes everywhere, for local dev, staging, production, etc. Or maybe someone [ #250 ] else could do that and provide a Postgres replication feed, so others could pull the data into a queryable PG database and build a functional API on top of it. Though in that case, why wouldn't you just wade into stacksJS, I guess; which brings us back to square one.

This approach also looks interesting: https://twitter.com/leopradel/status/1475513460431667202 although I guess it's not running now: https://graphql-api.stxstats.co/v1beta1/relay

I'm not sure how well arbitrary GraphQL queries would scale, but perhaps someone could be enticed/funded to provide a TheGraph-type SaaS on top of that?

Also, it seems the latency and errors of the existing Stacks API could be worked around with some devops. The Syvita crew just deployed some nodes across a few AWS regions and got better stability: https://twitter.com/mrkmcknz/status/1476232724981727241. Of course, that doesn't fix the problem that the APIs lack features such as any meaningful way to filter or query.


Overall:

The current firehose API seems not that daunting in terms of data; it's a known surface area.

Sucking the data into a Mongo instance would make it easier to work with, removing the need to run a node (or nodes).

It decouples development from the complexity of the whole stacksJS API system.

It allows devs to go faster and write queries for just the data they need.

wileyj commented 2 years ago

How will this proposal deal with historical tx data that's not exposed by the /v1/extended/tx endpoint?

dcsan commented 2 years ago

@wileyj my main need for this project (and that of others I'm talking to) is to track changes to contracts, e.g. events where an NFT is transferred from one wallet to another. But rather than constant polling, we turn that into an event stream a client app can set up a listener for.

In terms of historical data - if someone does need that - there seems to be an offset parameter:

https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=10000&limit=50
https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=100000&limit=50
https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=1000000&limit=50 (no data)

So it seems to not go back over 1M records.

So where does the data start exactly? It's not clear, but around 850k back:

https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=875000&limit=50 "parent_burn_block_time_iso": "2021-01-21T18:10:48.000Z",

Given that today is Blockstack's birthday, maybe mainnet came online around Jan 2021?

Are you saying there's other data beyond that point that the current API doesn't provide? It doesn't seem so.

I could boot an instance and just keep polling back into history. I'd probably only do that once and then archive it all as a Mongo backup somewhere, so future instances can just clone that data and load it up. Or perhaps we make a replica set available for others to set up a direct Mongo clone.

Since Mongo is disk-based (unlike Redis), it seems we can just keep going back as far as needed.
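As a small illustration of the backfill idea above (the function and constant names are hypothetical), one could precompute the page URLs needed to walk back through history; a larger offset reaches further into the past:

```typescript
// Hypothetical backfill planner: enumerate firehose page URLs by offset.
// The real job would fetch and store each page, stopping when the API
// returns no more data.
const FIREHOSE_BASE =
  "https://stacks-node-api.mainnet.stacks.co/extended/v1/tx";

function buildBackfillUrls(totalTxs: number, pageSize = 50): string[] {
  const urls: string[] = [];
  for (let offset = 0; offset < totalTxs; offset += pageSize) {
    urls.push(`${FIREHOSE_BASE}?offset=${offset}&limit=${pageSize}`);
  }
  return urls;
}
```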

If we wanted to get into more complex queries/filters, indexes might become an issue depending on how much data is really there, but most of those historical queries would more likely be for things like metrics or data investigation, so the aggregation pipeline is a better fit.

Are you working on these APIs? What are your thoughts? Do you think there is demand for historical API data?

will-corcoran commented 2 years ago

Hey @dcsan - this grant is APPROVED! We look forward to seeing what you produce and can't wait for others in the community to have this at their disposal.

stx-grant-bot[bot] commented 2 years ago

Congratulations. Your grant is now approved. Please complete the on-boarding link here: https://stacks-grant.netlify.app/onboard?q=9179afe0ff5c29683776fbd46a762353

dcsan commented 2 years ago

Somehow I got stuck in the process of completing that form above.

But things seem more urgent now; I'm getting blocked on accessing the Stacks APIs at even like 1 req/second...

96|cards   | [--]    Error: Error calling read-only function. Response 429: Too Many Requests. Attempted to fetch https://stacks-node-api.mainnet.stacks.co/v2/contracts/call-read/SP2RJP81KF3V6NJVZEZ2SR8DD73VQJC98EJSTQWDV/dcards-v4/get-info and failed with the message: "<!DOCTYPE html>
96|cards   | <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
96|cards   | <!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
96|cards   | <!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
96|cards   | <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
96|cards   | <head>
96|cards   | <title>Access denied | stacks-node-api.mainnet.stacks.co used Cloudflare to restrict access</title>
wileyj commented 2 years ago

Hmm, it seems a possible solution here would be a more generic firehose of events vs an API node that is queried? There's not much you can do about Hiro's rate-limits (outside of asking them to adjust/accommodate).

zone117x commented 2 years ago

Note that the request you mentioned, /v2/contracts/call-read/..., is an RPC endpoint served by a full Stacks node instance, not by the API. The Stacks API is not capable of evaluating smart contract calls, and the proposal above does not sound like it would be able to do so either.

We propose a mini-node that will use public APIs to keep a synced copy of the blockchain events in a local DB, and a JS/TS API to query this DB. This does not include a full-running local blockchain node. That is the part of the iceberg below the waterline. We just provide higher-level access to the transaction data and state.

wileyj commented 2 years ago

Note that the request you mentioned, /v2/contracts/call-read/..., is an RPC endpoint served by a full Stacks node instance, not by the API. The Stacks API is not capable of evaluating smart contract calls, and the proposal above does not sound like it would be able to do so either.

We propose a mini-node that will use public APIs to keep a synced copy of the blockchain events in a local DB, and a JS/TS API to query this DB. This does not include a full-running local blockchain node. That is the part of the iceberg below the waterline. We just provide higher-level access to the transaction data and state.

good catch, i was distracted by the 429 and missed that

dcsan commented 2 years ago

Yes, you're right, the rate limit above was on contract function calls, so it would be outside the scope proposed for the mini-node.

The mini-node is more like a queryable caching layer for the contract-events firehose API, like a database mirror of the blockchain state, but without running a whole local node. I'm reconsidering using Postgres for this project so we can just use the existing schemas.

So I was planning to just poll the (Hiro|Syvita|xxx) RPC API but store the results locally, so I can make direct queries against the last known chain state, especially since blocks only come in every XX mins.

Another way to get at this data would be if the Hiro servers - or anyone else running a node - could publish a Postgres replication feed, or otherwise enable others to access the raw PG database underneath. Then others could build their own complex queries and not be limited to the public APIs, which are rate-limited and hard to query.

Any thoughts on that? I'm not an expert on the different ways to provide (near) realtime read-only Postgres updates, but I would imagine there's a PG SaaS that would provide that, e.g. Elephant provides replication across clouds.

This project goes into the various ins and outs of setting up a fully realtime events API based on a PG database, but it is much more complicated than what I was proposing: https://github.com/supabase/realtime

Other notes:

https://wiki.postgresql.org/wiki/Streaming_Replication
https://www.postgresql.org/docs/current/runtime-config-replication.html

will-corcoran commented 2 years ago

hey @dcsan can you provide an update on this grant? were you able to sign the contract and onboard ok? cc: @jhammond2012

dcsan commented 2 years ago

Hi @will-at-stacks, no, I wasn't able to complete the forms in the end; it was asking me for a document I didn't have. I'll DM you with more info.

will-corcoran commented 2 years ago

@dcsan can you please email shakti@stacks.org - he will be able to assist you. we are excited to get this grant funded and in the wild! ;)

will-corcoran commented 2 years ago

Hello and thank you for participating in the Stacks Foundation Grants Program!

We are in the process of migrating from GitHub to the new Grants Dashboard. In order to complete your grant, you will need to submit any remaining Progress Review and/or Final Review requests through the Dashboard in order to receive your remaining payments.

Lastly, please note we are marking this grant 'closed' on GitHub for organizational purposes, but it is still 'open' on the Grants Dashboard.

Thanks and we hope to continue to support your efforts with additional grants!

Best, Will