stacksgov / grants-program

Welcome to the Stacks Foundation Grant Program. Community members interested in submitting a grant proposal may do so by opening an issue in this repository and filling out the grant application.

MiniNode - synced local copy of the tx log with granular query API and event emitters #251

Closed dcsan closed 1 year ago

dcsan commented 2 years ago

Background

Attacking Stacks scaling problems with a decentralized and simple "mini node" approach

Problems with developing dApps on Stacks include API latency and instability, rate limits, and the lack of any meaningful way to filter or query transaction data.

We propose to solve these problems for a subset of read-only queries.

We propose a mini-node that will use public APIs to keep a synced copy of the blockchain events in a local DB, and a JS/TS API to query this DB.

This does not include a full-running local blockchain node. That is the part of the iceberg below the waterline. We just provide higher-level access to the transaction data and state.

We propose using accessible technologies and keeping the code-base easy to contribute to, and instantly useful to the non-core dev teams building NFT and other projects. If the clone is fast enough, it would be trivial for each project to keep its own copy of the confirmed transaction history and build out the APIs it needs without relying on the Hiro team.

Project Overview: 'MiniNode'

Running a full local node is possible, but it is a huge amount of devops work just to build your own dApp.

For most apps, we don't need to run such a node; we just need to be able to query the data.

The mini-node will poll the 'firehose' API for new transactions:

https://stacks-node-api.mainnet.stacks.co/extended/v1/tx

It will simply store these raw TXs in a MongoDB collection.

Then we can make more granular queries into the DB, such as "TXs that contain a specific token ID" or "TXs per contract".
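As a sketch of the poll-and-store loop (all names here are hypothetical, not from an existing codebase), with an in-memory seen-set standing in for a lookup against the Mongo collection:

```typescript
// Hypothetical sketch of the mini-node polling loop. FIREHOSE_URL,
// filterNewTxs, and pollOnce are assumed names, not an existing API.
type Tx = { tx_id: string; [key: string]: unknown };

const FIREHOSE_URL =
  "https://stacks-node-api.mainnet.stacks.co/extended/v1/tx";

// Keep only transactions we have not seen before; the seen-set stands in
// for a uniqueness check against the local MongoDB collection.
function filterNewTxs(page: Tx[], seen: Set<string>): Tx[] {
  const fresh = page.filter((tx) => !seen.has(tx.tx_id));
  fresh.forEach((tx) => seen.add(tx.tx_id));
  return fresh;
}

// One poll cycle: fetch the latest page and persist only the unseen TXs.
async function pollOnce(
  seen: Set<string>,
  save: (txs: Tx[]) => Promise<void>
): Promise<void> {
  const res = await fetch(`${FIREHOSE_URL}?limit=50`);
  const { results } = (await res.json()) as { results: Tx[] };
  await save(filterNewTxs(results, seen));
}
```

In practice `pollOnce` would run on a timer, and `save` would be an insert into the Mongo collection.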

Event based "subscriptions"

Additionally, any 'new' transactions found on each poll of the firehose will go into a separate capped collection.

This allows us to use a tailable cursor to call our dApp when new data is received. We can use standard JS event emitters to provide a pub/sub or sockets API that calls back our dApp only when there are relevant events. Of course, a fallback query can also be done if the listener stopped listening.

This would massively lessen the strain on the main Stacks API nodes, as there would be no need for dApps to keep polling for changes.

Additionally, we can set up granular filters as Mongo queries, e.g. just listen for new TXs on a specific contract or wallet.
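The subscription layer could be sketched with Node's standard EventEmitter. This is a minimal sketch under assumptions: in the real mini-node the `"tx"` events would be fed by a tailable cursor on the capped collection, not by direct `emit()` calls, and the field names are hypothetical.

```typescript
// Minimal pub/sub sketch using Node's built-in EventEmitter.
import { EventEmitter } from "node:events";

type Tx = { tx_id: string; contract_id?: string; sender?: string };

const bus = new EventEmitter();

// Subscribe to TXs matching a granular filter, e.g. a specific contract
// or wallet; the handler only fires for relevant events.
function onTx(filter: (tx: Tx) => boolean, handler: (tx: Tx) => void): void {
  bus.on("tx", (tx: Tx) => {
    if (filter(tx)) handler(tx);
  });
}

// Example: a subscriber that only cares about one (hypothetical) contract.
// onTx((tx) => tx.contract_id === "SP...my-nft", (tx) => console.log(tx.tx_id));
```

A dApp would register a listener once instead of polling, which is the whole point of the capped-collection design.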

Data storage

While mongo has some shortcomings for complex relational queries, it's well suited for storing the blockchain transaction log.

Additionally, we could destructure the data somewhat, separating the incoming firehose TX log into queryable Contracts / Tokens / Users collections.

Query structure

We'll provide a number of basic REST endpoints for queries we cannot currently run against the Stacks JS API.

Analytics

We could also add some simple analytics, e.g. if people want to see how many transactions are being made on a specific contract. MongoDB provides a decent high-performance aggregation pipeline for these types of queries.
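As a hedged sketch of such an analytics query (field names like `contract_id` are assumptions): the pipeline below is the shape the server-side Mongo aggregation would take, and `countPerContract` implements the same grouping in plain TypeScript so the result is concrete.

```typescript
// TXs-per-contract analytics. The pipeline is what we'd hand to
// collection.aggregate(); countPerContract is the in-memory equivalent.
type Tx = { tx_id: string; contract_id: string };

// Assumed Mongo aggregation pipeline: group by contract, count, sort.
const txsPerContractPipeline = [
  { $group: { _id: "$contract_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// Plain-TS version of the same grouping, for illustration.
function countPerContract(txs: Tx[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const tx of txs) {
    counts.set(tx.contract_id, (counts.get(tx.contract_id) ?? 0) + 1);
  }
  return counts;
}
```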

Scope

The project is deliberately limited so that it does NOT become a sprawling megaproject.

We will provide a basic client UI to allow you to pass parameters to pre-built queries, but not a full-blown web UI for building these queries.

Budget and milestones

What grant amount are you seeking? How long will the project take in hours? If more than 20, please break down the project into milestones, with a clear output (e.g., low-fi mockup, MVP with two features) and include the estimated work hours for each milestone.

Total Grant Request: 5000 STX

M1: ~3 weeks

M2: ~3 weeks

M3: ~3 weeks

Team

DCsan will provide the architecture, DB schemas, and data structures, based on other existing projects.

For the main work there is not much Stacks-specific knowhow needed: it's polling an API, storing the documents, and making a query UI, much like any normal web API. It might be a good opportunity to bring a skilled web2 developer into the Stacks ecosystem.

There are a number of people across different Stacks NFT projects who've expressed interest and might fit the bill.

Risks

In the long-term future, this could be considered a type of "optimistic" transaction layer, where we write locally so a UI can immediately show a pending write-request and then wait for confirmation, but that's far beyond the scope of what we need for today's pain points.

Still, this kind of work on scaling the Stacks ecosystem does play into 2022 scaling topics for Stacks, like appChains.

Roadmap and future extensions

Fully Open Source

This project will be developed fully in public open-source repositories.

Community and supporting materials

The need for this project arose while developing a couple of other Stacks ecosystem projects.

We've already used this bot for the CrashPunks launch and for Project Indigo wallet verification, with more features coming in early 2022. This is the main focus for me (@dcsan), but I need the infrastructure features described here to support more realtime updates, so I thought about a more general, open-sourced solution that would be useful to others.

Submitted by

David 'DC' Collier

stx-grant-bot[bot] commented 2 years ago

Thanks for submitting a grant application! Due to the holiday break, grant reviews are currently on hiatus and will resume on January 5th. In the meantime, we encourage you to continue to share and get feedback on your application from the community.

dcsan commented 2 years ago

Just to play devil's advocate, some other approaches:

Just deal with standing up Stacks nodes everywhere, for local dev, staging, production, etc. Or maybe someone [ #250 ] else could do that and provide a Postgres replication feed, so others could pull the data into a queryable PG database and build a functional API on top of it. Though in that case, why wouldn't you just wade into stacksJS, I guess; which brings us back to square one.

This approach also looks interesting: https://twitter.com/leopradel/status/1475513460431667202 although I guess it's not running now: https://graphql-api.stxstats.co/v1beta1/relay

I'm not sure how well arbitrary GraphQL queries would scale, but perhaps someone could be enticed/funded to provide a TheGraph-type SaaS on top of that?

Also, it seems the latency and errors of the existing Stacks API could be worked around with some devops. The Syvita crew just deployed some nodes across a few AWS regions and got better stability: https://twitter.com/mrkmcknz/status/1476232724981727241. Of course, that doesn't fix the problem that the APIs lack features such as any meaningful way to filter or query.


Overall:

The current firehose API seems not that daunting in terms of data; it's a known surface area.

Sucking the data into a Mongo instance would make it easier to work with, removing the need to run a node (or nodes).

It decouples development from the complexity of the whole stacksJS API system.

It allows devs to go faster and write queries for just the data they need.

wileyj commented 2 years ago

How will this proposal deal with historical tx data that's not exposed by the /v1/extended/tx endpoint?

dcsan commented 2 years ago

@wileyj my main need for this project (and that of others I'm talking to) is to track changes to contracts, e.g. events where an NFT is transferred from one wallet to another. But rather than constant polling, we turn that into an event stream a client app can set up a listener for.

In terms of historical data - if someone does need that - there seems to be an offset parameter:

https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=10000&limit=50
https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=100000&limit=50
https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=1000000&limit=50 (no data)

So it seems to not go back over 1M records.

So where does the data start exactly? It's not clear, but around 850k back:

https://stacks-node-api.mainnet.stacks.co/extended/v1/tx?offset=875000&limit=50 "parent_burn_block_time_iso": "2021-01-21T18:10:48.000Z",

Given that today is Blockstack's birthday, maybe mainnet came online around Jan 2021?

Are you saying there's other data beyond that point that the current API doesn't provide? It doesn't seem so.

I could boot an instance and just keep polling back into history. I'd probably only do that once and then archive it all as a Mongo backup somewhere, so future instances can just clone that data and load it up. Or perhaps we make a replica set available for others to set up a direct Mongo clone.

Since Mongo is disk-based (unlike Redis), it seems we can just keep going back as far as needed.
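As a small illustration of the backfill idea above (the function and constant names are hypothetical), one could precompute the page URLs needed to walk back through history; a larger offset reaches further into the past:

```typescript
// Hypothetical backfill planner: enumerate firehose page URLs by offset.
// The real job would fetch and store each page, stopping when the API
// returns no more data.
const FIREHOSE_BASE =
  "https://stacks-node-api.mainnet.stacks.co/extended/v1/tx";

function buildBackfillUrls(totalTxs: number, pageSize = 50): string[] {
  const urls: string[] = [];
  for (let offset = 0; offset < totalTxs; offset += pageSize) {
    urls.push(`${FIREHOSE_BASE}?offset=${offset}&limit=${pageSize}`);
  }
  return urls;
}
```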

If we wanted to get into more complex queries/filters, indexes might become an issue depending on how much data is really there, but most of those historical queries would more likely be for things like metrics or data investigation, so the aggregation pipeline is a better fit.

Are you working on these APIs? What are your thoughts? Do you think there is demand for historical API data?

will-corcoran commented 2 years ago

Hey @dcsan - this grant is APPROVED! We look forward to seeing what you produce and can't wait for others in the community to have this at their disposal.

stx-grant-bot[bot] commented 2 years ago

Congratulations. Your grant is now approved. Please complete the on-boarding link here: https://stacks-grant.netlify.app/onboard?q=9179afe0ff5c29683776fbd46a762353

dcsan commented 2 years ago

Somehow I got stuck in the process of completing that form above.

But things seem more urgent now; I'm getting blocked on accessing the Stacks APIs at even like 1 req/second...

96|cards   | [--]    Error: Error calling read-only function. Response 429: Too Many Requests. Attempted to fetch https://stacks-node-api.mainnet.stacks.co/v2/contracts/call-read/SP2RJP81KF3V6NJVZEZ2SR8DD73VQJC98EJSTQWDV/dcards-v4/get-info and failed with the message: "<!DOCTYPE html>
96|cards   | <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
96|cards   | <!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
96|cards   | <!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
96|cards   | <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
96|cards   | <head>
96|cards   | <title>Access denied | stacks-node-api.mainnet.stacks.co used Cloudflare to restrict access</title>
wileyj commented 2 years ago

Hmm, it seems a possible solution here would be a more generic firehose of events vs an API node that is queried? There's not much you can do about Hiro's rate-limits (outside of asking them to adjust/accommodate).

zone117x commented 2 years ago

Note that the request you mentioned, /v2/contracts/call-read/..., is an RPC endpoint served by a full Stacks node instance, not by the API. The Stacks API is not capable of evaluating smart contract calls, and the proposal above does not sound like it would be able to do so either.

We propose a mini-node that will use public APIs to keep a synced copy of the blockchain events in a local DB, and a JS/TS API to query this DB. This does not include a full-running local blockchain node. That is the part of the iceberg below the waterline. We just provide higher-level access to the transaction data and state.

wileyj commented 2 years ago

Note that the request you mentioned, /v2/contracts/call-read/..., is an RPC endpoint served by a full Stacks node instance, not by the API. The Stacks API is not capable of evaluating smart contract calls, and the proposal above does not sound like it would be able to do so either.

We propose a mini-node that will use public APIs to keep a synced copy of the blockchain events in a local DB, and a JS/TS API to query this DB. This does not include a full-running local blockchain node. That is the part of the iceberg below the waterline. We just provide higher-level access to the transaction data and state.

good catch, i was distracted by the 429 and missed that

dcsan commented 2 years ago

Yes, you're right, the rate limit above was on contract function calls, so it would be outside the scope proposed for the mini-node.

The mini-node is more like a queryable caching layer for the contract-events firehose API, like a database mirror of the blockchain state, but without running a whole local node. I'm reconsidering using Postgres for this project so we can just use the existing schemas.

So I was planning to just poll the (Hiro|Syvita|xxx) RPC API but store the results locally, so I can make direct queries against the last known chain state, especially since blocks only come in every XX mins.

Another way to get at this data would be if the Hiro servers - or anyone else running a node - could publish a Postgres replication feed, or otherwise enable others to access the raw PG database underneath. Then others could build their own complex queries and not be limited to the public APIs, which are rate-limited and hard to query.

Any thoughts on that? I'm not an expert on the different ways to provide (near) realtime read-only Postgres updates, but I would imagine there's a PG SaaS that would provide that, e.g. Elephant provides replication across clouds.

This project goes into the various ins and outs of setting up a fully realtime events API based on a PG database, but it is much more complicated than what I was proposing: https://github.com/supabase/realtime

Other notes:

https://wiki.postgresql.org/wiki/Streaming_Replication
https://www.postgresql.org/docs/current/runtime-config-replication.html

will-corcoran commented 2 years ago

hey @dcsan can you provide an update on this grant? were you able to sign the contract and onboard ok? cc: @jhammond2012

dcsan commented 2 years ago

Hi @will-at-stacks, no, I wasn't able to complete the forms in the end; it was asking me for a document I didn't have. I'll DM you with more info.

will-corcoran commented 2 years ago

@dcsan can you please email shakti@stacks.org - he will be able to assist you. we are excited to get this grant funded and in the wild! ;)

will-corcoran commented 2 years ago

Hello and thank you for participating in the Stacks Foundation Grants Program!

We are in the process of migrating from GitHub to the new Grants Dashboard. In order to complete your grant, you will need to submit any remaining Progress Review and/or Final Review requests through the Dashboard in order to receive your remaining payments.

Lastly, please note we are marking this grant 'closed' on GitHub for organizational purposes, but it is still 'open' on the Grants Dashboard.

Thanks and we hope to continue to support your efforts with additional grants!

Best, Will