ton-society / grants-and-bounties

TON Foundation invites talent to imagine and realize projects that have the potential to integrate with the daily lives of users.
https://ton.org/grants

Library for handling blockchain data #35

Closed liketurbo closed 2 years ago

liketurbo commented 2 years ago

Summary

A library that would be essential for TON projects that need a production-ready solution to index the TON blockchain.

Context

When it comes to working with data from the blockchain, and you need some way to store, parse, or subscribe to blocks, transactions, and messages, your choices are pretty limited.

Here are the ways (at least the ones that I found):

  1. Start your own Liteserver
  2. Connect to one of the existing Liteservers
  3. Use one of the free services like TON HTTP API

The first option is not really an option for most developers, because you need to run a full node, which you have to maintain, and the infrastructure costs are not cheap.
The second option's major drawback is that you need to trust the Liteserver you're connecting to, and the service provider needs to scale their infrastructure to support all the users.
The same goes for the third option: you need to trust the service you're using, and the provider likewise has to scale their infrastructure to support all the users.

That kind of problem was faced by the Pagoda indexer team in the NEAR ecosystem.

They came up with a solution that involves AWS S3 and consists of a pair of libraries: NEAR Lake Indexer and NEAR Lake Framework.

NEAR Lake Indexer is a library that connects to the blockchain and stores the data in AWS S3.
NEAR Lake Framework is a companion library to NEAR Lake Indexer that connects to AWS S3 and parses the data.

What got me interested in this solution is that availability is handled by AWS S3, and with the Requester Pays option the cost stays fixed for the provider, while end users pay themselves for the data they use.
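In TON terms, the same split could look roughly like the sketch below. Everything here is hypothetical (`Block`, `Bucket`, `indexBlock`, and `startStream` are invented names, and an in-memory Map stands in for the S3 bucket); the point is only the shape: the indexer writes one object per block, and the framework replays them to a handler.

```typescript
// Hypothetical sketch of the Indexer/Framework split. A Map stands in for
// the S3 bucket; all names here are illustrative, not a real TON API.

type Block = { seqno: number; transactions: string[] };

// Stand-in for an S3 bucket: zero-padded keys sort lexicographically,
// so listing "from a seqno" is a simple sorted filter.
class Bucket {
  private objects = new Map<string, string>();
  put(key: string, body: string): void { this.objects.set(key, body); }
  get(key: string): string | undefined { return this.objects.get(key); }
  listFrom(start: string): string[] {
    return [...this.objects.keys()].filter((k) => k >= start).sort();
  }
}

const pad = (seqno: number): string => String(seqno).padStart(12, "0");

// "Indexer" side: write each block as one JSON object keyed by its seqno,
// the way NEAR Lake stores one object per block height.
function indexBlock(bucket: Bucket, block: Block): void {
  bucket.put(pad(block.seqno), JSON.stringify(block));
}

// "Framework" side: replay blocks from a starting seqno and hand each one
// to a user-supplied handler; returns how many blocks were processed.
function startStream(
  bucket: Bucket,
  fromSeqno: number,
  handler: (block: Block) => void,
): number {
  let processed = 0;
  for (const key of bucket.listFrom(pad(fromSeqno))) {
    const body = bucket.get(key);
    if (body !== undefined) {
      handler(JSON.parse(body) as Block);
      processed += 1;
    }
  }
  return processed;
}
```

A consumer would then call `startStream` with its own handler, whether that handler feeds an explorer database or a live subscriber.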

Goals

Deliverables

Definition of Done

- [ ] TON Lake Indexer, which meets the goal requirements and is open source
- [ ] TON Lake Framework (Rust version), which meets the goal requirements, is open source, and has a crate available on crates.io
- [ ] TON Lake Framework (JS/TS version), which meets the goal requirements, is open source, and has a package available on npm

Reward

liketurbo commented 2 years ago

I'm ready to implement this project myself within 6 weeks after the project is approved.

Naltox commented 2 years ago

I think that for 90% of projects it's OK to go with the HTTP API approach; besides that, you can run your own copy of toncenter if the public one is not enough for you.

> The second option's major drawback is that you need to trust the Liteserver you're connecting to.

That's not true: if you use tonlib, it checks the Merkle proofs of the responses from Liteservers. However, some other ways of interacting with Liteservers, such as ton-lite-client for Node.js, do not check proofs yet, and it would be better to put effort into fixing that.

liketurbo commented 2 years ago

> The second option's major drawback is that you need to trust the Liteserver you're connecting to.

@Naltox By that I meant the case where the Liteserver goes down and your app depends on it.

liketurbo commented 2 years ago

> ...besides that, you can run your own copy of toncenter if the public one is not enough for you

@Naltox Yes, you can, but that approach comes with hardware requirements.

liketurbo commented 2 years ago

> I think that for 90% of projects it's OK to go with the HTTP API approach...

I don't know where the 90% comes from, but for my recent project it was not enough, and I kept hitting the rate limit.

I'd rather pay for AWS S3 and have a production-ready app than handle rate limits (and the issues that come with them) and explain to my client why the app has certain limitations.

talkol commented 2 years ago

I think that the project definition is very unclear and assumes that we know what elements the NEAR Lake Indexer is indexing. We don't, and I don't think NEAR is close enough to TON for the comparison to be useful.

What I don't understand is what sort of data is being indexed. Is it just taking existing data that TON full nodes already index (like block headers) and making it available?

Or is it indexing new connections, for example which Jettons/NFTs are held by which account? This is a much more difficult task, since this data is not indexed "natively" by a full node and requires parsing all blocks during sync to create new indexes from scratch.

The former is easy; I think we have enough solutions in the ecosystem for this (toncenter api / tonhub v4 / raw adnl). The latter is harder, but the proposal needs to specify exactly which new connections it plans to index. Would you be able to query all messages sent to a smart contract? All messages sent by a specific address to a smart contract? Could you filter on message arguments? Could you query all jettons a specific user holds, or all the holders of a specific jetton?

Each one of these queries is a totally different index. Will users be able to define their own queries, or are these indexes hard-coded? Some of these will be very expensive to hold; who will pay for that? Where will the index run? Do end users need to index the whole chain from scratch themselves, or is there a service provider that runs the index for everybody's benefit and allows people to use it?
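To make that point concrete, here is a toy sketch (the `Msg` shape and both index builders are invented for this example, not TON's actual data model): two closely related queries over the same message stream still need two entirely separate structures, each built and stored on its own.

```typescript
// Toy illustration: two different queries need two different indexes.
// The Msg shape is invented for the example, not TON's real data model.

type Msg = { from: string; to: string; body: string };

// Index for "all messages sent to a smart contract": keyed by destination.
function indexByDestination(msgs: Msg[]): Map<string, Msg[]> {
  const index = new Map<string, Msg[]>();
  for (const m of msgs) {
    const bucket = index.get(m.to) ?? [];
    bucket.push(m);
    index.set(m.to, bucket);
  }
  return index;
}

// Index for "all messages a specific address sent to a specific contract":
// keyed by a composite (sender, destination) pair. Same input data, but a
// completely separate structure to build, store, and pay for.
function indexBySenderAndDestination(msgs: Msg[]): Map<string, Msg[]> {
  const index = new Map<string, Msg[]>();
  for (const m of msgs) {
    const key = `${m.from}->${m.to}`;
    const bucket = index.get(key) ?? [];
    bucket.push(m);
    index.set(key, bucket);
  }
  return index;
}
```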

Also, for the latter there are many solutions that already exist in the ecosystem, not necessarily all open source. TonApi has tons of indexes, Disintar has their own indexer (now open source, built on Kafka), TonHub has their own closed-source indexer, and getgems, I assume, also indexes lots of extra data about NFTs.

liketurbo commented 2 years ago

@talkol That's a lot of questions 😂
I think if you follow the links I provided, you will find the answers to most of them, if not all. Even though it's a different ecosystem, the conceptual idea is the same.
And please let me know how you think we can improve the project definition; I'm open to suggestions.

talkol commented 2 years ago

I think sending readers to external links, each with a few dozen pages of documentation, is a bit of a hassle. Can you give a TL;DR overview of how it works within the contents of this footstep (a one-pager)? It will help readers like myself.

liketurbo commented 2 years ago

@talkol Please check out my overview draft. If it reads well for a first-time reader like yourself, I'll add it to the footstep definition.

What I'm trying to do is create a library that connects to the blockchain (a Liteserver) and stores the data in AWS S3, plus a library that connects to AWS S3 and fetches the data.

With TON Lake Indexer, we store raw data from the blockchain in AWS S3, presumably blocks of transactions, messages, etc.
With TON Lake Framework, we fetch that raw data from AWS S3. Because you have access to the stream object, you can fetch the whole blockchain, parse it, and build some kind of explorer, or parse only the latest blocks and build some kind of subscriber on top of it.

What differentiates it from similar projects like toncenter api / tonhub v4 / raw adnl is reliability, availability, and flexibility: the API will consist of just one function, startStream, which returns a stream object, so you can fetch/parse/store the raw data however you want.

And with the Requester Pays option the cost stays fixed for the provider, while end users pay only for the data they use. It also creates the possibility for TON Foundation to host TON Lake Indexer, because the cost will not increase with the number of users; the Foundation only has to pay for data writes.
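As a back-of-the-envelope sketch of that claim (the rates below are placeholders, not real AWS prices, and the model ignores per-request fees): with Requester Pays, the provider's bill depends only on what is stored, not on how many readers there are.

```typescript
// Illustrative Requester Pays cost model. The rates are placeholders,
// not real AWS pricing; the point is the split of who pays for what.

type Usage = { gbStored: number; gbReadPerUser: number; users: number };

const RATE = { storagePerGbMonth: 0.023, transferPerGb: 0.09 }; // placeholders

// The provider (e.g. a hosting foundation) pays for storage only; this
// does not change when the number of readers grows.
function providerMonthlyCost(u: Usage): number {
  return u.gbStored * RATE.storagePerGbMonth;
}

// Each requester pays only for the data they actually fetch.
function perUserMonthlyCost(u: Usage): number {
  return u.gbReadPerUser * RATE.transferPerGb;
}
```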

liketurbo commented 2 years ago

Well, I guess this footstep is not as relevant as I thought it would be 🤷‍♂️ Closing it for now.