pragmaxim-com / ergo-uexplorer

Supplementary ergo chain explorer/analyzer with Scala/ZIO
MIT License

Data model for lightweight mode on MvStorage #3

Open pragmaxim opened 1 year ago

pragmaxim commented 1 year ago

Schema for embedded database

Determining at query time whether box-related data has been spent puts huge pressure on the DB, so let's do that at indexing time so that queries are real-time!

Shared

headerIdsByHeight:  Map[Height, Set[HeaderId]] // more than one in case of a fork-in-progress
blockByHeaderId:    Map[HeaderId, Block] // arbitrary block data (depends on performance)

Unspent/NonEmpty

utxosByAddress:     Map[Address, Map[BoxId, Value]] // this would be a clone of Node's utxo state
addressByUtxo:      Map[BoxId, Address] // non-empty address by utxo

Spent

allBoxesByCustomAddress:     Map[Address, Set[BoxId]] // all boxes for a configured address by a dApp developer

There can be many of these indexes, please provide your suggestions and use-cases!
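To illustrate the "resolve spent state at indexing time" idea, here is a minimal sketch of how the two Unspent maps could be updated while a block is applied. The types `Output`, `Tx` and `UtxoState` and the functions `applyTx`/`applyBlock` are hypothetical simplifications, not uexplorer's actual model, and plain immutable Maps stand in for the persistent MvStore maps.

```scala
object UtxoIndexSketch {
  // Hypothetical simplified types; the real explorer types are richer.
  type BoxId   = String
  type Address = String
  type Value   = Long

  final case class Output(boxId: BoxId, address: Address, value: Value)
  final case class Tx(inputs: List[BoxId], outputs: List[Output])

  // The two "Unspent" maps from the schema, kept as immutable values here.
  final case class UtxoState(
    utxosByAddress: Map[Address, Map[BoxId, Value]],
    addressByUtxo: Map[BoxId, Address]
  )

  val empty: UtxoState = UtxoState(Map.empty, Map.empty)

  // Remove a spent box from both maps; drop the address once it holds no boxes.
  private def spend(state: UtxoState, boxId: BoxId): UtxoState =
    state.addressByUtxo.get(boxId) match {
      case None => state // unknown or already spent box: nothing to do
      case Some(address) =>
        val remaining =
          state.utxosByAddress.getOrElse(address, Map.empty[BoxId, Value]) - boxId
        val byAddress =
          if (remaining.isEmpty) state.utxosByAddress - address
          else state.utxosByAddress.updated(address, remaining)
        UtxoState(byAddress, state.addressByUtxo - boxId)
    }

  // Register a newly created box in both maps.
  private def create(state: UtxoState, out: Output): UtxoState = {
    val boxes = state.utxosByAddress.getOrElse(out.address, Map.empty[BoxId, Value])
    UtxoState(
      state.utxosByAddress.updated(out.address, boxes.updated(out.boxId, out.value)),
      state.addressByUtxo.updated(out.boxId, out.address)
    )
  }

  // Spend the inputs first, then add the outputs of the transaction.
  def applyTx(state: UtxoState, tx: Tx): UtxoState = {
    val afterSpends = tx.inputs.foldLeft(state)(spend)
    tx.outputs.foldLeft(afterSpends)(create)
  }

  // Transactions are applied in block order, so a later transaction
  // can spend a box created earlier in the same block.
  def applyBlock(state: UtxoState, txs: List[Tx]): UtxoState =
    txs.foldLeft(state)(applyTx)
}
```

With the state maintained this way, an "unspent boxes by address" query becomes a single lookup in `utxosByAddress`, with no spent/unspent join at query time. Rollbacks on forks are not covered by this sketch.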

Facts to consider:

Eventually the Http API will allow for retrieving anything that can be put together from these persistent Maps.

arobsn commented 1 year ago

Shared

headerById:                       Map[HeaderId, Value] // Probably best to only keep n headers? 
                                                       // Most of dApps will only need the last 10 
                                                       // headers to be used as reduction context.
headerIdsByHeight:                Map[Height, Set[HeaderId]]
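A rough sketch of the "only keep n headers" idea for `headerIdsByHeight`, assuming a height-sorted map and a hypothetical `addHeader` helper (neither is part of the proposal itself). All fork candidates at a retained height are kept.

```scala
import scala.collection.immutable.SortedMap

object HeaderWindowSketch {
  // Hypothetical simplified types.
  type Height   = Int
  type HeaderId = String

  // Add a header id at the given height, then retain only the `keep` highest heights.
  def addHeader(
    index: SortedMap[Height, Set[HeaderId]],
    height: Height,
    id: HeaderId,
    keep: Int
  ): SortedMap[Height, Set[HeaderId]] = {
    val ids     = index.getOrElse(height, Set.empty[HeaderId]) + id
    val updated = index.updated(height, ids)
    updated.takeRight(keep) // the map is sorted by height, so this drops the oldest entries
  }
}
```

A matching `headerById` map would need the evicted ids removed as well, which this sketch leaves out.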

Unspent

boxById:                          Map[BoxId, Value]
boxIdsByContract:                 Map[ContractHex, Set[BoxId]]
boxIdsByContractTemplate:         Map[TemplateHex, Set[BoxId]] // Constant segregated contract template
boxIdsByCreationHeight:           Map[Height, Set[BoxId]]
boxIdsByR4:                       Map[RegisterHex, Set[BoxId]] // Non-empty R4 register
boxIdsByR5:                       Map[RegisterHex, Set[BoxId]] // Non-empty R5 register
boxIdsByR6:                       Map[RegisterHex, Set[BoxId]] // Non-empty R6 register
boxIdsByR7:                       Map[RegisterHex, Set[BoxId]] // Non-empty R7 register
boxIdsByR8:                       Map[RegisterHex, Set[BoxId]] // Non-empty R8 register
boxIdsByR9:                       Map[RegisterHex, Set[BoxId]] // Non-empty R9 register
boxIdsByTokenId:                  Map[TokenId, Set[BoxId]]
boxIdsByTransactionId:            Map[TransactionId, Set[BoxId]]

Spent

Spent boxes need all Unspent maps plus the following:

mintingBoxIdsByTokenId:           Map[TokenId, Set[BoxId]] // EIP-4 only considers one minting box 
                                                           // per token, but protocol allows multiple 
                                                           // boxes in the same minting transaction, 
                                                           // so best to follow the protocol.

Using hashes as contract and registers indexing keys

From a storage-efficiency point of view, it's better to use hashes rather than the content itself as indexing keys: a BLAKE2b256 hash is 32 bytes, whereas contracts and registers can be as big as the maximum box size (4 KB) minus the size of the required registers.

The BLAKE hashing algorithm is known for its speed and security and is used extensively on Ergo; however, the effect of hashing on indexing times must be taken into consideration.
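A minimal sketch of hashed index keys. Important assumption: SHA-256 is used here purely as a stand-in because it ships with the JDK, while a real indexer would plug in a BLAKE2b256 implementation; both produce 32-byte digests. The names `keyOf` and `indexBox` are hypothetical.

```scala
import java.security.MessageDigest

object HashedKeySketch {
  // ASSUMPTION: SHA-256 stands in for BLAKE2b256, which is not in the JDK.
  // Either way the key is 32 bytes (64 hex characters) regardless of content size.
  def keyOf(content: Array[Byte]): String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(content)
      .map("%02x".format(_))
      .mkString

  // Index a box id under the hash of its contract (or register) bytes.
  def indexBox(
    index: Map[String, Set[String]],
    content: Array[Byte],
    boxId: String
  ): Map[String, Set[String]] = {
    val key = keyOf(content)
    index.updated(key, index.getOrElse(key, Set.empty[String]) + boxId)
  }
}
```

The trade-off is one hash computation per indexed contract or register, and again per lookup when the caller supplies raw content instead of a hash.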

Updated - v1

pragmaxim commented 1 year ago

Copy/pasting some REST endpoints from @arobsn:

get  /blocks/{blockId}  // block metadata and statistics
get  /boxes/{state}/tokens/{tokenId}/
get  /boxes/{state}/{boxId}/
get  /boxes/{state}/addresses/{address}/
get  /boxes/{state}/addresses/{address}/tokens/{tokenId}/
get  /boxes/{state}/contracts/{contractHex}/
get  /boxes/{state}/contracts/{contractHex}/tokens/{tokenId}/
get  /boxes/{state}/contracts/hashes/{contractHashHex}/
get  /boxes/{state}/contracts/hashes/{contractHashHex}/tokens/{tokenId}/
get  /boxes/{state}/contracts/templates/{contractTemplateHex}/
get  /boxes/{state}/contracts/templates/{contractTemplateHex}/tokens/{tokenId}/
get  /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/
get  /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/?R4=deadbeef&R5=cafe
get  /boxes/{state}/contracts/templates/hashes/{contractTemplateHashHex}/tokens/{tokenId}/
post /boxes/query/

// state = spent | unspent
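One possible way to serve an endpoint such as the template lookup with `?R4=...&R5=...` filters is to intersect the box id sets of the relevant indexes. The sketch below assumes the per-register maps are grouped under one map keyed by register name; `UnspentIndex` and `byTemplate` are hypothetical names, and no HTTP layer is shown.

```scala
object BoxQuerySketch {
  // Hypothetical simplified types.
  type BoxId = String
  type Hex   = String

  final case class UnspentIndex(
    boxIdsByContractTemplate: Map[Hex, Set[BoxId]],
    // Keyed by register name ("R4" to "R9"), then by register value.
    boxIdsByRegister: Map[String, Map[Hex, Set[BoxId]]]
  )

  // Boxes under the given template key whose registers match every supplied filter.
  def byTemplate(
    index: UnspentIndex,
    templateKey: Hex,
    registerFilters: Map[String, Hex]
  ): Set[BoxId] = {
    val candidates =
      index.boxIdsByContractTemplate.getOrElse(templateKey, Set.empty[BoxId])
    registerFilters.foldLeft(candidates) { case (acc, (register, valueHex)) =>
      val matching = index.boxIdsByRegister
        .getOrElse(register, Map.empty[Hex, Set[BoxId]])
        .getOrElse(valueHex, Set.empty[BoxId])
      acc.intersect(matching)
    }
  }
}
```

The `{state}` path segment would then only select which set of maps (spent or unspent) is passed in as `index`.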

get /tokens/{tokenId}/
get /tokens/{tokenId}/minting-box/

pragmaxim commented 1 year ago

I'm keeping uexplorer on Scala 3. I'm currently spiking this on multiple tech stacks: I started with Slick as I had experience with it, then tried Doobie, and ended up with zio-protoquill, zio-http and zio-json, which are all production ready or very close to it. Eventually zio-protoquill could be replaced with zio-sql, which is currently in development.

There are two choices in the Scala ecosystem when it comes to SQL: the Typelevel stack and the ZIO stack. My bet is on ZIO, as the Typelevel stack is not well unified. One needs at least 10 different dependencies to put a simple CRUD app together, whereas in ZIO land you are good to go with just zio-protoquill, zio-http and zio-json. This pays off especially when using Scala 3, as it is basically a first-class citizen in ZIO 2.0.