onflow / cadence

Cadence, the resource-oriented smart contract programming language πŸƒβ€β™‚οΈ
https://developers.flow.com/cadence
Apache License 2.0
530 stars 141 forks source link

Storage Querying API #208

Closed cybercent closed 2 years ago

cybercent commented 4 years ago

Issue To Be Solved

A way to iterate over the storage to be able to list the resources owned by an account.

(Optional): Suggest A Solution

If there are multiple solutions, please present each one separately. Save comparisons for the very end.)

(Optional): Context

I’m working on a marketplace and I was looking for a way to list all NFTs owned by an account.

cybercent commented 3 years ago

Hi @turbolent is there an ETA for the Storage Querying API ?

turbolent commented 3 years ago

@cybercent There's no ETA, but it's definitely one of the things we'll work on soon, along with other storage API improvements like #369 and #376.

I've added a roadmap for Cadence yesterday, and improving the storage API is a high-priority item.

If you're interested in helping out, let me know!

10thfloor commented 3 years ago

Bump!

dryruner commented 3 years ago

Is this issue resolved? -> https://docs.onflow.org/faq/developers/#how-can-i-see-what-fungible-tokens-an-account-has

turbolent commented 3 years ago

@testrain No, this issue is still open. The documentation you linked requires knowledge about the fact that the account has certain values stored. The idea for this issue is to allow developers to query what storage paths store values, and what they are, without having to prior knowledge what is stored (i.e. what storage paths are used).

We're currently working on a new dictionary data structure and will eventually use is for account storage, effectively giving us the functionality to list storage paths which store values for free.

atapin commented 2 years ago

@turbolent any updates for this issue?

D10100111001 commented 2 years ago

This is the one thing that is highly painful and annoying when working with Flow. I think this should be prioritized. Any other blockchain allows any dev to generically get all NFTs for a given account. In the case of flow, because an NFT can be stored into any storage path, it is impossible to generically show all NFTs owned by the an account. Another problem that will need to be solved after the generic storage querying is figured out is generically loading metadata for the NFTs. The INFT interface should have the signature for a the getMetadata function so developers can generically pull it for any NFT in a collection.

dete commented 2 years ago

This is the one thing that is highly painful and annoying when working with Flow.

Agreed!

Any other blockchain allows any dev to generically get all NFTs for a given account.

Actually, I would disagree with this. Looking at Ethereum, it's easy to find out the list of NFTs of a specific type owned by an account, but that's also easy on Flow (by assuming the Collection is in the standard path). But on Ethereum, you have to know the contract address of all possible NFTs to know the list of all NFTs owned by a single account. That's barely tractable on Ethereum today, but not especially scalable for the future. Once Cadence has storage iteration, this will be something that is a MASSIVE improvement on Flow than on Ethereum.

This feature is taking a long time because it depends on a pretty significant rewrite of the underlying storage layer of Cadence. That rewrite is rolling out soon (code is complete and tested!), and so we hope to have storage iteration available to scripts and contracts very soon. Thanks to y'all for your patience!

cybercent commented 2 years ago

@dete thank you for the update! Thank you to the team for the continued effort you guys put in. Highly appreciated! πŸ––

turbolent commented 2 years ago

Once #1248 is merged we finally have the infrastructure in place to iterate over account storage. The next step would be to come up with an API.

Some challenges that need to be addressed:

The first point makes defining a "storage entry" type for both resource-kinded and value-kinded (e.g. structs) objects difficult. For resource-kinded values, the entry should probably provide an authorized reference (auth &AnyResource). For value-kinded objects it would be more ergonomic to provide the value itself, instead of a reference to it (AnyStruct).

The second point makes adding a field let storageEntries: [StorageEntry] inefficient: If there are many objects in storage, then a large array would have to be temporarily allocated. A more efficient alternative could be an iterator.

Also, it might make sense to provide the run-time type of the object in the storage entry (e.g. a field let type: Type)

turbolent commented 2 years ago

Maybe it is enough if the storage entries only contain the path and the type, and do not contain a value – we already have the storage API which can be used to load/copy/borrow the value for a given path.

struct StorageEntry {
  let path: StoragePath
  let type: Type
}

let account = getAccount(0x1)
account.forEachStored(fun (entry: StorageEntry): Bool {
  if entry.type == Type<FlowToken.Vault>() {
    log(entry.path)
    // stop iteration
    return false    
  }

  // continue iteration
  return true
})

Or without introducing a new type, which is not very useful on its own and making the information parameters of the iteration function:

let account = getAccount(0x1)
account.forEachStored(fun (path: StoragePath, type: Type): Bool {
  if type == Type<FlowToken.Vault>() {
    log(path)
    // stop iteration
    return false    
  }

  // continue iteration
  return true
})
turbolent commented 2 years ago

In addition to iteration over the account storage, we can also add iteration over public and private links/capabilities, and the question is also, what should the API look like

bluesign commented 2 years ago

I think iteration is nice for sure, but also additionally some kind of query functionality can be useful. ( to limit search area and return array of struct )

If we will use the struct below.

struct StorageEntry {
  let path: StoragePath
  let type: Type
}

I think we can also use query based ( returning [StorageEntry] ) functions.

Something like account.storageManager can be useful. ( later even maybe this can be used to move storage setup for DApps into contracts, without passing AuthAccount but passing storageManager )

For example:

account.storageManager.query(_ type: Type) [StorageEntry]
account.storageManager.queryFirst(_ type: Type) StorageEntry?

or if iterator:

account.storageManager.query(_ type: Type) StorageEntryIterator

for storageEntry in account.storageManager.query(Type<Topshot.Collection>()){
}

Also I think we can extend StorageEntry with load and borrow like below, so we can slowly get rid of Paths little bit.

struct StorageEntry {
  let path: StoragePath
  let type: Type
  fun borrow()
  fun load()
}

If we cover also links and capabilities with StorageEntry, can be useful I guess.

Other then query here also:

I am a bit torn between if needed to add some helper booleans like canBorrow canLoad etc. ( like capability check method )

But by moving a lot of functionality into StorageEntry can make it really powerful.

Also something like queryAndBorrowFirst can be useful.

Some typical Transaction can look like this:

let provider = acct.strorageManager.queryAndBorrowFirst(Type<DapperUtilityCoin.Vault{FungibleToken.Provider}>()) 
    ?? panic("Could not borrow a reference to the buyers FlowToken Vault")
self.purchaseTokens <- provider.withdraw(amount: UFix64(6)) as! @DapperUtilityCoin.Vault
bjartek commented 2 years ago

I think it would be very nice to now only search in storage but also in public paths. Storage are right now not available from scripts, so the usefullness of that is limited IMHO.

Questions I want to have answered is:

I really like @bluesign api from above. But for me allowing it to also search in what you can get a capability to this resource as would be awesome.

It would also be useful if we could combine this with the Metadata standard and be able to say. Give me all resources that have this View. Or even "give me all items in storage" that have this view, but that is a little bit heavy maybe?

bluesign commented 2 years ago

I think best option is to expose storageManager ( or whatever we call it ) to both PublicAccount and AuthAccount. ( Immutability can work here very good )

It would also be useful if we could combine this with the Metadata standard and be able to say. Give me all resources that have this View. Or even "give me all items in storage" that have this view, but that is a little bit heavy maybe?

This would be very nice +1

Actually we can maybe introduce some Filter function to iterator/array etc, which will get a Cadence Any, then you will return True/False for some condition etc.

bjartek commented 2 years ago

Are there any concrete plans for when this will be done? It is very important imho.

bjartek commented 2 years ago

Are there any plans for this to span addresses? One other common Β«issueΒ» on flow is get a listing of all nfts of a type across addresses.

Or even get a list of all account with the given thing stored/linked.

turbolent commented 2 years ago

Thank you for the great ideas!

I agree, a type-based storage querying API would be great to have. Unfortunately it is currently not possible, as the FVM / state trie does not provide an index of type to storage locations.

As for iteration to other domains, like public and private, in addition to storage: That would be possible and sounds like a good idea πŸ‘

bluesign commented 2 years ago

@turbolent wouldn't it possible to handle this on cadence side ? Eventually it will be just a dictionary, {String : [Path]} where key is a type identifier. I don't think we need ability to deep query there.

turbolent commented 2 years ago

@bluesign Yes, if we wanted to limit it to just root/top-level values, we could maintain the index from the Cadence side. I wonder though how useful it would be, as some developers may assume it returns all values of the type.

We had also discussed exposing the value traversal functionality that already exists in the implementation to Cadence, so one could use the querying function and the traversal function to find all values of a type (though it would be very expensive, as all intermediate data would need to get loaded)

bluesign commented 2 years ago

@turbolent this is very good question actually.

In the context of containers, multi level storage, I see those options:

For composite resources, I think it is not not only feasible, but also not possible to query their members reliably.

I think even NFTCollection should not be queried for ownedNFTs. (Even id is key there we would need to use uuid which will not provide info)

For array and dictionaries: I think here is also little tricky, It would need some addressing like mentioned before, but it would be expensive.

On the other hand, I don't see much usage of dictionary/array in storage, they are rarely used in top level storage.

For me it seems like acceptable trade off to query top level by type, and possibly get rid of Paths a bit.

But more importantly, if we make Query storage by Type for collections and vaults, it would be super easy for wallets to say something like : 'this transaction will access your FlowTokenVault' etc so basic human readable transactions can be a thing in the future without simulating the transaction.

turbolent commented 2 years ago

I totally agree that it would be nice to add a type-based querying API eventually, but I would also like to see an iteration querying API before that, as it could be implemented now.

What do people think about the proposal above (https://github.com/onflow/cadence/issues/208#issuecomment-983086381)? Or maybe someone has another good idea for an iteration API?

bjartek commented 2 years ago

I think that proposal is a good place to start. Then you can build higher level abstractions on top of it later.

sideninja commented 2 years ago

I totally agree that it would be nice to add a type-based querying API eventually, but I would also like to see an iteration querying API before that, as it could be implemented now.

What do people think about the proposal above (#208 (comment))? Or maybe someone has another good idea for an iteration API?

I like the second API example.

turbolent commented 2 years ago

Discussed iteration / pagingation with @fxamacker and @ramtinms.

It won't be possible to create a "cursor" that will allow starting iteration over an atree data structure (e.g. ordered map, as used for account storage) in one script or transaction, then continue iteration in a subsequent script transaction against the state the cursor was created for.

If we loosen the requirement to effectively query previous state and always query the current state, the next problem are mutations to the container: If the container is mutated, e.g. a new key is inserted into an ordered map, then the iteration order changes, invalidating the cursor and leading to the iterator e.g. not reporting certain elements or reporting them multiple times.

If we loosen the requirement further and assume that the iterated container is not mutated, then @fxamacker thinks it might be possible to provide a cursor that can be serialized, so e.g. it can be returned from a script, then provided to a subsequent script, or it could be stored on-chain.

To help with the limitation of mutations invalidating the cursor, we could ensure the iterated data structure is not mutated, for example by:

bluesign commented 2 years ago

I think solution here is detecting mutation ( not avoiding ). Technically nothing can iterate on mutating object. So mutation should just invalidate the cursor. I don't know much about atree structure, but probably ordered map can use some state identifier + key as cursor.

But if you limit discussion to path keys only. As we have ordered map, seek with prefix can work too. with seekWithPrefix and next we get rid of cursor totally, mutation responsibility falls on user of the api. But I still think state identifier is very nice idea in my opinion, I think it can be useful in other areas too. [0]

[0] https://github.com/onflow/cadence/issues/1777

turbolent commented 2 years ago

This should probably implemented in three stages of functionality:

  1. Add support for retrieving all paths of an account, e.g. a computed field Auth/PublicAccount.paths: [Path]. This would not support pagination, but would be useful as an MVP.
  2. Add support for pagination, without handling mutation. This could be a function-based API, like above, or reified as an object as an iterator/enumerator/cursors.
  3. Add support for handling mutation during mutation, e.g. by locking or detecting mutation as described above

As for the implementation:

turbolent commented 2 years ago

@janezpodhostnik Do you think we will need to add dedicated metering for this?

SupunS commented 2 years ago

After seeing the implementation I have a suggestion for a slightly modified version of the proposed API:

Cross-posting from https://github.com/onflow/cadence/pull/1845#discussion_r933515481:

Would it be better to group these 'paths' related operations (this one and the iterator functions) together? I am thinking of something similar to keys field (of AuthAccount).

e.g:

let paths: AuthAccount.Paths

and AuthAccount.Paths to be something like:

struct Paths {
   pub allPaths: [Path]

   pub fun forEachPublic(_ function: ((PublicPath, Type): Bool))

   pub fun forEachPrivate(_ function: ((PrivatePath, Type): Bool))

   pub fun forEachStored(_ function: ((StoragePath, Type): Bool))
}

The advantage is:

  • It groups similar operations together
  • The usage is more verbose (specially for the iterator functions):
    account.Paths.forEachPublic(...)
bluesign commented 2 years ago

@SupunS good idea, maybe we can do something like:

With built in enum:

enum StorageDomain{
   ANY
   PUBLIC
   PRIVATE
   STORAGE
}

and

in AuthAccount:

   pub fun getPaths(_ StorageDomain): [Path]
   pub fun forEachPath(_ StorageDomain, _ function: ((Path, Type): Bool))

Do we have any benefit from getting PublicPath, StoragePath instead of Path ?

turbolent commented 2 years ago

1845 and #1851 have landed and will likely be deployed in the following spork

dsainati1 commented 2 years ago

The plan is to have iteration abort execution after storage is modified (either by saving or removing from it). However, we want people to be able to search storage for something and then perform one modification; e.g. searching for a specific capability in storage and then unlinking it. So instead of panicking immediately on the mutation, we will instead panic on the next invocation of the callback after the mutation is performed. This way if users return false from the callback, stopping iteration, after they do a write, then this use case is still supported.

cybercent commented 2 years ago

πŸ™πŸ» I appreciate all your efforts in bringing this feature to life.

bluesign commented 2 years ago

@dsainati1 is it possible to make this panic free by making iterator return Bool instead of Void ? Something like iteration completed or not response.

dsainati1 commented 2 years ago

I would worry about people not properly checking the return value of the iterator and thinking that they properly iterated over everything in storage when they actually skipped a large number of elements. Continuing to iterate after storage is modified is significantly bad enough that I feel like a panic is appropriate.

bluesign commented 2 years ago

I feel like this is very confusing API as is:

turbolent commented 1 year ago

Thank you everyone for contributing your feedback. The next release is going to ship the first feature related to storage querying: storage iteration.

This is just the first step and there have been discussions around other kinds of querying APIs above (e.g. by type). Please feel free to open a new feature request for those specifically.