onflow / nft-catalog

https://www.flow-nft-catalog.com/
The Unlicense
35 stars 11 forks source link

Remove getCatalog because it is horrible for performance #135

Open janezpodhostnik opened 1 year ago

janezpodhostnik commented 1 year ago

every-time getCatalog is called this fetches the entire catalog from storage and copies it. This is terrible, especially since the catalog is growing.

If a specific element is needed just use the existing getCatalogEntry or the newly added mustGetCatalogEntry which just has the pre-check that was duplicated everywhere in one place.

Iteration over the entire catalog should only be used if really needed. In that case I added forEachCatalogKey method.

Be aware that there are examples and scripts that recommend just iterating one catalog.keys.slice (or equivalent). This is not a good recommendation! catalog.keys are not deterministically sorted! using this for paging will lead to bugs. A different mechanism should be put in place for paging.

Another performance change that I added to this is the switch from getNFTViewsFromCap to getNFTViewsFromIDs in cadence/scripts/get_nft_and_views_in_account.cdc. Because only views for a specific ID are needed, so no need getting all of them.

missing from the draft:

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
nft-catalog ✅ Ready (Inspect) Visit Preview Mar 10, 2023 at 0:54AM (UTC)
aishairzay commented 1 year ago

Please ignore the prior approval, that was an accidental click, and I can't figure out how to undo it :P

janezpodhostnik commented 1 year ago

Is there a way we can not remove getCatalog and still gain these performance benefits? Many scripts and other services are using this.

No, not really. getCatalog is inherently problematic. Anyone using it will run into problems sooner or later as the catalog grows, as just calling getCatalog() will run out of gas. Anyone using this should stop, and think of a different solution.

getCatalogEntry seems to be enough for many use cases already, and that should be ok no matter how much the catalog grows. On the other hand forEachCatalogKey is almost as bad as getCatalog() but at least it has an explicit name, so that it is clearer the iteration is going to happen over the entire catalog.

If iteration over the catalog is really needed for certain use cases, paging should be added, so that iteration can be done in batches. However that is easier said then done, because iterating over a dictionary with forEachKey or just by for key in dict.keys doesnt have a deterministic order, so you cant really page it. For paging to be possible, you you would also need to store an up to date array of the catalog entries next to the catalog dictionary. Having that, you can then update the forEachCatalogKey method to iterate deterministically, and thus can be paged.

bluesign commented 1 year ago

You can page catalog with slice actually. Adding item to catalog is a rare event. Order is deterministic until dictionary is modified.

we can use a lastmodified variable on catalog. Maybe pass it as a parameter etc. poor man’s cursor :)

janezpodhostnik commented 1 year ago

@bluesign is that how it works? Is that documented somewhere i couldn't find anything. maybe that is how it works currently and there were no guarantees given for that...

It would be great if there were some dictionary hash or something that you could pass while paging and if it changed, you would know the dictionary keys have changed.

bluesign commented 1 year ago

@janezpodhostnik yeah it is implementation detail ( atree ) , but considering deterministic output, I don't think the order can change on cadence side per execution ( unless storage changes / dictionary modified )

I suggested before object hash also, but it didn't go too far. ( it would be very nice to have a hash property for resources and structs ) It would be very nice usage to check what changed in the account with storage browsing also.

Unfortunately we will end up not usable state if we don't implement those primitives. ( paging, lazy arrays/dics etc) People will hammer ANs, ANs will put harsher limits to execution, nothing useful will be done via scripts. We already had this with events blocks range.