TokenList data structure to reduce Universe size by 75%

Qsilver97 commented 5 months ago

The Universe file has low entropy due to its design. Splitting the Universe into two parts, TokenList and Universe2, would increase the capacity of the Universe file 4x.

This is because each entry of the Universe file is 48 bytes, 40 of which are the (name + pubkey), but this can be encoded into 32 bits as follows:

For SC shares, just use the contract index directly. For user-created tokens, set bit 31 and assign a TokenIndex in order of creation tick, using alphabetical order if more than one token is created in the same tick.

This reduces the maximum number of SCs from 4 billion to 2 billion, which seems acceptable.
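A minimal sketch of the 32-bit encoding described above, in C++. The names USER_TOKEN_FLAG, encodeContractShare, encodeUserToken, isUserToken and indexOf are made up for illustration and are not existing core identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Bit 31 marks a user-created token; SC shares keep it clear.
// (Hypothetical constant name, not taken from the core code.)
constexpr std::uint32_t USER_TOKEN_FLAG = 1u << 31;

// SC shares: the contract index is used directly, so it must fit in 31 bits.
inline std::uint32_t encodeContractShare(std::uint32_t contractIndex) {
    assert((contractIndex & USER_TOKEN_FLAG) == 0);
    return contractIndex;
}

// User-created tokens: set bit 31 and keep the TokenIndex in the low 31 bits.
inline std::uint32_t encodeUserToken(std::uint32_t tokenIndex) {
    assert((tokenIndex & USER_TOKEN_FLAG) == 0);
    return USER_TOKEN_FLAG | tokenIndex;
}

// True if the encoded value refers to a user-created token.
inline bool isUserToken(std::uint32_t encoded) {
    return (encoded & USER_TOKEN_FLAG) != 0;
}

// Strips the flag bit, leaving the contract index or TokenIndex.
inline std::uint32_t indexOf(std::uint32_t encoded) {
    return encoded & ~USER_TOKEN_FLAG;
}
```

Since one bit is spent on the flag, each of the two index spaces is limited to 2^31 values, which is where the 2 billion figure comes from.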

With this change each Universe2 entry would be 12 bytes instead of 48 bytes: the 40 bytes of (name + pubkey) are replaced by a 4-byte token index. Currently there are about 10 total tokens (SC shares + user defined), so at 40 bytes each the total size of the TokenList would be less than 512 bytes! A very small cost to expand the Universe capacity 4x.
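The size arithmetic can be checked with a small C++ sketch. The struct layouts below are assumptions for illustration only: the 40 bytes are split as a 32-byte pubkey plus an 8-byte name field, and the remaining 8 bytes are treated as opaque. They are not the actual Universe record layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical current entry: 40 bytes of (name + pubkey) + 8 bytes of other data.
struct UniverseEntry {
    std::uint8_t issuerPublicKey[32];
    char name[8];
    std::uint8_t otherFields[8];
};

// Hypothetical Universe2 entry: a 4-byte token index replaces (name + pubkey).
struct Universe2Entry {
    std::uint32_t tokenIndex;
    std::uint8_t otherFields[8];
};

// Hypothetical TokenList entry: the (name + pubkey) pair, stored once per token.
struct TokenListEntry {
    std::uint8_t issuerPublicKey[32];
    char name[8];
};

static_assert(sizeof(UniverseEntry) == 48, "current entry is 48 bytes");
static_assert(sizeof(Universe2Entry) == 12, "proposed entry is 12 bytes");
static_assert(sizeof(TokenListEntry) == 40, "one TokenList entry is 40 bytes");

// Total TokenList size for a given number of tokens.
constexpr std::size_t tokenListBytes(std::size_t tokenCount) {
    return tokenCount * sizeof(TokenListEntry);
}

// Number of whole entries that fit in a file of the given size.
constexpr std::size_t entriesThatFit(std::size_t fileBytes, std::size_t entryBytes) {
    return fileBytes / entryBytes;
}
```

With about 10 tokens, tokenListBytes(10) is 400 bytes, and for any fixed file size the 12-byte entries fit 48 / 12 = 4 times as many records as the 48-byte ones.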

It also solves the issue of quickly finding all existing tokens: a network request can simply return the TokenList. As the number of tokens increases, we might want to add "pagination" to the response to keep the maximum size of the results reasonable.
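One possible shape for a paginated response, as a C++ sketch. TokenListPage, getTokenListPage and the firstIndex / maxEntries parameters are all made up for illustration; no such network message exists in the source discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical TokenList entry: (pubkey + name), 40 bytes.
struct TokenListEntry {
    std::uint8_t issuerPublicKey[32];
    char name[8];
};

// Hypothetical page of results returned for one request.
struct TokenListPage {
    std::uint32_t totalTokens;            // lets the caller know when to stop
    std::uint32_t firstIndex;             // index of the first returned entry
    std::vector<TokenListEntry> entries;  // at most maxEntries entries
};

// Returns up to maxEntries entries starting at firstIndex.
// An out-of-range firstIndex yields an empty page rather than an error.
inline TokenListPage getTokenListPage(const std::vector<TokenListEntry>& tokenList,
                                      std::uint32_t firstIndex,
                                      std::uint32_t maxEntries) {
    assert(maxEntries > 0);
    TokenListPage page;
    page.totalTokens = static_cast<std::uint32_t>(tokenList.size());
    page.firstIndex = firstIndex;
    if (firstIndex < tokenList.size()) {
        const std::size_t count =
            std::min<std::size_t>(maxEntries, tokenList.size() - firstIndex);
        page.entries.assign(tokenList.begin() + firstIndex,
                            tokenList.begin() + firstIndex + count);
    }
    return page;
}
```

A client would request pages with increasing firstIndex until it has received totalTokens entries. Because token indices are assigned in creation order and never change, pages already fetched stay valid when new tokens are appended.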

As it is, our AIRDROP SC is on hold because after 42 airdrops the Universe file is totally full. By my analysis the Universe data is the lowest entropy (by far) in Qubic, and thus it is the weakest link.

NewToken(name,pubkey) -> returns the token index in order of creation. It would be called during tick processing when a new user-defined asset being issued is detected.

Convenience function: TokenIndex(name,pubkey) would return the 32-bit index and could handle both SC and user tokens.

GetTokenList would be a network command that returns the full TokenList

Universe2 would use the token index in place of (name + pubkey)
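The functions above could fit together roughly as in the C++ sketch below. It is only an illustration under assumptions: the TokenList class, addContractShare, newTokensInTick and the bool-plus-out-parameter lookup are invented here, names are compared byte-wise as "alphabetical" order, and in-memory maps stand in for whatever storage the core would actually use.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint32_t USER_TOKEN_FLAG = 1u << 31;  // bit 31 = user token

using PublicKey = std::array<std::uint8_t, 32>;
// (name, issuer pubkey); std::pair orders by name first, then pubkey.
using TokenKey = std::pair<std::string, PublicKey>;

class TokenList {
public:
    // Hypothetical helper: record an SC share under its contract index.
    void addContractShare(const TokenKey& key, std::uint32_t contractIndex) {
        assert((contractIndex & USER_TOKEN_FLAG) == 0);
        contractIndexByKey_[key] = contractIndex;
    }

    // NewToken: append a user token and return its encoded 32-bit index.
    // Re-registering an existing token returns the index it already has.
    std::uint32_t newToken(const TokenKey& key) {
        auto it = userIndexByKey_.find(key);
        if (it != userIndexByKey_.end()) return USER_TOKEN_FLAG | it->second;
        assert(userTokens_.size() < USER_TOKEN_FLAG);
        const std::uint32_t index = static_cast<std::uint32_t>(userTokens_.size());
        userTokens_.push_back(key);
        userIndexByKey_[key] = index;
        return USER_TOKEN_FLAG | index;
    }

    // Registers every token issued in one tick, sorted so that all nodes
    // assign the same indices regardless of the order they saw the issuances.
    void newTokensInTick(std::vector<TokenKey> issued) {
        std::sort(issued.begin(), issued.end());
        for (const TokenKey& key : issued) newToken(key);
    }

    // TokenIndex: looks up SC shares first, then user tokens.
    // Returns false if the (name, pubkey) pair is unknown.
    bool tokenIndex(const TokenKey& key, std::uint32_t& encoded) const {
        auto sc = contractIndexByKey_.find(key);
        if (sc != contractIndexByKey_.end()) { encoded = sc->second; return true; }
        auto user = userIndexByKey_.find(key);
        if (user == userIndexByKey_.end()) return false;
        encoded = USER_TOKEN_FLAG | user->second;
        return true;
    }

    // User tokens in index order; a full GetTokenList would also list SC shares.
    const std::vector<TokenKey>& userTokens() const { return userTokens_; }

private:
    std::map<TokenKey, std::uint32_t> contractIndexByKey_;
    std::map<TokenKey, std::uint32_t> userIndexByKey_;
    std::vector<TokenKey> userTokens_;
};
```

Sorting the issuances of a tick before assigning indices is what makes the TokenIndex deterministic: two tokens created in the same tick get the same indices on every node.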

The above is just one example of how it could be implemented efficiently. An alternative would be to store all the asset creation Universe entries directly in the TokenList, with an added TokenIndex field. That would probably minimize the code changes needed.

philippwerner commented 5 months ago

This is what CFB said in the core dev chat: "It doesn't make much sense because entropy will be high. Bitcoin, Ethereum and others demonstrated that popular ledgers are not compressible much because addresses are high entropy and don't repeat often enough." We all agree that GetTokenList is needed, so https://github.com/qubic/core/issues/109 should be solved, but CFB is not convinced that splitting the universe is a good way to implement it.

Qsilver97 commented 5 months ago

Qubic is different from BTC and ETH. While most of the Qubic data is high entropy, the Universe file is provably low entropy, so it is a counterexample to the typical crypto ledger entropy argument.

The fact that I can compress it 75% with 512 bytes of data shows it is VERY low entropy.

If you can get it so I cannot compress the Universe by 75% via other means, that would show it has been fixed. But as long as the name and pubkey of the issuer are repeated over and over, it will remain low entropy.

qubic / core

TokenList data structure to reduce Universe size by 75% #118