iotaledger / iri

IOTA Reference Implementation

Improve Tip Selection Algorithm #828

Open jakubcech opened 6 years ago

jakubcech commented 6 years ago

Description

Briefly describe the feature you want.

Motivation

Explain why this feature is needed.

Requirements

Create a list of what you want this feature request to fulfill.

Open Questions (optional)

Anything you want to discuss.

Am I planning to do it myself with a PR?

Yes/No.

Johnny-Milkshakes commented 6 years ago

Limit minimum depth.

Description We might want to enforce a minimum of 2 or 3 for the depth parameter passed to getTransactionsToApprove.

Motivation At least at the current stage of the Tangle's life cycle, depth 1 doesn't seem to help validate enough transactions and might even hurt: spammers sending 0i transactions at depth 1 rarely pick larger value bundles, since those take longer to PoW. Maybe if there were more time between milestones this would have a lesser effect. I also noticed this (picture below) on the spamnet when it first went live, which may or may not be related; something similar, but to a lesser degree, happens on mainnet.

[image: tangleparts]

Requirements If a user calls gTTA with a depth lower than the minimum threshold, return an error along the lines of: "Depth too low. Minimum depth is ...". Or just silently default to the minimum allowed depth.
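A minimal sketch of what such a guard could look like, assuming a hypothetical MIN_DEPTH constant and helper class (this is not IRI's actual API, just an illustration of the two options above):

```java
/**
 * Minimal sketch, not IRI's actual code: validate the depth parameter before
 * running tip selection. The class name and MIN_DEPTH are assumptions.
 */
public final class DepthGuard {

    // Assumed minimum; the comment above suggests 2 or 3.
    private static final int MIN_DEPTH = 3;

    /** Option A: reject requests below the threshold with a descriptive error. */
    public static int checkDepth(int requestedDepth) {
        if (requestedDepth < MIN_DEPTH) {
            throw new IllegalArgumentException(
                    "Depth too low. Minimum depth is " + MIN_DEPTH);
        }
        return requestedDepth;
    }

    /** Option B: silently clamp to the minimum allowed depth instead of failing. */
    public static int clampDepth(int requestedDepth) {
        return Math.max(requestedDepth, MIN_DEPTH);
    }
}
```

The first variant matches the suggested error message; the second is the silent-default alternative.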

Open Questions Any info on that pic would be appreciated, just because I find it interesting.

Am I planning to do it myself with a PR If I knew Java I would, but I don't 😢... yet.

mehranshakeri commented 5 years ago

Description Alternative or improvement for the tip selection algorithm. This proposal adds the concept of divergence among tips to the tip selection algorithm, to keep the Tangle in a lean shape where the probability that two selected tips share the same verified transactions is maximised. The desired result is a shorter verification time in the network.

Motivation One of the parameters affecting verification time is the selection of the two tips that a new transaction verifies. The following figure is usually used to explain how the Tangle works, and all other concepts, from tip selection to verification criteria, are built on top of it. Theoretically everything makes sense.

[figure: perfect expected Tangle]

In practice, however, the Tangle looks more like what you see at http://tangle.glumb.de/.

This is partly the result of the Markov Chain Monte Carlo (MCMC) algorithm not considering how the Tangle might grow if we don't take care to keep it lean. Verification time also increases as the number of scattered tips rises. The ideal Tangle with the shortest verification time would have a single tip, so the Tangle would look like a chain rather than a graph; but our tip selection algorithm could at least help us move towards a Tangle like the one in the first picture. The problem is that MCMC is missing the concept of diverging tips and therefore cannot prevent it.

What do we mean by divergence? Let's call a tip a child, the two transactions verified by that tip its parents, and parents of parents its ancestors. All tips share the same distant ancestor anyway, the genesis transaction. Walking from a tip back to the genesis gives us the family tree of that tip. Let's also call ancestors closer to the tips "close ancestors" and ancestors closer to the genesis "far ancestors". The more close ancestors two chosen tips share, the more converged they are; the fewer common close ancestors they share, the more diverged they are.

[image: divergence]

As you can see in the picture above, tips 1 and 2 have more close ancestors in common, so we call them converged tips, but each of them is diverged from tip 3. If our tip selection were a bit smarter at detecting divergence, we could optimise it to close these gaps, leading to a slimmer Tangle and shorter verification time.
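To make the divergence idea concrete, here is a minimal, self-contained Java sketch, assuming each transaction knows its trunk and branch parents; the Tx class and all names are illustrative only, not IRI code. Divergence is computed as the size of the symmetric difference of the two depth-limited ancestor sets:

```java
import java.util.*;

/** Minimal sketch of the divergence metric; not IRI code. */
final class Divergence {

    /** Stand-in for a transaction that references its two parents. */
    static final class Tx {
        final String hash;
        final Tx trunk, branch; // null for the genesis transaction
        Tx(String hash, Tx trunk, Tx branch) {
            this.hash = hash;
            this.trunk = trunk;
            this.branch = branch;
        }
    }

    /** Collect all ancestors of a tip up to the given depth (breadth-first walk). */
    static Set<String> ancestors(Tx tip, int depth) {
        Set<String> seen = new HashSet<>();
        List<Tx> frontier = Collections.singletonList(tip);
        for (int level = 0; level < depth; level++) {
            List<Tx> next = new ArrayList<>();
            for (Tx tx : frontier) {
                for (Tx parent : new Tx[]{tx.trunk, tx.branch}) {
                    if (parent != null && seen.add(parent.hash)) {
                        next.add(parent);
                    }
                }
            }
            frontier = next;
        }
        return seen;
    }

    /** Divergence = number of ancestors (within depth d) that the two tips do NOT share. */
    static int divergence(Tx a, Tx b, int depth) {
        Set<String> ancestorsA = ancestors(a, depth);
        Set<String> ancestorsB = ancestors(b, depth);
        Set<String> union = new HashSet<>(ancestorsA);
        union.addAll(ancestorsB);
        Set<String> common = new HashSet<>(ancestorsA);
        common.retainAll(ancestorsB);
        return union.size() - common.size(); // symmetric difference
    }
}
```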

A good example is perhaps the special transaction known as the milestone, created by the Coordinator: a single transaction verifying all previous ones or, in other words, converging most of the tips at once. It also fits the expectations of the MCMC "weighted random walk".

Proposal This proposal tries to improve tip selection while sticking to the golden rule of IOTA: picking only two other tips to verify. The bias in tip selection will aim for Tangle convergence.

First we have to detect divergence between tips, and to do that we have to define it more precisely. For any set of tips, the divergence between two selected tips can be defined as the number of ancestors they do not share. Of course this calculation could go all the way back to the genesis transaction, but that is not practical. Let's define d as the depth we are willing to walk back into the Tangle to count differing ancestors: d = 1 considers only the direct parents, d = 2 also includes parents of parents, and d = 4 yields at most 16 ancestor transactions per tip to compare. This is only an upper bound, since some of the ancestors may share parents or ancestors themselves. Two tips with a higher number of differing ancestors are considered diverged and are more likely to hurt verification time, so it is better to select them in order to close the gap.

About lazy tips, which tend to point to old transactions: first of all, it is their problem to catch up with the network, although a very low selection probability can still be kept for them. How do we detect lazy tips? With a proper value of d, which can be calibrated, for example, by simulations on the mainnet. With a small enough d there is no common ancestor at all between lazy tips and the majority of the tips (assuming the majority behaves normally).

Another limitation is the potentially high number of tips and the cost of running the calculation for all of them. The calculation can be limited to a subset of n tips, and n can even vary between nodes based on processing power. These n tips can be selected randomly at the start; the two chosen tips are removed from the set anyway. For the next round, tips with a higher ancestor difference (diverged tips) can be kept in the set, and new tips from outside the set can be selected randomly, giving everyone an equal chance while keeping the focus on diverged tips. If memory is not an issue, the initial calculation can even be done for as many tips as possible (a very large n), updating the data structure as tips arrive and leave.
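A rough sketch of this bounded-subset selection step, reusing the Divergence helper from the sketch above and treating n and d as tunable parameters (again an illustration of the idea, not a worked-out implementation):

```java
import java.util.*;

/**
 * Rough sketch, not the proposed final algorithm: keep a random subset of n
 * tips, score every pair by divergence, and approve the most diverged pair.
 */
final class DivergenceBiasedSelection {

    static List<Divergence.Tx> pickMostDivergedPair(List<Divergence.Tx> allTips, int n, int depth) {
        // Limit the work to a random subset of n tips.
        List<Divergence.Tx> subset = new ArrayList<>(allTips);
        Collections.shuffle(subset);
        subset = subset.subList(0, Math.min(n, subset.size()));

        Divergence.Tx bestA = null, bestB = null;
        int bestScore = -1;
        for (int i = 0; i < subset.size(); i++) {
            for (int j = i + 1; j < subset.size(); j++) {
                int score = Divergence.divergence(subset.get(i), subset.get(j), depth);
                if (score > bestScore) {
                    bestScore = score;
                    bestA = subset.get(i);
                    bestB = subset.get(j);
                }
            }
        }
        return bestA == null ? Collections.<Divergence.Tx>emptyList() : Arrays.asList(bestA, bestB);
    }
}
```

The pairwise scoring is O(n²) divergence computations, which is why n (and, per the paragraph above, the retention of the most diverged tips between rounds) would need to be tuned to the node's processing power.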

Am I planning to do it myself with a PR? Maybe on weekends