iotaledger / iri

IOTA Reference Implementation

IRI and network behavior under high TPS #1031

Open ghost opened 5 years ago

ghost commented 5 years ago

Bug description

Tested with 'good' spam over ~50 public nodes:

When TPS rises above approximately 5, IRI nodes (especially slower ones) start to fall out of sync.

Actual behaviour

The Tangle and IRI nodes behave worse under high TPS. The higher the TPS, the worse the confirmation rate becomes and the longer the confirmation time grows. IRI nodes fall out of sync, and the tip pool fills to 10,000 under "too high" TPS.

Expected behaviour

Nodes stay in sync, and confirmation rate and confirmation time do not degrade noticeably as TPS increases.

Errors

"Skipping negative value for address ..." errors for slower nodes. Corrupt DB.

gjeee commented 5 years ago

The tip pool has a fixed capacity of 10K, so what you describe is impossible. Which version of IRI do you use? Can you describe your test approach in more detail? Do you have any numerical test results?
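For readers unfamiliar with the term, "fixed capacity" means the pool evicts entries once it is full rather than growing without bound. A toy sketch of that idea (the class name and FIFO eviction policy here are my assumptions for illustration, not IRI's actual TipsViewModel internals):

```java
import java.util.LinkedHashSet;

// Toy bounded tip pool: once the cap is reached, the oldest tip is evicted,
// so the pool can fill up but never exceed its limit. This is only an
// illustration of "fixed capacity", not IRI's real implementation.
class BoundedTipPool {
    private static final int MAX_TIPS = 10_000;
    private final LinkedHashSet<String> tips = new LinkedHashSet<>();

    synchronized void addTip(String txHash) {
        if (tips.size() >= MAX_TIPS) {
            String oldest = tips.iterator().next(); // LinkedHashSet keeps insertion order
            tips.remove(oldest);
        }
        tips.add(txHash);
    }

    synchronized void removeTip(String txHash) {
        tips.remove(txHash); // a tip leaves the pool once another transaction approves it
    }

    synchronized int size() {
        return tips.size();
    }
}
```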

ghost commented 5 years ago

The public full nodes used were all version 1.5.4. Yes, I mean the tip pool fills to its maximum of 10k tips, which should not be a problem; just an observation.

Test it yourself: spam the network a little, to around 6 TPS and then 10 TPS (preferably using different nodes all over the world), and you'll see confirmation rates and times worsen. Not only for the spam txs, but for all txs on mainnet (including value transactions).
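For concreteness, a minimal sketch of such a spam loop. The node URLs and the sendZeroValueTx helper are placeholders of mine, not a real client API; "good" spam here means each transaction does proper tip selection instead of approving its own past transactions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class TangleSpammer {
    // Placeholder node URLs; a real test would spread load over many public nodes.
    private static final List<String> NODES = Arrays.asList(
            "https://node-a.example.org:14265",
            "https://node-b.example.org:14265");

    public static void main(String[] args) throws InterruptedException {
        double targetTps = 6.0;                 // raise toward 10 to reproduce the report
        long intervalMs = (long) (1000.0 / targetTps);
        while (true) {
            String node = NODES.get(ThreadLocalRandom.current().nextInt(NODES.size()));
            sendZeroValueTx(node);              // hypothetical helper, see stub below
            Thread.sleep(intervalMs);           // crude pacing toward the target TPS
        }
    }

    // Hypothetical stub: a real implementation would perform tip selection,
    // proof-of-work and broadcast through the node's HTTP API
    // (getTransactionsToApprove / attachToTangle / broadcastTransactions).
    private static void sendZeroValueTx(String nodeUrl) {
    }
}
```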

I do not have numerical test results; this just describes my observations. Probably related to https://github.com/iotaledger/iri/issues/901 , https://github.com/iotaledger/iri/issues/891 and some other issues.

gjeee commented 5 years ago

v1.5.5 has some new commits which might solve this, but we have to wait and see. The current version relies on wrong/stale solidity info, so every function using isSolid() does the wrong thing. The step direction of tip selection, for example, relies on this call.
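To illustrate why that matters (a schematic of mine, not IRI's actual walker code): a random walk only steps toward approvers it believes are solid, so a stale isSolid() answer can prune valid branches or end the walk early, making non-tips look like tips:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Schematic of one step of a tip-selection walk. The walk filters approvers
// by solidity before choosing where to step; if isSolid() returns stale
// values, the walk skips valid branches or terminates on a false "tip".
class WalkStepSketch {
    interface SolidityProvider {
        boolean isSolid(String txHash);
    }

    static Optional<String> nextStep(List<String> approvers,
                                     SolidityProvider solidity, Random rng) {
        List<String> candidates = new ArrayList<>();
        for (String approver : approvers) {
            if (solidity.isSolid(approver)) {   // stale info here corrupts the walk
                candidates.add(approver);
            }
        }
        if (candidates.isEmpty()) {
            return Optional.empty();            // walk ends: current tx is treated as a tip
        }
        return Optional.of(candidates.get(rng.nextInt(candidates.size())));
    }
}
```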

It might also be coo-related, but this is pure speculation.

ghost commented 5 years ago

Ah, OK. Interesting; I hope that will solve it. The final solution will probably be a combination of adjustments. It's a shame the Tangle can't handle more than ~6 TPS right now without hurting confirmation rates and times. Currently the network is being spammed to ~4 TPS, and that's giving good results.

We all want that 1000 TPS with 90% confirmation 🥇

juliendubost commented 5 years ago

(I speak as a simple IOTA enthusiast.)

Confirmation = synchronization between nodes. Synchronization requires work (CPU, network, etc.) from each node.

More TPS = more work for each node.
More nodes = more sync work for each node.
More confirmations = more sync work for each node.

This means that if you raise any of the parameters above, you raise the quantity of work that has to be done by each node, causing the behaviour you noticed.

The behaviour you expect runs up against the CAP theorem. You can increase TPS if you reduce the confirmation rate or the number of nodes, or both. Likewise, you can increase the confirmation rate by reducing TPS or the number of nodes, or both.

There is no existing "combination of adjustments" that can solve this.

ghost commented 5 years ago

Confirmation = synchronization between nodes? I'm not sure what you mean. More nodes = more sync work? Why? More TPS means more sync work.

I agree that if you go above a certain TPS you will hit hard limits (bandwidth, CPU, DB processing). But that is not the case for the rates we're talking about: 5, 10, 50, 100 TPS... And when that does become a problem, that's where clustering comes into play. About your CAP theorem point: if a node cannot handle a certain TPS rate, it will simply run out of sync; not a big deal.
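A back-of-the-envelope check of that claim, under assumed figures (the packet size and neighbor count below are my assumptions, not measured values):

```java
// Worst-case gossip load per node: each transaction is heard once per
// neighbor, and an IRI gossip packet is on the order of 1.65 kB.
public class GossipLoad {
    public static void main(String[] args) {
        double tps = 10;                // network-wide transactions per second
        int neighbors = 7;              // assumed neighbor count
        double packetBytes = 1650;      // assumed packet size
        double inbound = tps * neighbors * packetBytes;
        System.out.printf("worst-case inbound: %.0f kB/s%n", inbound / 1000);
        // Prints ~116 kB/s at 10 TPS: trivial for modern links, which supports
        // the argument that desync at 5-10 TPS is not a raw bandwidth limit.
    }
}
```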

Actually, IRI 1.5.5 partially solved the problem I described above: the Tangle now runs fine at 15 TPS. The confirmation rate stays fairly high; only the average confirmation time still increases significantly. Again, this was tested with only 1.5.5 nodes and healthy spam.

So I guess @gjeee was right that this was mostly about wrong solidity info. In particular, the available tips now look much more logical.

juliendubost commented 5 years ago

OK, I misunderstood what you were talking about, sorry.