spacemeshos / pm

Project management. Meta-tasks related to research, dev, and specs for the Spacemesh protocol and infrastructure.
http://spacemesh.io/
Creative Commons Zero v1.0 Universal
2 stars 0 forks source link

PoET ticks #124

Closed lrettig closed 1 year ago

lrettig commented 1 year ago

Missing PoET-related functionality

merkle-tree

Support unbalanced trees

Here's the issue in the merkle-tree repo. @moshababo pointed out here that it might not be trivial to implement:

I don’t remember what kind of work is needed for that. It might wasn’t straightforward to implement due to some advanced features like Merge (https://github.com/spacemeshos/merkle-tree/blob/develop/cache/merge.go#L13) and Group (https://github.com/spacemeshos/merkle-tree/blob/develop/cache/group.go#L19), which might were added for supporting the old version of PoST, that was based on this lib. This needs to be investigated.

Avoid redundant allocation

While someone's in the context of this library, we should probably consider this pull request from someone in the community. @dshulyak pointed out here that since the lib is single-threaded, a simpler solution can be utilized (statically allocate a variable and re-use it). OTOH, it's a generic library that anyone can use, and I'm not sure, but it could possibly work in a multithreaded context, so we can consider not breaking that.

poet

Better configuration / alignment with epochs

The PoET is currently very hard to align with Spacemesh epochs. Currently the core PoET service is configured with 3 parameters: Parameter Definition
initialduration Duration of the opening time for the initial round.
duration Duration of the opening time for each round.
n PoET time parameter (the tree depth).
If n is set low enough, the duration param dominates and the length is determined by that. This means that PoET completes the proof and then idles until the next round is supposed to begin. In mainnet we'll want to control PoET more tightly. We'll want to set a precise completion time, in such a way that when it's reached execution stops. The flip side is also desirable: there's no reason to stop extending the PoET tree before the completion time is reached. This is especially important in case the PoET server crashes and is restored. To make alignment even easier and less error prone, I think that configuration should be controlled using only these two parameters: Parameter Definition
genesis_time Genesis time as unix timestamp
epoch_duration Length of an epoch in seconds
tick_size Number of iterations (leaves) that count as a single PoET tick
phase_shift Number of seconds to shift the PoET cycle from the start of the epoch, to allow for phased PoETs
cycle_gap Number of seconds before the start of the next cycle that work should stop and a proof should be published

Based on these params, PoET will calculate the next round start time. Start times are:

for any epoch_number >= 0:
poet_round_start_time := genesis_time + (epoch_number * epoch_duration) + phase_shift

Each round ends cycle_gap seconds before the next one begins (this is to allow smeshers to create a PoST proof and publish their ATX without missing registration for the next round).

Every time a round ends, any remaining iterations that don't fit in a round number of tick_sizes will be discarded and a proof will be generated and published.

go-spacemesh

Introduce Ticks

The number of ticks in a PoET proof is the LeafCount divided by TickSize, ignoring any remainder.

TickSize should be added to the node configuration. We'll want to set it to a value that will allow PoETs to run ~10,000 ticks in an epoch, based on PoET benchmarks before genesis.

A PoET proof's TickCount should be available to fetch from the PoetDb. This will be fetched, at the same time we validate that an ATX that references a PoET proof is really a member. The existing implementation of verifying membership is not currently efficiently implemented (the entire proof message is loaded from disk and deserialized, while only the list of members is needed - and it can probably be even more efficient than that).

I leave it up to the developer implementing this to find an efficient way to perform this entire operation of validating membership in a PoET round and fetching this round's TickCount. We can store the TickCount in the DB directly, or calculate it on the fly from the LeafCount (just an integer division - super cheap).

Changes to ATXs

Today, the NIPostChallenge (which is eventually embedded in ActivationTxHeader) includes StartTick and EndTick fields. There's no reason to explicitly include (or commit to) these values, so they should be removed.

These fields are used:

We should add the following to ActivationTxHeader:

When a new ATX is validated, baseTickHeight is set to the value of the positioning ATX's TickHeight(). Also, tickCount should be extracted from the PoET proof and cached.

GetWeight() should use TickCount() to multiply with the number of PoST units.

In GetPositioningAtxInfo(), in the case if id == b.goldenATXID we should return 0 as the TickHeight.

We should update UpdateTopIfNeeded() to replace the top ATX if the new ATX is either in a higher epoch, or in the same epoch and has a higher TickHeight.

Add TickHeight to Blocks

When constructing a block, its height is explicitly included. It's determined by iterating over the ATXs of the proposals used to construct the block, and picking the highest BaseTickHeight of all of them.

Because the Golden ATX has a TickHeight of 0 by definition, all ATXs in epoch 1 will have a BaseTickHeight of 0, so all blocks in epoch 2 (the first epoch with blocks) will have a TickHeight of 0 as well.

Tortoise

Limit Ballots Voting for Blocks in Future

Ballots cannot vote for or against blocks with a higher TickCount than their ATX, they can only abstain. We need to take this into account and ignore any non-abstain votes in that case.

It's assumed that all ATXs in epoch n have a higher TickHeight than any ATX in epoch n-1. This is not guaranteed, but it's a reasonable assumption, at least for a while. This will become a less reasonable assumption when we incentivize smeshers to de-concentrate ATXs (publish them in different layers during the epoch).

Break Ties in Layers with Multiple Valid Blocks

There is a possibility that layers end up having more than one valid block. This can happen in extreme cases, when we resort to Weak Coin for voting.

The current mechanism, in this case, executes the block with the lexicographically lowest ID (see mesh.go → getBlockToApply()). Instead, we should pick the block with the lowest TickHeight (the one that the most ballots could vote for and against). Only if this is also a tie, we resort to lexicographic order of IDs.

moshababo commented 1 year ago

Here's the issue in the merkle-tree repo. @moshababo pointed out here that it might not be trivial to implement:

Just to clarify, I mentioned that it wasn't straightforward to implement, because of the Group and Merge features, which were added to support PoST. Since then PoST became tree-less, and thus no longer using this lib. If these features will be deprecated, it will probably be easy to implement the support for unbalanced trees.

dshulyak commented 1 year ago

Ballots cannot vote for or against blocks with a higher TickCount than their ATX, they can only abstain. We need to take this into account and ignore any non-abstain votes in that case.

can we make ballot syntactically invalid in such case, so that it won't be accepted by the node?

Ballots cannot vote for or against blocks with a higher TickCount than their ATX, they can only abstain. We need to take this into account and ignore any non-abstain votes in that case.

this breaks verifying tortoise. currently it is faster because abstain votes are bounded by zdist, and all other votes can be counted in batches. but if we don't have such bound anymore then verifying tortoise will have the same worst case complexity as full tortoise, it will be less than quadratic but still more complex than linear.

dshulyak commented 1 year ago

Have you considered what implications it will have for blocks pruning? Ballot should be able to count vote without checking tick height as it may not be available.

It will also have an impact on base ballot selection. Historically we tried to minimize the difference between local opinion and base ballot opinion. Base ballot height needs to be added as a factor into this optimization. Not sure yet how complex is this task.

pigmej commented 1 year ago

@lrettig, I have a feeling that this one is done :)

noamnelke commented 1 year ago

Have you considered what implications it will have for blocks pruning? Ballot should be able to count vote without checking tick height as it may not be available.

Thanks for bringing this up @dshulyak. We're starting to work on block pruning today, so we'll consider this. We need the block's layer if we prune it regardless, so this is just another field.

It will also have an impact on base ballot selection. Historically we tried to minimize the difference between local opinion and base ballot opinion. Base ballot height needs to be added as a factor into this optimization. Not sure yet how complex is this task.

I'm not sure it should have an impact on that. Both the base ballot and the local ballot (the one we're building) should have an opinion about any block they are aware of (even if it's in "their future"), so they can pick any base ballot and express the diff and votes for "future" blocks will be ignored when counting. I think this is what we decided with Tal, isn't it?

noamnelke commented 1 year ago

I only now realized that I responded to a very old comment. I think all these questions have already been discussed and agreed upon. @dshulyak are there still any open questions or something that's unclear in this context?

selfdual-brain commented 1 year ago

Extended spec for the Tortoise voting part of the problem: https://hackmd.io/@sm-noam/B1pN5NZ0c

pigmej commented 1 year ago

@selfdual-brain implementation part, which was added as #143 was closed, therefore it's not so wise to keep this one open. Please create a new one for the problem mentioned above.