Closed ebuchman closed 4 years ago
Thanks for taking the time to do a thorough review. I agree with most of the concerns here. A few comments at the meta level before jumping into specific points. Even if I consider the refactor successful, I think we could have scoped it tighter. This have gotten it merged faster and made it easier to describe in a ADR. We had some key goals that were addressed but a lot of the value that we got from this work was peripheral and not negotiated in the original scope. We can definitely be more diligent here in the future. With that said I’ll focus of the process/architecture concerns.
Agree here and as mentioned above this would have been helped by better scoping. We kind of abused ADR-006 here to capture the evolution of our thinking but I sense that a lot of details got lost in the quick iterations. The process was dynamic and aligned with the English specification. Most of this alignment was done via live code reviews in our weekly meeting. There were so many ideas flying around and being experimented with that it was difficult to capture intermediate thinking in writing. The speed of the process was valuable but admit that it would be better/more inclusive if we kept a better record of our thinking and decisions. I think you’re right that it is worth describing the conclusion of this work in one or perhaps multiple ADRs.
Yes the intention is for the light client to use its own types which only have the fields the light client needs. But those types will most likely be constructed from the original Tendermint types for simplicity (or have From trait implementations). We haven’t gotten there yet as we wanted to merge this as soon as possible.
What we did de-couple is the crypto functions from their types by isolating the dependencies in the operations
module as you mention. The idea here is to update Tendermint crate such that none of our core types depend on complex crypto functions. It will also allow us to inject simplified implementation of these operations
during testing. So most likely the operations
stuff will move into the main crate and provide mock implementations for testing. This way the simplified core types will not need to be mocked out in test, only the operations
will. The current location of this code is an intermediate step in that direction.
Also a note about the scheduler. It's a bit hard to read the bisection logic out of it.
trusted
, next
, and target
to make this more readable in cases like this where it's obvious we're dealing with heights and no other types? @ebuchman Thanks for thorough review :)
I have answered some of your questions and tracking progress on your comments in https://gist.github.com/romac/fe0415aba0397bf14912b505d5eba785.
Can we confirm that the major points have been either addressed in (#284 #302 ) or broken out into follow up issues?
Closing for now. Feel free to open if something was missed.
Did a full review of the new light-client code. It looks really great, especially all the linking to the spec, the separation of components, the clarity of the sequential list of predicates, and the overall completeness of everything. That said I have some concerns and a variety of feedback, which I'll just braindump here, roughly by section/concern.
And apologies for having not been able to review more frequently. I've gotten into a habit recently of dropping large reviews all at once after months of work, which probably isn't the best approach. I'm going to try and fix that moving forward.
ADR
At a meta/process level, we probably shouldn't have done this without a more complete ADR. That's my bad, I explicitly dismissed it at the time, but I think I was anxious about making progress on this and we seem to have underestimated how long this work would take (classic). It definitely would have helped to have it during the review and to codify why we did this refactor and how it compares to the previous versions. Some of the below concerns may have been properly addressed in an ADR, including a possible iterative plan to address them. It might even be worth still writing one.
Coupling to Tendermint types
The new crate seems heavily coupled to the tendermint crate. One of the original design goals was for the light-client logic to be decoupled from the tendermint types, and the previous version pretty much achieved this via the header/commit/validator etc traits. Is there some plan for how to address this?
My understanding was that the predicates would be closures, and would close over the specific types, so they wouldn't need to take those concrete types as arguments. Is that still the plan?
This decoupling isn't critical in the short term, and right now we need to focus on shipping this rather than more refactors, but I still think it matters, so we should have a plan for addressing down the road.
The traits in
operations
expose more or less the same set of methods that were exposed by the traits in the prior version, which suggests they'd need to be mocked out similarly, but my understand was that we wanted to get away from mocking data like that. Iirc, a big motivation for the refactor was supposed to be for improved unit tests by reducing the need to mock these kinds of traits. Meanwhile, we don't have any unit tests yet, which might be a bit concerning. My impression is that given the amount of coupling, unit testing eg. the Verifier::verify method will require a lot of mocking, similar to the previous version of the code ...Other High Level
We're using a lot of
use::*
- we should probably prefer explicit imports.We should create issues for the TODO and FIXME in the code.
Bisection Tracing
trace_block
? We're not really tracing here, we are marking something is on a trace. At least a better comment where it's calledStore
latest
so we don't iterate over everything (ie. see the FIXME)Light Client
Operations
tendermint
crate. Presumably these will get deduplicated?Predicates
is_within_trust_period:
header.time < now
check seems to be duplicatedheader.time <= now
check should probably useheader.time < now + delta
, ie #137header_matches_commit:
verify_overlap
has_sufficient_validators_overlap
but nothas_sufficient_signers_overlap
has_sufficient_voting_power
has_sufficient_signers_overlap
Verifier
verify
method is beautiful :)verify_overlap
should only be called in the non-sequential case, ie. whenlight_block.height() != trusted_state_next_height
. Otherwise verification would fail if the validator set changes significantly from one block to the next and we'd never be able to make progress. We should definitely have a JSON test for this, I'm a bit surprised if we don't already ...Errors