lightninglabs / neutrino

Privacy-Preserving Bitcoin Light Client

Add sideloading functionality to neutrino #285

Open Chinwendu20 opened 1 year ago

Chinwendu20 commented 1 year ago

Related issue https://github.com/lightninglabs/neutrino/issues/70

This PR adds sideloading functionality to neutrino. We want to be able to preload the block header and filter header stores with headers before neutrino starts up, which speeds up the initial header sync.

Change summary:

Chinwendu20 commented 1 year ago

Thanks @positiveblue. I left comments on your review and pushed some changes in response to your feedback (the ones that I am clear on).

Chinwendu20 commented 11 months ago

Hello @positiveblue, have you had a chance to look at this again? If there are no objections to my comments, I will just run make fmt on this, include a description of the encoding, and push.

Chinwendu20 commented 11 months ago

Hello, can anyone please run the workflow?

Chinwendu20 commented 11 months ago

Please can anyone help run the workflow?

Chinwendu20 commented 11 months ago

Hmm, this works fine locally. I think I will just write a simple unit test instead of the integration test.

Roasbeef commented 10 months ago

@Chinwendu20 have you run it with the race condition detector locally?

Chinwendu20 commented 10 months ago

Oh no, I did not, @Roasbeef, but I have made some changes now and run it with the race detector; hopefully this works on the CI as well. Can anyone please help run the workflow?

Chinwendu20 commented 10 months ago

Oh okay I will just mock the reader interface and write a unit test now.

Chinwendu20 commented 10 months ago

Please can anyone help run the workflow?

linden commented 10 months ago

Hi @Chinwendu20, I'd love to implement this into a wallet I'm working on for Joltz Rewards (https://joltzrewards.com).

Let me know if you'd like any support in addressing this PR review. Happy to dedicate time towards it.

Chinwendu20 commented 9 months ago

Nice work so far on this feature!

Completed an initial review, with the following jumping out:

  • Style guide not closely adhered to (godoc string, 80 char columns, etc)
  • The main set of interfaces can be simplified.
  • The control flow can be simplified: if we have more headers than in the file, we can terminate. We should check the file before starting up any other sub-system. The file can just have the header be removed, then appended in one step to the end of headers.bin.
  • We can also use the side loader for the filter headers as well. I think this is where that generic param can actually be used.
  • No need for custom re-org logic, just load the file if it's relevant, then let the normal p2p logic handle reorgs if needed.

Thanks for the review. I left some comments and will implement the changes that I am clear on.

Chinwendu20 commented 9 months ago

Hi @Chinwendu20, I'd love to implement this into a wallet I'm working on for Joltz Rewards (https://joltzrewards.com).

Let me know if you'd like any support in addressing this PR review. Happy to dedicate time towards it.

Cool. I guess when the PR is merged you should be able to use it. Feel free to also make suggestions to the PR. Thanks for your help.

Chinwendu20 commented 9 months ago

Hello all, what are your thoughts on the NextHeader function? https://github.com/lightninglabs/neutrino/pull/285/files/e13c95fa3e76fd1883a8b2d0db0ba96a53b9b4c2..76952b79ac2a0c77450bc4dbd74275c325f5b1ef#diff-2345cea8b12f07d1f60d42b7e5563720e69e1544c641261cf147668c4599320bR3131

I propose that the function accept an argument, n, that enables it to fetch n headers at a time. On the blockmanager side, if verification is disabled for sideloading, we fetch all the headers from the file at once using one NextHeaders(endHeight - startHeight) call, and then write them to the DB at once as well.

If verification is enabled, we fetch one block header at a time, i.e. NextHeader(1), as we verify headers individually.

What are your thoughts?
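For illustration, a minimal sketch of what that batched reader could look like. The interface name and method signature are assumptions for discussion, not the PR's actual API:

    package chaindataloader

    import "github.com/btcsuite/btcd/wire"

    // BlockHeaderReader supplies block headers from a sideload source.
    type BlockHeaderReader interface {
        // NextBlockHeaders returns up to n headers from the current
        // position, advancing the reader. Fewer than n headers (or
        // none) are returned once the source is exhausted.
        NextBlockHeaders(n uint32) ([]*wire.BlockHeader, error)
    }

With verification disabled, the caller would issue one NextBlockHeaders(endHeight - startHeight) call; with verification enabled, it would call NextBlockHeaders(1) per header.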

Crypt-iQ commented 9 months ago

Hello all, what are your thoughts on the NextHeader function? https://github.com/lightninglabs/neutrino/pull/285/files/e13c95fa3e76fd1883a8b2d0db0ba96a53b9b4c2..76952b79ac2a0c77450bc4dbd74275c325f5b1ef#diff-2345cea8b12f07d1f60d42b7e5563720e69e1544c641261cf147668c4599320bR3131

I propose that the function accept an argument, n, that enables it to fetch n headers at a time. On the blockmanager side, if verification is disabled for sideloading, we fetch all the headers from the file at once using one NextHeaders(endHeight - startHeight) call, and then write them to the DB at once as well.

If verification is enabled, we fetch one block header at a time, i.e. NextHeader(1), as we verify headers individually.

What are your thoughts?

Making it able to return a batch of headers seems like a good idea. It can be limited to something like 2k headers.

Chinwendu20 commented 9 months ago

Hello all, what are your thoughts on the NextHeader function? https://github.com/lightninglabs/neutrino/pull/285/files/e13c95fa3e76fd1883a8b2d0db0ba96a53b9b4c2..76952b79ac2a0c77450bc4dbd74275c325f5b1ef#diff-2345cea8b12f07d1f60d42b7e5563720e69e1544c641261cf147668c4599320bR3131

I propose that the function accept an argument, n, that enables it to fetch n headers at a time. On the blockmanager side, if verification is disabled for sideloading, we fetch all the headers from the file at once using one NextHeaders(endHeight - startHeight) call, and then write them to the DB at once as well.

If verification is enabled, we fetch one block header at a time, i.e. NextHeader(1), as we verify headers individually.

Making it able to return a batch of headers seems like a good idea. It can be limited to something like 2k headers.

I am averse to doing this because, if we are verifying headers, that would mean reading bytes from the file and deserializing them to wire.BlockHeader for headers that could potentially be invalid, which seems like a waste of time and processing power.

However, if we read one header at a time from the file, we can stop reading and deserializing at the first instance of an invalid header.

Roasbeef commented 9 months ago

I am averse to doing this because, if we are verifying headers, that would mean reading bytes from the file and deserializing them to wire.BlockHeader for headers that could potentially be invalid, which seems like a waste of time and processing power.

If one of the headers is invalid, then we'd throw away the entire file (they should all connect and be valid). Arguably, reading only 80 bytes at a time is a "waste of processing power" given it doesn't effectively utilize the I/O bandwidth available, and most filesystems will read extra parts of the file into a kernel-level buffer cache to further reduce I/O (speculative caching). We can also just memory-map the entire file, which is pretty much the most efficient way to read bytes from disk.

Ultimately, we'll throw an interface in front of the actual consumption (certain envs won't be able to directly read the file system as mentioned above), so we can optimize it further later.
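As a rough illustration of the batching point (not the PR's code), one io.ReadFull call can pull an entire batch into memory, after which deserialization runs against the buffer rather than many tiny reads:

    package sideload

    import (
        "bytes"
        "io"

        "github.com/btcsuite/btcd/wire"
    )

    // headerSize is the length of a serialized block header.
    const headerSize = 80

    // readHeaders reads up to n headers from r using a single I/O call.
    func readHeaders(r io.Reader, n int) ([]*wire.BlockHeader, error) {
        buf := make([]byte, n*headerSize)
        read, err := io.ReadFull(r, buf)
        if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
            return nil, err
        }

        headers := make([]*wire.BlockHeader, 0, read/headerSize)
        for i := 0; i+headerSize <= read; i += headerSize {
            var h wire.BlockHeader
            err := h.Deserialize(bytes.NewReader(buf[i : i+headerSize]))
            if err != nil {
                return nil, err
            }
            headers = append(headers, &h)
        }

        return headers, nil
    }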

Chinwendu20 commented 9 months ago

I am averse to doing this because, if we are verifying headers, that would mean reading bytes from the file and deserializing them to wire.BlockHeader for headers that could potentially be invalid, which seems like a waste of time and processing power.

If one of the headers is invalid, then we'd throw away the entire file (they should all connect and be valid). Arguably, reading only 80 bytes at a time is a "waste of processing power" given it doesn't effectively utilize the I/O bandwidth available, and most filesystems will read extra parts of the file into a kernel-level buffer cache to further reduce I/O (speculative caching). We can also just memory-map the entire file, which is pretty much the most efficient way to read bytes from disk.

Ultimately, we'll throw an interface in front of the actual consumption (certain envs won't be able to directly read the file system as mentioned above), so we can optimize it further later.

Thanks for this. Then we can go with loading all the headers in the file at once, i.e. NextHeaders(sideloadHeaders.EndHeight - sideloadHeaders.StartHeight).

Chinwendu20 commented 9 months ago

This PR has been updated.

Chinwendu20 commented 8 months ago

Nice work w/ the latest revision! This is starting to get pretty close. IMO the final change we can make here is to decouple header side loading from the block manager altogether. Instead we'll load the headers directly into the DB, use a refactored version of checkHeaderSanity to validate them, then start the block manager.

Also we don't need to look at these headers at all if we have more headers than the side load file.

Thanks for the review. Could you please shed more light on the last sentence? Can you confirm that this bit does not cover the concern: https://github.com/lightninglabs/neutrino/pull/285/files#diff-2345cea8b12f07d1f60d42b7e5563720e69e1544c641261cf147668c4599320bR3133-R3140

Also please address this: https://github.com/lightninglabs/neutrino/pull/285#discussion_r1414517226

Chinwendu20 commented 8 months ago

Also, are there any comments on how the headers are being read from the io.ReadSeeker?

Chinwendu20 commented 8 months ago

Hello all, cc: @Roasbeef, @Crypt-iQ

On decoupling the sideloading from the blockmanager: this is the function that would be added to neutrino.go (a rough sketch; we might end up returning an error. The main idea is to show the process of verification, handling bad headers, and writing to the block header store).

func (s *ChainService) sideLoadHeaders(
    sideload chaindataloader.BlockHeaderReader) {

    // If headers contained in the side load source are for a different
    // chain network, return immediately.
    if sideload.HeadersChain() != s.chainParams.Net {
        log.Errorf("headers from side load file are of network %v "+
            "and so incompatible with neutrino's current bitcoin "+
            "network -- skipping side loading", sideload.HeadersChain())

        return
    }

    tipHeader, height, err := s.BlockHeaders.ChainTip()
    if err != nil {
        log.Warnf("Error fetching chain tip while sideloading %v", err)
        return
    }

    log.Infof("Side loading headers from %v to %v", height,
        sideload.EndHeight())

    // Set the reader's height so it supplies the headers the node
    // actually needs.
    err = sideload.SetHeight(height)
    if err != nil {
        log.Errorf("error while setting height for sideload -- "+
            "skipping sideloading: %v", err)

        return
    }

    // Prime the in-memory header list with the current chain tip so the
    // first sideloaded header has a previous node to connect to.
    headerList := headerlist.NewBoundedMemoryChain(
        numMaxMemHeaders,
    )
    headerList.ResetHeaderState(headerlist.Node{
        Header: *tipHeader,
        Height: int32(height),
    })

    for {
        // Request the next batch of headers.
        headers, err := sideload.NextBlockHeaders(2000)

        // If any error occurs while fetching headers, return
        // immediately.
        if err != nil {
            log.Errorf("error while fetching headers -- skipping "+
                "sideloading %v", err)
            return
        }

        // An empty batch means the source is exhausted.
        if len(headers) == 0 {
            return
        }

        // Collect verified headers so each batch is written to the
        // store in a single call.
        headersArray := make([]headerfs.BlockHeader, 0, len(headers))

        for _, header := range headers {
            // Ensure there is a previous header to compare against.
            prevNode := headerList.Back()
            if prevNode == nil {
                log.Warnf("side load - Header list does not contain " +
                    "a previous element as expected -- exiting side " +
                    "load")

                return
            }

            node := &headerlist.Node{Header: *header}
            node.Height = prevNode.Height + 1

            // skipVerify is an assumed configuration flag that
            // disables verification of sideloaded headers.
            if !skipVerify {
                valid, err := verifyBlockHeader(header, *prevNode,
                    headerList, s.checkHeaderSanity)
                if err != nil || !valid {
                    log.Debugf("Side load - Did not pass verification "+
                        "at height %v -- rolling back to last verified "+
                        "checkpoint and skipping sideload", node.Height)

                    // The checkpoint helpers below are assumed to have
                    // been promoted from the blockmanager to the
                    // ChainService, per the plan described after this
                    // sketch.
                    prevCheckpoint := s.findPreviousHeaderCheckpoint(
                        node.Height,
                    )

                    log.Infof("Rolling back to previous validated "+
                        "checkpoint at height %d/hash %s",
                        prevCheckpoint.Height, prevCheckpoint.Hash)

                    err = s.rollBackToHeight(
                        uint32(prevCheckpoint.Height),
                    )
                    if err != nil {
                        // Should we panic here?
                        panic(fmt.Sprintf("Rollback failed: %s", err))
                    }

                    return
                }

                // Verify against checkpoints only if verification is
                // enabled.
                if s.nextCheckpoint != nil &&
                    node.Height == s.nextCheckpoint.Height {

                    nodeHash := node.Header.BlockHash()
                    if nodeHash.IsEqual(s.nextCheckpoint.Hash) {
                        // Update nextCheckpoint to give more accurate
                        // info about the tip of the DB.
                        s.nextCheckpoint = s.findNextHeaderCheckpoint(
                            node.Height,
                        )

                        log.Infof("Verified downloaded block header "+
                            "against checkpoint at height %d/hash %s",
                            node.Height, nodeHash)
                    } else {
                        log.Warnf("Error at checkpoint while side "+
                            "loading headers, exiting at height %d, "+
                            "hash %s", node.Height, nodeHash)
                        return
                    }
                }
            }

            // Convert the header to a headerfs.BlockHeader and add it
            // to the batch.
            headersArray = append(headersArray, headerfs.BlockHeader{
                BlockHeader: header,
                Height:      uint32(node.Height),
            })
            headerList.PushBack(*node)
        }

        if err := s.BlockHeaders.WriteHeaders(headersArray...); err != nil {
            log.Errorf("error writing sideloaded headers: %v", err)
            return
        }
    }
}

Since we are decoupling, I plan on converting some of the functions that were previously blockmanager methods into ChainService methods, then including them in the blockmanager where needed by adding them as function fields on the blockmanager's config. Some of these functions include:

(We could make these plain functions instead of methods on a struct, but that would mean passing the chain parameters as arguments every time.)

(This would also mean removing these fields from the blockmanager:

Is this okay?

I feel that no matter how we spin it, we would be duplicating something the handleheaders function does if we follow this route. Does it seem that way to you as well?

I feel that the concern about breaking something by refactoring handleheaders is the reason we have tests. But if this does not seem like duplication, and/or there are better ideas for implementing this (perhaps decoupling the entire handleheaders function from the blockmanager, if that is a good idea), I am open to hearing them.

Roasbeef commented 7 months ago

@Chinwendu20

Also we don't need to look at these headers at all if we have more headers than the side load file.

I mean that if we determine that a flat file has 100 headers (with a basic preamble in the file that lists the total number of headers, we can verify that count against the size of the file), but we have 1000 headers on disk, then we don't even need to look at the sideloaded headers.
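For concreteness, a minimal sketch of that size check, assuming a simple layout of a fixed-size preamble followed by 80-byte headers (the preamble size and helper name here are hypothetical):

    package sideload

    import (
        "fmt"
        "os"
    )

    const (
        preambleSize = 16 // assumed: network, start height, header count
        headerSize   = 80 // serialized block header size
    )

    // headersInFile derives the header count from the file size alone, so
    // the file can be skipped entirely when the local tip is already past
    // its end height.
    func headersInFile(path string) (uint32, error) {
        info, err := os.Stat(path)
        if err != nil {
            return 0, err
        }

        payload := info.Size() - preambleSize
        if payload < 0 || payload%headerSize != 0 {
            return 0, fmt.Errorf("malformed sideload file: %d bytes",
                info.Size())
        }

        return uint32(payload / headerSize), nil
    }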

Can you make a permalink of that code location? Otherwise, the link doesn't seem to work for me (won't go to the right location for the diff).

Roasbeef commented 7 months ago

@Chinwendu20

On decoupling the sideloading from the blockmanager.

Thanks for this draft! Some initial thoughts:

Generally I dig this direction, and it should let us cut down on quite a lot of code.

I think once this latest diff is up, we also want to make a PR to lnd that adds in the new config, so we can start to get a feel of how users would use the feature in the first place. I think we'll want a new config option, and also likely a new RPC on the existing neutrino sub-server to allow users to pass in the header blob directly.

Chinwendu20 commented 7 months ago

@Chinwendu20

On decoupling the sideloading from the blockmanager.

Thanks for this draft! Some initial thoughts:

  • Why does it set the height of the side loader based on our local height? What we want to check is whether we even need to consult the file in the first place. See my comment above about comparing the heights, and bailing if we have more headers than the file.

Please confirm that the setHeight function does not do that already: https://github.com/lightninglabs/neutrino/blob/e8fca396d84427bb3882a879f08200f3dedc07b6/chaindataloader/binary.go#L206-L209

  • We can also generalize this to work for both block headers and also filter headers. See my comment way above in the review where I suggest a way to use type params to re-use code between the two types of headers. We can also further make a wrapper struct to abstract away: how the headers are verified, and how they're written to disk. This lets us maintain a generalized loop where we just care about validating the header and also writing them to disk.
  • I don't think this needs to be a method on the chain service; instead, it can be a new struct that accepts the dependencies during init. This'll also make this logic easier to test.

I am guessing that the wrapper struct mentioned in point 2 and the struct accepting the dependencies during init that you referred to in point 3 are the same struct. Also, creating this struct would mean that:

Is this in line?

  • We may want to make the chunk size (number of headers read at a time) configurable.

Generally I dig this direction, and it should let us cut down on quite a lot of code.

I think once this latest diff is up, we also want to make a PR to lnd that adds in the new config, so we can start to get a feel of how users would use the feature in the first place. I think we'll want a new config option, and also likely a new RPC on the existing neutrino sub-server to allow users to pass in the header blob directly.

Roasbeef commented 7 months ago

Please confirm that the setHeight function does not do that already:

Sure that works. I meant more that we just don't need to call that method at all if our local height is beyond the height in the file. Otherwise, as is, the caller needs to handle that error. Notice how, if unchanged, each time a user starts up they see a confusing error.

We both agree that this struct for verification and for accepting dependencies would be outside the chaindataloader package

No, this is about making the code more generalized so it can support both filter headers and block headers. This logic can still live in the package. As a thought exercise: how would you go about updating the PR to support both block and filter headers? After you've thought about that, revisit my comment.

This is about abstracting header verification, and exactly how it's written to disk.

Would we create two distinct wrapper structs, one for the filter header and one for the block header, since the block header and filter header verification processes are quite different? Or would we use one struct that accepts all the dependencies for both filter header and block header verification, called headerVerifier or something along those lines?

Yes they are different, but given an abstract header, you can verify it. The verification differs, but we can create an interface that abstracts away those details and allows the caller to execute a generalized ingestion loop.

In line with not repeating logic, as per the concern I raised in this comment: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 we might need to, at least for block headers, refactor the handleheaders function to use this headerVerifier struct for verification?

We don't need to change any of the core logic in the daemon. We also don't need to have the method on the ChainService struct, as I've mentioned several times above. To aid you in the thought experiment above, examine that route: where would it need to change to also support filter headers? How would you abstract the code to have it work for both filter and block headers?
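To make the thought experiment concrete, here is one possible shape for that abstraction, using a type parameter so a single ingestion loop serves both header kinds. All names here are assumptions, not the PR's final API:

    package chaindataloader

    // header is satisfied by both block-header and filter-header types.
    type header interface {
        Height() uint32
    }

    // Validator verifies one header against the previously accepted one;
    // block headers and filter headers each supply their own
    // implementation.
    type Validator[T header] interface {
        Verify(h, prev T) error
    }

    // Writer persists a batch of verified headers to the relevant store.
    type Writer[T header] interface {
        Write(headers []T) error
    }

    // ingest is the generalized loop: verify each header against its
    // predecessor, then flush the batch. It never needs to know which
    // header kind it is handling.
    func ingest[T header](batch []T, prev T, v Validator[T],
        w Writer[T]) error {

        for _, h := range batch {
            if err := v.Verify(h, prev); err != nil {
                return err
            }
            prev = h
        }

        return w.Write(batch)
    }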

Chinwendu20 commented 6 months ago

Thanks for clarifying

Sure that works. I meant more that we just don't need to call that method at all if our local height is beyond the height in the file. Otherwise, as is, the caller needs to handle that error. Notice how, if unchanged, each time a user starts up they see a confusing error.

So currently we have something like this on the caller end:

    err = sideload.SetHeight(height)
    if err != nil {
        // What I understand is that you do not want this logged error
        // to show up every time.
        log.Errorf("error while setting height for sideload -- "+
            "skipping sideloading: %v", err)

        return
    }

See the comment in the code snippet for my understanding of the problem you would like addressed.

Maybe to address this, I create an error in the chaindataloader package, ErrNoHeader, so instead of what we have here on line 207:

https://github.com/lightninglabs/neutrino/blob/e8fca396d84427bb3882a879f08200f3dedc07b6/chaindataloader/binary.go#L206-L209

We return that error, i.e. ErrNoHeader, instead, so on the caller end we modify it to:

    err = sideload.SetHeight(height)

    // We only log the error if it is not the error indicating that the
    // sideload source does not have the required next headers, so the
    // message does not show up on every startup.
    if err != nil && err != chaindataloader.ErrNoHeader {
        log.Errorf("error while setting height for sideload -- "+
            "skipping sideloading: %v", err)

        return
    }

Would this work?
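For concreteness, the sentinel error itself might look like this (a minimal sketch; the message wording is a placeholder):

    package chaindataloader

    import "errors"

    // ErrNoHeader indicates the sideload source has no headers beyond
    // the requested height, letting callers skip sideloading silently.
    var ErrNoHeader = errors.New("sideload source has no headers " +
        "beyond the requested height")

Checking it with errors.Is(err, chaindataloader.ErrNoHeader) rather than == would also tolerate wrapped errors.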

We both agree that this struct for verification and for accepting dependencies would be outside the chaindataloader package

Thanks for this. The current design for the chaindataloader is to just supply headers, so anything that has to do with how those supplied headers are handled depends on the chaindataloader's caller. Hence this comment.

No, this is about making the code more generalized so it can support both filter headers and block headers. This logic can still live in the package. As a thought exercise: how would you go about updating the PR to support both block and filter headers? After you've thought about that, revisit my comment.

From this I take it that you are suggesting that the functionality of the chaindataloader be extended to verify headers as well?

This is about abstracting header verification, and exactly how it's written to disk.

Would we create two distinct wrapper structs, one for the filter header and one for the block header, since the block header and filter header verification processes are quite different? Or would we use one struct that accepts all the dependencies for both filter header and block header verification, called headerVerifier or something along those lines?

Yes they are different, but given an abstract header, you can verify it. The verification differs, but we can create an interface that abstracts away those details and allows the caller to execute a generalized ingestion loop.

In line with not repeating logic, as per the concern I raised in this comment: #285 (comment) we might need to, at least for block headers, refactor the handleheaders function to use this headerVerifier struct for verification?

We don't need to change any of the core logic in the daemon. We also don't need to have the method on the ChainService struct, as I've mentioned several times above. To aid you in the thought experiment above, examine that route: where would it need to change to also support filter headers? How would you abstract the code to have it work for both filter and block headers?

I read back at your previous comment to better understand your suggestion.

  • We can also generalize this to work for both block headers and also filter headers. See my comment way above in the review where I suggest a way to use type params to re-use code between the two types of headers. We can also further make a wrapper struct to abstract away: how the headers are verified, and how they're written to disk. This lets us maintain a generalized loop where we just care about validating the header and also writing them to disk.

I think what I understand from this point is that you are suggesting a single loop for verifying headers and writing them to disk, as opposed to my current design; for example, the loop in my draft here: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 only works for block headers. Is that correct?

Roasbeef commented 6 months ago

Maybe to address this, I create an error in the chaindataloader package, ErrNoHeader so instead of what we have here on line 207:

What I mean is that we don't even really need that code path at all: if our height is beyond the height of the header file, we simply don't need to attempt to load it in or fully initialize it.

From this I take it that you are suggesting that the functionalities of the chaindataloader be extended to verify headers as well?

Yes.

I think what I understand from this point is that you are suggesting a single loop for verifying headers and writing them to disk, as opposed to my current design; for example, the loop in my draft here: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 only works for block headers. Is that correct?

Yes. Continue to further develop that (generalize to abstract out to support both block+filter headers), then update the PR with the new approach.

Chinwendu20 commented 6 months ago

Thanks a lot for the review so far @Roasbeef. If we move header verification for the sideload into the chaindataloader package, that means we import the dependencies listed here: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 into the chaindataloader package, causing a circular dependency; or, if we do not import them, we duplicate them by rewriting. We can have a loop that runs over all headers for both verification and writing headers to the DB, with two functions that abstract these functionalities in the loop, but their implementation might have to be in the neutrino package. What do you think?

Roasbeef commented 6 months ago

You don't need to import the dependencies. An interface can be used instead: you declare how the object is to be used, and the caller figures out what to pass in. We don't need to duplicate anything, nor introduce a circular dependency.
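A minimal sketch of that inversion, with illustrative names: the chaindataloader package declares only the behavior it needs, and the neutrino package supplies the concrete implementation at init time, so no import cycle forms:

    package chaindataloader

    import "github.com/btcsuite/btcd/wire"

    // HeaderValidator is implemented by the caller (e.g. the neutrino
    // package); this package never imports neutrino itself.
    type HeaderValidator interface {
        // VerifyBlockHeader checks a header against its parent.
        VerifyBlockHeader(h, prev *wire.BlockHeader) error
    }

    // SideLoader validates headers through the injected dependency.
    type SideLoader struct {
        validator HeaderValidator
    }

    // NewSideLoader accepts its dependencies during init, which also
    // makes the loader easy to test with a mock validator.
    func NewSideLoader(v HeaderValidator) *SideLoader {
        return &SideLoader{validator: v}
    }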

Chinwendu20 commented 6 months ago

I have updated this PR: I have decoupled the sideloading from the blockmanager, moved verification into the chaindataloader, and created a generalized ingestion loop. This is not the final form of the PR yet; I have not written unit tests or run gofmt to fix formatting nits. I would like a review on this to know if I can move forward.

ellemouton commented 6 months ago

In order to allow for operation we would need to continuously sideload.

@sputn1ck - I thought it was just for speeding up initial sync? I could totally be wrong though.

From the initial issue description (#70 ):

This would allow one to package a set of (possibly compressed) headers that will be written to disk on start up before we start to fetch headers from the network.

saubyk commented 6 months ago

@sputn1ck - I thought it was just for speeding up initial sync? I could totally be wrong though.

+1 This is my understanding as well. Side loading should only deal with initial sync.

kaloudis commented 6 months ago

@sputn1ck - I thought it was just for speeding up initial sync? I could totally be wrong though.

+1 This is my understanding as well. Side loading should only deal with initial sync.

This is interesting. Let's say someone is on a high latency connection, and can't communicate to Neutrino peers at all.

Would getting out of sync after ~10 minutes cause operational issues? It seems like it could, potentially. I could be wrong, but it seems like we do need continuous sideloading.

sputn1ck commented 6 months ago

@sputn1ck - I thought it was just for speeding up initial sync? I could totally be wrong though.

+1 This is my understanding as well. Side loading should only deal with initial sync.

This is interesting. Let's say someone is on a high latency connection, and can't communicate to Neutrino peers at all.

Would getting out of sync after ~10 minutes cause operational issues? It seems like it could, potentially. I could be wrong, but it seems like we do need continuous sideloading.

Hey @kaloudis !

So I clarified offline: this PR only covers initial sync. There are multiple ways of doing continuous sync (e.g. replacing TCP peers with HTTP peers), which will be looked at after this PR!

Chinwendu20 commented 6 months ago

Thanks for the review @sputn1ck. This was an initial PR to find out whether there is agreement on the design and whether I am on track to address @Roasbeef's review. I will submit a more polished PR after I confirm this is the direction we would like to go. Thank you.

Roasbeef commented 6 months ago

This is interesting. Let's say someone is on a high latency connection, and can't communicate to Neutrino peers at all.

Same thing as today: their block sync would be stalled.

Would getting out of sync after ~10 minutes cause operational issues? It seems like it could, potentially. I could be wrong, but it seems like we do need continuous sideloading.

That's another project entirely; this is about making the initial sync nearly instant. Something like that would need more external infrastructure to continually update and feed the headers into the neutrino node. Oli has a project along those lines where things are purely HTTP/CDN based, but not all the components are fully in place there.

lightninglabs-deploy commented 5 months ago

@roasbeef: review reminder @chinwendu20, remember to re-request review from reviewers when ready

Chinwendu20 commented 5 months ago

@ellemouton, you can review now, thank you so much. I have not yet included a commit to remove code that is now redundant in the blockmanager, nor updated the blockmanager to use the validator and writer created in this PR. Please skip formatting comments for now.

I just want to know whether it is in line with Laolu's comments concerning encapsulation and using interfaces in the chaindataloader package, and whether the design is okay in general.