lightninglabs / taproot-assets

A layer 1 daemon, for the Taproot Assets Protocol specification, written in Go (golang)
MIT License
457 stars 110 forks source link

Block RPC calls if not properly started or errored out during start #1122

Open dstadulis opened 3 weeks ago

dstadulis commented 3 weeks ago

In a recent user report, litd dysfunction was shown because the universe RPC call ran before the universe sub service was online.

See TODO summary here: https://github.com/lightninglabs/taproot-assets/issues/1122#issuecomment-2345928938

ViktorTigerstrom commented 3 weeks ago

Thanks for the issue report! Should we allow any rpc requests to the tapd before the universe sub-service has been started?

If they should be separated, so that we would want allow some requests prior to the universe sub-service, would an ok first version be to just disallow any tapd RPCs until the universe sub-service is up, and then create the separation at a later stage?

jharveyb commented 3 weeks ago

I don't think they should be separated, or that there would be any benefit to that.

IIUC the desired fix is to disallow all tapd RPCs until all tapd sub-services are up.

ellemouton commented 3 weeks ago

LiT today should already take care of this. We only mark a subserver as "ready for calls" once the Start (for tap, this is the (s *Server) StartAsSubserver method) method of that subserver returns without an error. We want to keep lit as generic as possible in terms of handing of subservers and so it is expected for each subserver that it is ready to handle calls once that start method has returned and we really should not be doing extra calls to check if various subserver specifics are ready - the SLA here is that things should be ready once that method has returned.

ellemouton commented 3 weeks ago

If yall dont want to permanently change the behaviour, then I suggest adding a "BlockTillUniverseStart" functional option or something on that call

guggero commented 3 weeks ago

Hmm, I see. That is the correct behavior as far as I can see. We only have a screenshot of a stack trace to go from and it looked like an RPC request did come through before the internal gRPC server in tapd was started.

But perhaps tapd stopped with an error and the request came through after the gRPC server was stopped? Though from just the code it looks like that's handled correctly as well.

Need more info from the user then, preferably actual logs.

jharveyb commented 3 weeks ago

Reposting the stack trace as text:

```bash 2024-08-29 11:27:64.591 [DBG] RPCS: [/Lnrpc.State/GetState] requested 2024-08-29 11:27:64.682 [DBG] RPCS: [/Lnrpc.State/GetState] requested 2024-08-29 11:27:04.690 [DBG] RPCS: [/universerpc.Universe/ListFederationServers] requested 2024-08-29 11:27:64.692 [DBG] RPCS: [/Lnrpc.Lightning/ChecktacaroonPermissions] requested 2024-08-29 11:27:64.692 [DBG] RPCS: [/Lnrpc.Lightning/ListPeers] requested panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=@x1 addr=0x38 pc=9x16c62a0] goroutine 3224 [running]: github. com/lightninglabs/taproot -assets. (*rpcServer).ListFederationServers(@x0?, {@x32b0ad0?, 0x40026efc50?}, @x1bb10407) github.con/Lightninglabs/taproot-assets@vd.4.2-0.26240815180811-2110839696cb/rpcserver .g0:5654 +8x20 github. com/lightninglabs/taproot -assets/taprpc/universerpc._Universe_ListFederationServers_Handler .func1({@x3Zb0ad0?, @x40026efc50?}, {@x1bb1040?, github.con/Lightninglabs/taproot -assets@vd.4.2-0.26240815180811 -2110839696cb/taprpc/universerpc/universe_grpc.pb.90:688 +0xd0 github. com/lightningnetwork/Lnd/rpcperms.(*InterceptorChain) .CreateServerOpts. (*InterceptorChain) .middLewareUnaryServer Interceptor . func7({@x3zb0adé github. com/lightningnetwork/Lnd@vd.18.0-beta.rc4.0.20240730143253- 1b353b0bfd58/rpcperms/interceptor .go:832 +0xec ‘google.golang.org/grpc.getChainunaryHandler .funct({9x32bGad0, @x40026ef C50}, {@x1bb1049, 6x49026eFcB0}) google. golang.org/grpc@v1.59.0/server.go:1163 +0xa0 github. com/lightningnetwork/Lnd/rpcperms. (*InterceptorChain) .CreateServerOpts. (*InterceptorChain) .MacaroonUnaryServer Interceptor . func5({@x3Zb0ad0, github. com/lightningnetwork/Lnd@v@.18.0-beta.rc4.0.20240730143253- 1b353b0bfd58/rpcperms/interceptor .go:689 +0x90 google. golang.org/grpc.getChainUnaryHandLer . func1({@x32b0ad0, @x40026efc50}, {@x1bb1040, @x40026efc80}) google.golang.org/grpc@v1.59.0/server.go:1163 +8xa0 github. com/lightningnetwork/Lnd/rpcperms. (*InterceptorChain) .CreateServerOpts. (*InterceptorChain) .rpcStateUnaryServer Interceptor . Func3({@x3Zb0ad0, github. com/lightningnetwork/Lnd@vd. 18.0-beta.rc4.0.20240730143253-1b353bebfdS8/rpcperms/interceptor .go:781 +6x108 google. golang.org/grpc.getChainUnaryHandLer . func1({@x32b0ad0, @x40026efc50}, {@x1bb1040, @x40026efc80}) google.golang.org/grpc@v1.59.0/server.go:1163 +8xa0 github. com/lightningnetwork/Lnd/rpcperms.(*InterceptorChain) .CreateServer0pts.errorLogunaryServer Interceptor . funcl({@x32b0ad0?, @x40026efc50?}, {0: github. com/lightningnetwork/Lnd@v@. 18.0-beta.rc4.0.20240730143253- 1b353b0bfd58/rpcperms/interceptor .go:605 +0x48 google.golang.org/grpc.NewServer .chatnUnaryServerInterceptors.chainUnaryinterceptors.funci({@x32b0ad®, @x49026efC50}, {@x1bb1040, @x49026eFc8}, ‘google.golang.org/grpc@v1.59.0/server .go:1154 +0x88 github. com/lightninglabs/taproot -assets/taprpc/universerpc._Universe_ListFederationServers_Handler({@xleSe5a0, 0x400056e600}, {@x32b0ad0, 0x40026et github.con/Lightninglabs/taproot -assets@vd.4.2-0.26240815180811 -2110839696cb/taprpc/untverserpc/universe_grpc.pb.g0:690 +9x148 ‘google.golang.org/grpc. (*Server).processUnaryRPC(Gx40008541e9, {Gx32bGad0, @x49026eF99}, {6x32c2920, 0x4000588340}, 0x40026b19e0, 6x4900591c20, google. golang.org/grpc@v1.59.0/server .go:1343 +0xb40 google.golang.org/grpc. (*Server).handleStream(@x40008541e0, {0x32c2920, @x4000588340}, @x40026b19e9) ‘google. golang.org/grpc@v1.59.0/server .go:1737 +8x95c google. golang.org/grpc. (*Server).serveStreams. func1.1() google. golang.org/grpc@v1.59.0/server.go:986 +0x88 created by google.golang.org/grpc.(*Server).serveStreams.funcl in goroutine 3223 google. golang.org/grpc@v1.59.0/server.go:997 +0x14c ```
Roasbeef commented 3 weeks ago

So today we have a middleware interceptor that'll bounce all calls until the server is fully started: https://github.com/lightninglabs/taproot-assets/blob/72b93f84a0afa08e01c99d582357672133fc9b20/rpcperms/interceptor.go#L380-L408

We then set to active after all the sub-systems have started here: https://github.com/lightninglabs/taproot-assets/blob/72b93f84a0afa08e01c99d582357672133fc9b20/server.go#L374-L378

From the trace, either the rpcserver or FederationDB was nil at that point, which is puzzling (all the sub-system structs have already been initialized at that point). So perhaps some mutation occurred somewhere?

lukegao209 commented 3 weeks ago

This issue is coming from our team. When the terminal is restarted while lnd and tapd are still in the process of syncing, the terminal crashes continuously due to receiving various requests from tapd (such as assets list, subscribeRfqEvents, universe).

guggero commented 2 weeks ago

@lukegao209 could you send us a full log (including the stack trace) from the beginning of a startup sequence until it panics?

guggero commented 1 week ago

@lukegao209 also, what port are you issuing the ListFederationServers call on? The main litd port (8443) or the integrated tapd port (10029)?

lukegao209 commented 1 week ago

@lukegao209 also, what port are you issuing the ListFederationServers call on? The main litd port (8443) or the integrated tapd port (10029)?

port 10029

ellemouton commented 1 week ago

@lukegao209 - ok cool thanks 🙏 This makes sense then - rather point your requests to Lit's port 8443 as then LiT will block calls to tapd until it is ready.

@Roasbeef re

So today we have a middleware interceptor that'll bounce all calls until the server is fully started:

This is actually not the case for the Subserver startup of Tap.

lukegao209 commented 1 week ago

In conclusion, for the terminal, it should request lnd on port 10009; for other services (tapd, loop, pool, etc.), they should all request port 8443. ?

ellemouton commented 1 week ago

you can also point LND requests to 8443 👍

lukegao209 commented 1 week ago

thanks

lukegao209 commented 1 week ago

btw , will terminal’s account support taproot assets channel?

guggero commented 1 week ago

btw , will terminal’s account support taproot assets channel?

not out of the box, no. we'll need to update the litd account system once taproot asset channels are fully implemented.

lukegao209 commented 1 week ago

If Taproot Assets are implemented through parsing the invoice’s custom_data, maybe I can start working on supporting it now.

Roasbeef commented 1 week ago

@ellemouton

but for StartAsSubserver which LiT uses , the interceptor chain in Tap is not set up

Gotcha, ok that seems to be the core issue here. We should retain the interceptor chain for tapd, either using the same system, or a more explicit check.

ellemouton commented 1 week ago

@guggero - with our latest offline discussion, do you rate we can close this and move it to the Tap repo?

guggero commented 1 week ago

Yes. TODOs are:

Transferring to tapd repo.