lightninglabs / lightning-terminal

Lightning Terminal: Your Home for Lightning Liquidity
MIT License
501 stars 87 forks source link

Tapd RPC appears to refuse to start? #812

Closed habibitcoin closed 1 month ago

habibitcoin commented 1 month ago

When running both lnd and tapd in integrated mode, it seems that the tapd RPC refuses to start, and the tapcli refuses to connect. lncli however works just fine, as well as litcli.

weirdly it looks like it starts fine:

2024-08-06 00:36:21.787 [INF] RPCS: Trader server is now active
2024-08-06 00:36:21.845 [INF] SSVR: Opening sqlite3 database at: /Tapd/data/mainnet/tapd.db
2024-08-06 00:36:21.855 [INF] TADB: Attempting to apply migration(s) (current_db_version=21, latest_migration_version=21)
2024-08-06 00:36:21.855 [INF] TADB: Current database version is up-to-date, skipping migration attempt and backup creation (current_db_version=21, max_migration_version=21)
2024-08-06 00:36:21.856 [INF] TADB: Database version after migration: 21
2024-08-06 00:36:22.436 [INF] SSVR: Configuring universe.lightning.finance:10029 as initial Universe federation server
2024-08-06 00:36:22.436 [INF] TSVR: Version: 0.4.1-alpha commit=v0.4.2-0.20240725155459-2bf18437e945, build=production, logging=default, debuglevel=debug
2024-08-06 00:36:22.436 [INF] TSVR: Active network: mainnet
2024-08-06 00:36:22.436 [INF] RPCS: Validating RPC requests based on macaroon at: /Tapd/data/mainnet/admin.macaroon
2024-08-06 00:36:22.438 [INF] GRDN: Starting ChainPlanter
2024-08-06 00:36:22.463 [INF] GRDN: Retrieved 2 non-finalized batches from DB
2024-08-06 00:36:22.463 [INF] GRDN: Starting asset custodian
2024-08-06 00:36:22.463 [INF] GRDN: Gardener for ChainPlanter now active!
2024-08-06 00:36:22.463 [INF] GRDN: Loading pending inbound asset events
2024-08-06 00:36:22.463 [INF] RPCS: New transaction subscription
2024-08-06 00:36:22.464 [INF] GRDN: Starting re-org watcher
2024-08-06 00:36:22.464 [INF] NTFN: New block epoch subscription
2024-08-06 00:36:22.464 [INF] GRDN: New block at height 855597
2024-08-06 00:36:22.465 [INF] GRDN: Resuming 0 pending inbound asset events
2024-08-06 00:36:22.465 [INF] GRDN: Loading wallet transactions starting at block height 0
2024-08-06 00:36:22.476 [INF] GRDN: Checking 6 wallet transactions for inbound assets, this might take a while
2024-08-06 00:36:22.478 [INF] GRDN: Starting main custodian event loop
2024-08-06 00:36:22.490 [INF] FRTR: Starting ChainPorter
2024-08-06 00:36:22.491 [INF] UNIV: Starting FederationEnvoy
2024-08-06 00:36:23.048 [INF] UNIV: Adding new Universe server to Federation, addrs=([]universe.ServerAddr) (len=1 cap=1) {
 (universe.ServerAddr) {
  ID: (int64) 0,
  addrStr: (string) (len=32) "universe.lightning.finance:10029",
  addr: (net.Addr) <nil>
 }
}

We see some later activity as well once litd has been running for a bit of time, although with some kinda connection error:

2024-08-06 11:49:46.980 [INF] UNIV: Attempting to sync universe: host=universe.tiramisuwallet.com, sync_type=full, ids=([]universe.Identifier) <nil>

2024-08-06 11:49:46.981 [INF] UNIV: Fetching 0 roots
2024-08-06 11:49:46.981 [INF] UNIV: Obtained 0 roots from remote Universe server
2024-08-06 11:49:46.981 [WRN] GRPC: [core] [Channel #558 SubChannel #559] grpc: addrConn.createTransport failed to connect to {Addr: "142.93.118.183:10029", ServerName: "142.93.118.183:10029", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 142.93.118.183:10029: operation was canceled"
2024-08-06 11:49:47.025 [INF] UNIV: Fetching 0 roots
2024-08-06 11:49:47.025 [INF] UNIV: Obtained 0 roots from remote Universe server
2024-08-06 11:49:47.025 [WRN] GRPC: [core] [Channel #560 SubChannel #562] grpc: addrConn.createTransport failed to connect to {Addr: "52.23.192.176:10029", ServerName: "52.23.192.176:10029", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 52.23.192.176:10029: operation was canceled"
2024-08-06 11:49:47.025 [INF] UNIV: Fetching 0 roots
2024-08-06 11:49:47.025 [INF] UNIV: Obtained 0 roots from remote Universe server
2024-08-06 11:49:47.025 [WRN] GRPC: [core] [Channel #561 SubChannel #563] grpc: addrConn.createTransport failed to connect to {Addr: "44.230.212.183:10029", ServerName: "44.230.212.183:10029", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 44.230.212.183:10029: operation was canceled"

Meanwhile tapcli is returning this error (lncli working just fine for getinfo):

./tapcli --network=mainnet getinfo
[tapcli] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:10029: connect: connection refused"

And then when litd is terminated, I do see an odd error message about tapd refusing to start:

024-08-06 11:55:42.629 [ERR] STAT: could not start the taproot-assets sub-server: received critical error from sub-server (taproot-assets), shutting down: unable to receive new block notifications: rpc error: code = Canceled desc = grpc: the client connection is closing

When I run tapd daemon independently and call tapcli it works just fine. Configs are the same for Tapd.conf and Litd.conf.

Expected behavior

tapcli getinfo to respond

Actual behavior

Described above

To reproduce

Every time I restart

System information

taproot-assets.network=mainnet taproot-assets.debuglevel=debug taproot-assets.lnd.host=localhost:10009 taproot-assets.lnd.macaroonpath=[redacted]/bitcoin/mainnet/admin.macaroon taproot-assets.lnd.tlspath=[redacted]/tls.cert taproot-assets.rpclisten=localhost:10029 taproot-assets.restlisten=localhost:8089 taproot-assets.tlsautorefresh=true

Roasbeef commented 1 month ago

What if you try to remove the universe.tiramisuwallet.com universe server? It looks like it was unable to do a DNS query somewhere there. Ofc, that should be updated to be more robust there.

habibitcoin commented 1 month ago

@Roasbeef so I ran tapd independently, then was able to remove universe.tiramisuwallet.com

Not sure if this log line is relevant/interesting btw from the tapd independent client:

2024-08-06 14:53:31.941 [INF] RPCS: RPC server listening on 127.0.0.1:10029
2024-08-06 14:53:31.941 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unimplemented desc = unknown method HtlcModifier for service invoicesrpc.Invoices
2024-08-06 14:53:31.946 [INF] CONF: Starting HTTPS REST proxy listener at 127.0.0.1:8089
2024-08-06 14:53:31.946 [INF] RPCS: gRPC proxy started at 127.0.0.1:8089
2024-08-06 14:53:31.946 [INF] TSVR: Taproot Asset Daemon fully active!

But anyhow, tapcli independently worked to remove the universe:

lightning-terminal-darwin-amd64-v0.13.3-alpha % ./tapcli --network=mainnet universe federation list                    
{
    "servers":  [
        {
            "host":  "universe.lightning.finance:10029",
            "id":  1
        }
    ]
}

... but no change in behavior. Still connection refused once I switch to ./litd

I can see the ports being binded for everything else:

litd      50602  MyPC   11u  IPv4 0x7457ba98fb016980      0t0  TCP 127.0.0.1:8443 (LISTEN)
litd      50602  MyPC   15u  IPv4 0x82614e49935ba30d      0t0  TCP 127.0.0.1:10009 (LISTEN)
litd      50602  MyPC   76u  IPv4 0x20680d9c065c45ab      0t0  TCP 127.0.0.1:8080 (LISTEN)
litd      50602  MyPC   94u  IPv6  0x6ee75acd9b95431      0t0  TCP *:9735 (LISTEN)
Roasbeef commented 1 month ago

What version of lnd are you running against?

Are you able to get a goroutine profile? For both the case with standalone lnd and with litd?

habibitcoin commented 1 month ago

@Roasbeef

./lncli getinfo                                              
{
    "version": "0.18.2-beta commit=lightning-terminal-v0.13.3-alpha",
    "commit_hash": "00b2f7ece37f0564a95e84a9da6cd6f2310f14ef",

Whats the easiest way to profile? I think I'll need to build the binary myself in that case if I'm not mistaken. Was having some issues with the build earlier so just downloaded the binary. But can give that another try since I want to try the taproot channels too

Roasbeef commented 1 month ago

To profile you just need to set the --profile flag for lnd: --lnd.profile=PORT when running in integrated mode: https://github.com/lightningnetwork/lnd/blob/master/docs/debugging_lnd.md#capturing-pprof-data-with-lnd

If you visit the localhost link, then you can view the goroutine dump, which we want here to see what's holding up start up.

Useful docs on diagnostic tools for Go programs generally: https://go.dev/doc/diagnostics

Roasbeef commented 1 month ago

@habibitcoin I think your config might actually be the problem.

Try to remove this section:

taproot-assets.lnd.host=localhost:10009
taproot-assets.lnd.macaroonpath=[redacted]/bitcoin/mainnet/admin.macaroon
taproot-assets.lnd.tlspath=[redacted]/tls.cert

In integrated mode, an in memory buffer socket is used rather than the normal TCP based connection.

habibitcoin commented 1 month ago

@Roasbeef seemed hopeful, but no luck there. Here is the goroutine profile: https://gist.github.com/habibitcoin/90c2a4754b52853130f5aef9bd85e3d6

Roasbeef commented 1 month ago

@habibitcoin did you also make the config changes?

Can you post your config after the changes?

EDIT: you should also remove the other RPC related settings for tapd as well. When running with litd a single listening port handles multi-plexing with all the daemons.

So you can get rid of these too:

taproot-assets.rpclisten=localhost:10029
taproot-assets.restlisten=localhost:8089
Roasbeef commented 1 month ago

In that latest profile, all the tapd related goroutines look fine.

Also when you profile again, can you visit the localhost URL then paste that output? It's a bit more detailed.

guggero commented 1 month ago

Are you sure there's no actual [ERR] message in the log with the cause? Could you paste the full log of a fresh startup please?

habibitcoin commented 1 month ago

Ayyy we got it lol

Was a misunderstanding/misconfiguration when using ./tapcli. Did not realize the rpc port was now shared + needed to use the same certificate location that is shown in ./lncli help

#./lncli help
[..]
GLOBAL OPTIONS:
   --rpcserver value          The host:port of LN daemon. (default: "localhost:10009") [$LNCLI_RPCSERVER]
[..]
      --tlscertpath value        The path to lnd's TLS certificate. (default: "/use/this/path/Lnd/tls.cert") [$LNCLI_TLSCERTPATH]
[..]

And then ./tapcli --network=mainnet --rpcserver=localhost:10009 --tlscertpath /use/this/path/Lnd/tls.cert getinfo worked!