talaia-labs / python-teos

The Eye of Satoshi - Lightning Watchtower
https://talaia.watch
MIT License

Huge system load caused by watchtower.py #309

Closed: svewa closed this issue 2 years ago

svewa commented 2 years ago

I have ~170 channels, and teos is using > 20 times the CPU of lightningd, stressing my system. Is this expected?

Just some data on the workload: just now within the last 100 minutes 2000 appointments were sent.

the log contains roughly 2000 blocks of entries like:

2022-02-18T21:53:28.147Z INFO    plugin-watchtower.py: Adding tower to Tower's db (id=$id)
2022-02-18T21:53:28.188Z INFO    plugin-watchtower.py: Sending appointment 0e5f7de01a641a787918b37daa9e8a9f to $id
2022-02-18T21:53:29.096Z INFO    plugin-watchtower.py: Appointment accepted and signed by $id
2022-02-18T21:53:29.110Z INFO    plugin-watchtower.py: Remaining slots: 159722902
2022-02-18T21:53:29.121Z INFO    plugin-watchtower.py: Start block: 723944

sounds a bit weird to me that the Tower is added to the Towers db...

sr-gi commented 2 years ago

I have ~170 channels, and teos is using > 20 times the CPU of lightningd, stressing my system. Is this expected?

Is teos using that much CPU or the watchtower-plugin? In any case that would not be expected, but I'm curious how you are measuring it if it's the latter.

Just some data on the workload: just now within the last 100 minutes 2000 appointments were sent.

The way the plugin works is by sending data to the tower every time a channel is updated. That happens twice per payment received (or forwarded) and also if the channel fees are renegotiated. So if you're handling a lot of traffic it is expected that you're sending a lot of appointments to the tower.
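For context, a rough sketch of what that per-update path looks like, assuming pyln-client's Plugin class and CLN's commitment_revocation hook (TOWER_ID and send_appointment are illustrative placeholders, not the actual watchtower.py code):

# Hedged sketch: roughly how a CLN plugin reacts to every channel update.
from pyln.client import Plugin

plugin = Plugin()
TOWER_ID = "03abc..."  # placeholder tower id


def send_appointment(tower_id, commitment_txid, penalty_tx):
    # Stand-in for the real work: build, sign and POST the appointment to the tower.
    plugin.log(f"Sending appointment for {commitment_txid} to {tower_id}")


@plugin.hook("commitment_revocation")
def on_commitment_revocation(plugin, **kwargs):
    # lightningd fires this on every channel update (twice per payment),
    # so a busy node triggers it thousands of times per day.
    commitment_txid = kwargs.get("commitment_txid")
    penalty_tx = kwargs.get("penalty_tx")
    send_appointment(TOWER_ID, commitment_txid, penalty_tx)
    return {"result": "continue"}  # conventional "carry on" hook response


plugin.run()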

sounds a bit weird to me that the Tower is added to the Towers db...

That should actually read Adding tower to Towers db (no apostrophe). That is, backing up the tower's info in the local towers.db of the plugin. It is certainly not the best wording; I'll update that log entry.

svewa commented 2 years ago

Is teos using that much CPU or the watchtower-plugin? In any case that would not be expected, but I'm curious how you are measuring it if it's the latter.

It's just the watchtower plugin, sorry for the lack of precision. I'm measuring it with top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                              
 318721 cl        20   0  702368 291080   4964 S  61.5   6.9 679:40.40 python3                                                                                                                              
 318709 cl        20   0  413236 390308   4404 S   5.6   9.3  38:34.13 lightningd       

318721/python3 is watchtower.py

And here I compare the total CPU time consumed. Both processes started roughly at the same time, and watchtower.py very certainly did not start before lightningd. Watchtower has used 680 minutes of CPU time, lightningd only 38.

sounds a bit weird to me that the Tower is added to the Towers db...

That should actually read Adding tower to Towers db (no apostrophe). That is, backing up the tower's info in the local towers.db of the plugin. It is certainly not the best wording; I'll update that log entry.

Still not sure what is being done here. The tower is added to the Towers db? But was it not in there already? Is the same tower added 2000 times? That sounds redundant. Or is this appointment-specific data?

sr-gi commented 2 years ago

Oh yeah, the info is just being updated. That's what I meant by the bad wording. I guess it's more of a dev vs user perspective. The plugin uses leveldb to store the towers' info. In leveldb, entries can only be updated as a whole, so even though the record here is only having an appointment appended (and a slot decremented), the whole tower record is being "stored" again.

Again, not a good choice of words.
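As a rough illustration of that read-modify-write cost, here is a minimal sketch assuming a plyvel-backed store and JSON serialization (the key layout and helper below are illustrative, not the plugin's actual DBM code):

import json

import plyvel

db = plyvel.DB("towers_db", create_if_missing=True)  # illustrative path


def append_appointment(tower_id, locator, signature):
    # LevelDB has no partial updates: adding one (locator, signature) pair means
    # reading the whole tower record, mutating it in memory and writing it back in full.
    key = tower_id.encode()
    record = json.loads(db.get(key) or b"{}")
    record.setdefault("appointments", {})[locator] = signature
    record["available_slots"] = record.get("available_slots", 0) - 1
    db.put(key, json.dumps(record).encode())  # rewrites the entire, ever-growing value

As the record grows, every one of these calls reads and writes more data, which is where the disk I/O reported later in this thread comes from.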

Anyway, I'm quite curious what can be making the plugin spend that much CPU time. All the plugin should be doing for every update is:

My hunch is that this may be related to long comms times between the client and the tower, given that before #304 some of your requests were timing out after 5 secs. I'm pretty sure that wait is a synchronous process. Now the timeout is set to 30, so requests that hang a bit may do so for longer.

A way of knowing if this could be the case would be to check in the logs how long it takes from lightningd issuing data to the commitment_revocation hook to the plugin logging Adding tower to Tower's db. That would account for the whole plugin iteration, but the rest of the tasks should be fairly constant between updates. I can also add a couple of debug log lines, one before sending data to the tower and another after receiving the response, to measure the time spans if you're interested.
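For a rough first measurement, the timestamps already in the plugin log can be compared directly, e.g. the Sending appointment and Appointment accepted lines quoted at the top of this issue:

from datetime import datetime

# Span between two plugin log lines from the report above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"
sent = datetime.strptime("2022-02-18T21:53:28.188Z", FMT)
accepted = datetime.strptime("2022-02-18T21:53:29.096Z", FMT)
print((accepted - sent).total_seconds())  # ~0.91s spent waiting on the tower

In that sample the round trip to the tower alone already accounts for close to a second per appointment.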

In any case, I'm currently reworking the whole thing in Rust, both the tower and hopefully the plugin, since I'm sure Python's performance is not helping either.

svewa commented 2 years ago

I'm just looking a bit more at performance data, and this is what keeps showing up: massive writes of 50-70MB/s by watchtower.

Total DISK READ:        29.41 K/s | Total DISK WRITE:        70.96 M/s
Current DISK READ:      12.55 K/s | Current DISK WRITE:     142.88 M/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                    
    492 be/4 root        0.00 B/s    0.00 B/s  0.00 % 26.57 % [txg_sync]
3531382 be/4 101000      0.00 B/s   43.72 M/s  0.00 % 22.87 % python3 /home/cl/src/python-teos/watchtower-plugin/watchtower.py
3529710 be/4 101000      8.73 K/s  474.98 K/s  0.00 % 17.58 % lightningd
3529722 be/4 101000    100.39 B/s   26.68 M/s  0.00 %  9.03 % python3 /home/cl/src/python-teos/watchtower-plugin/watchtower.py

I checked on teosd, which is only doing 1-3MB/s.

Why does the client-side database grow so much? Once an appointment is committed to the tower, is it still being stored on the client?

Here is also the tail of my towers/LOG:

2022/02/19-08:39:42.296261 140522914227968 Delete type=0 #696370
2022/02/19-08:39:42.296756 140522914227968 Compacting 4@0 + 0@1 files
2022/02/19-08:39:43.075449 140522914227968 Generated table #696374@0: 1 keys, 42552337 bytes
2022/02/19-08:39:43.075488 140522914227968 Compacted 4@0 + 0@1 files => 42552337 bytes
2022/02/19-08:39:43.087320 140522914227968 compacted to: files[ 0 1 1 0 0 0 0 ]
2022/02/19-08:39:43.087419 140522914227968 Delete type=2 #696366
2022/02/19-08:39:43.087977 140522914227968 Delete type=2 #696371
2022/02/19-08:39:43.088413 140522914227968 Delete type=2 #696369
2022/02/19-08:39:43.091147 140522914227968 Delete type=2 #696373
2022/02/19-08:39:43.096264 140522914227968 Compacting 1@1 + 1@2 files
2022/02/19-08:39:44.307280 140522954024768 Current memtable full; waiting...
2022/02/19-08:39:44.366244 140522914227968 Generated table #696375@1: 1 keys, 42552337 bytes
2022/02/19-08:39:44.366331 140522914227968 Level-0 table #696377: started
2022/02/19-08:39:45.152336 140522914227968 Level-0 table #696377: 42552481 bytes OK
2022/02/19-08:39:45.210917 140522914227968 Delete type=0 #696372
2022/02/19-08:39:45.211557 140522914227968 Compacted 1@1 + 1@2 files => 42552337 bytes
2022/02/19-08:39:45.218963 140522914227968 compacted to: files[ 1 0 1 0 0 0 0 ]
2022/02/19-08:39:45.219067 140522914227968 Delete type=2 #696367
2022/02/19-08:39:45.221166 140522914227968 Delete type=2 #696374
2022/02/19-08:39:45.224900 140522914227968 Level-0 table #696379: started
2022/02/19-08:39:46.043766 140522914227968 Level-0 table #696379: 42552625 bytes OK
2022/02/19-08:39:46.061382 140522914227968 Delete type=0 #696376
2022/02/19-08:39:46.372317 140522914227968 Level-0 table #696381: started
2022/02/19-08:39:47.143062 140522914227968 Level-0 table #696381: 42552769 bytes OK
2022/02/19-08:39:47.157281 140522914227968 Delete type=0 #696378
2022/02/19-08:39:48.058411 140522914227968 Level-0 table #696383: started
2022/02/19-08:39:48.988933 140522914227968 Level-0 table #696383: 42552913 bytes OK
2022/02/19-08:39:49.017397 140522914227968 Delete type=0 #696380
2022/02/19-08:39:49.018021 140522914227968 Compacting 4@0 + 0@1 files
2022/02/19-08:39:49.948024 140522914227968 Generated table #696384@0: 1 keys, 42552913 bytes
2022/02/19-08:39:49.948077 140522914227968 Level-0 table #696386: started
2022/02/19-08:39:50.860789 140522914227968 Level-0 table #696386: 42553057 bytes OK
2022/02/19-08:39:50.881075 140522914227968 Delete type=0 #696382
2022/02/19-08:39:50.882189 140522914227968 Compacted 4@0 + 0@1 files => 42552913 bytes
2022/02/19-08:39:50.888332 140522914227968 compacted to: files[ 1 1 1 0 0 0 0 ]
2022/02/19-08:39:50.888460 140522914227968 Delete type=2 #696379
2022/02/19-08:39:50.888585 140522914227968 Delete type=2 #696377
2022/02/19-08:39:50.890535 140522914227968 Delete type=2 #696383
2022/02/19-08:39:50.898389 140522914227968 Delete type=2 #696381
2022/02/19-08:39:50.900573 140522914227968 Compacting 1@1 + 1@2 files
2022/02/19-08:39:52.149996 140522914227968 Generated table #696387@1: 1 keys, 42552913 bytes
2022/02/19-08:39:52.150058 140522914227968 Level-0 table #696389: started

A way of knowing if this could be the case would be to check in the logs how long it takes from lightningd issuing data to the commitment_revocation hook to the plugin logging Adding tower to Tower's db. That would account for the whole plugin iteration, but the rest of the tasks should be fairly constant between updates.

How would I get the commitment_revocation_hook timestamp to look into this?

sr-gi commented 2 years ago

I'm just looking a bit more at performance data, and this is what keeps showing up: massive writes of 50-70MB/s by watchtower.

I checked on teosd, which is only doing 1-3MB/s. Why does the client-side database grow so much? Once an appointment is committed to the tower, is it still being stored on the client?

Ohhh, OK. Looks like this is 2-years-ago me being dumb.

The client side only stores the locators and tower signatures, since that's the only thing needed to prove that the tower agreed on something. However, as I mentioned before, leveldb records are stored as a whole, so in order to update one it is read, modified and re-written to disk. Since the record keeps growing, it gets more expensive to read and write every time, hence the huge disk I/O.
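The LevelDB LOG tail above fits this picture: the towers database holds a single key of roughly 42.5 MB that keeps being rewritten and compacted. A back-of-the-envelope estimate of the resulting write volume, using the figures reported in this thread (the per-appointment overhead is an assumption):

# Rough write-amplification estimate; illustrative numbers, not measured values.
record_bytes = 42_552_337          # size of the single key seen in the LevelDB LOG
bytes_per_appointment = 150        # assumed locator + signature overhead
appointments = 2000                # appointments per 100 minutes, as reported above

total_written = sum(
    record_bytes + i * bytes_per_appointment  # whole record rewritten per appointment
    for i in range(appointments)
)
print(f"{total_written / 1e9:.1f} GB written per 100 minutes")   # ~85 GB
print(f"{total_written / 1e6 / (100 * 60):.0f} MB/s on average")  # ~14 MB/s

LevelDB's own compaction, visible throughout the LOG tail, multiplies that baseline further, which puts the observed 50-70 MB/s in a plausible range.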

In retrospect, this feels so dumb. This can be fixed by using something like SQLite or PostgreSQL to store the plugin's data, which is precisely what is being used in the Rust version.
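By contrast, a minimal sketch of an SQL-backed plugin store (a hypothetical SQLite schema, not the actual layout used by the Rust version) shows why the rewrite goes away:

import sqlite3

con = sqlite3.connect("towers.sqlite")  # illustrative filename
con.execute(
    """CREATE TABLE IF NOT EXISTS appointment_receipts (
           tower_id  TEXT NOT NULL,
           locator   TEXT NOT NULL,
           signature TEXT NOT NULL,
           PRIMARY KEY (tower_id, locator)
       )"""
)


def store_receipt(tower_id, locator, signature):
    # One small row per accepted appointment; nothing existing is rewritten.
    con.execute(
        "INSERT OR REPLACE INTO appointment_receipts VALUES (?, ?, ?)",
        (tower_id, locator, signature),
    )
    con.commit()

Each accepted appointment then costs one small insert instead of rewriting an ever-growing record.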

I'll try to roll a patch next week, alongside a script to migrate data from leveldb to the new schema.

sr-gi commented 2 years ago

Just as a side note, this does not happen on the tower side, because that is the use case the database manager design comes from.

Entries on the tower side are stored per appointment, and users' info is mostly updated without the need to append much to it.

svewa commented 2 years ago

I thought so, after you mentioned the leveldb limitations before :) Looking forward to the patch. While I see Postgres being the right choice, I'm hoping for SQLite, as this would allow me to further delay my migration to a proper database cluster - also for clightning :)

svewa commented 2 years ago

As you're rewriting anyway, and there is not much implemented yet to charge for the service etc. (and I'm running the tower for myself anyway), it would be sufficient for me to just not write the signature and forget about the appointment once it's accepted. Might be easier than improving and migrating a deprecated solution.

sr-gi commented 2 years ago

That's actually part of the operation modes I wanted to introduce before I started porting this to Rust: the tower should be able to run in either accountable or non-accountable mode.

Accountable mode is the one that's running now and would be the default, but in non-accountable mode the user won't store receipts and will "trust" what comes from the tower.

svewa commented 2 years ago

sounds reasonable, sure.

but also a thought on teosd: while 1-3MB/s writes is not terribly much, it also seems way too much considering the data it is receiving is maybe a few kB/s...

sr-gi commented 2 years ago

Indeed. That's also why I'm switching to SQL on the tower side too, but not planning on rolling a patch for that in this codebase anytime soon.

The first release for the Rust code is around the corner. Will ping you once it's ready if you're interested in giving it a try and seeing the difference in performance.

svewa commented 2 years ago

The first release for the Rust code is around the corner. Will ping you once it's ready if you're interested in giving it a try and seeing the difference in performance.

sure, just tell me when it's ready for testing!

What SQL backends are you planning to support? I see a huge shift to Postgres lately, but Maria is still a solid option imho.

sr-gi commented 2 years ago

Currently I'm supporting SQLite and planning on also adding Postgres. The idea is to template the DBM so others can also be added.

sr-gi commented 2 years ago

As you're rewriting anyway, and there is not much implemented yet to charge for the service etc. (and I'm running the tower for myself anyway), it would be sufficient for me to just not write the signature and forget about the appointment once it's accepted. Might be easier than improving and migrating a deprecated solution.

@svewa I've ended up going for a hotfix with your suggested solution, since reworking the plugin to use SQL will require more work than I'd like to put into the plugin atm (given I'm going to have to re-write it from scratch anyway in the upcoming weeks). With the change, appointment data is simply no longer stored in the database once verified.

You can find the patch at https://github.com/talaia-labs/python-teos/tree/minimal-memory. I won't merge the patch into master since I don't want people to use it unintentionally.

You should be able to make it work by just:

This last step is mainly so the plugin knows about the tower; the tower already knows about you.

This is far from ideal but should do the trick for now, let me know if anything does not work as expected.
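For illustration only, the change described above boils down to something like this (hypothetical names, not the actual minimal-memory patch):

def verify_signature(appointment, signature, tower_id):
    # Stand-in for the real signature check performed by the plugin.
    return bool(signature)


def handle_accepted_appointment(appointment, signature, tower_id):
    # The tower's response is still verified, but the receipt is no longer
    # persisted, so the local towers db stops growing with every appointment.
    if not verify_signature(appointment, signature, tower_id):
        raise ValueError("tower returned an invalid signature")
    # Previously the (locator, signature) pair would be appended to the tower
    # record in leveldb here; the hotfix simply skips that write.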

svewa commented 2 years ago

Works perfectly (extensive test of 5 minutes now). I didn't find "towers.db" but moved .watchtower/towers away; maybe that's what you meant anyway. I had stopped the plugin as IO consumption had grown to 120MB/s and was draining my resources. Now it's smooth again. Thanks!

While I understand your concerns about pushing this on users, I also think many might be happy about the performance increase, as the current behaviour might overload weak systems (rpi4 with sd card?). Admittedly this is only relevant for bigger nodes.

sr-gi commented 2 years ago

@svewa if you are still running a tower, you may be interested in giving rust-teos a go. The cln plugin is about to get merged, so feel free to test it and give any feedback / requests you'd like to see.

https://github.com/talaia-labs/rust-teos/pull/53 https://github.com/talaia-labs/rust-teos/pull/58

svewa commented 2 years ago

very nice, thanks for letting me know. I'll look into this once I find the time :)