mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

CPU consumption and DoS on calling get_outputs rpc #3791

Open aglkm opened 1 month ago

aglkm commented 1 month ago
  1. Call the get_block rpc to confirm it returns the expected output in a reasonable time:
     curl -u grin:secret --data '{"jsonrpc":"2.0","method":"get_block","params":[2824162, null, null],"id":1}' 127.0.0.1:3413/v2/foreign
  2. Call the get_outputs rpc:
     curl -u grin:secret --data '{"jsonrpc":"2.0","method":"get_outputs","params":[["08b7e57c448db5ef25aa119dde2312c64d7ff1b890c416c6dda5ec73cbfed2edea"], null, null, true, true],"id":1}' 127.0.0.1:3413/v2/foreign
  3. Check CPU consumption.
  4. Call the get_block rpc again and notice the long response time until get_outputs finishes its job:
     curl -u grin:secret --data '{"jsonrpc":"2.0","method":"get_block","params":[2824162, null, null],"id":1}' 127.0.0.1:3413/v2/foreign
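
The steps above can also be scripted to put numbers on the slowdown. This is only a minimal Python sketch, assuming the same local endpoint and grin:secret credentials as the curl calls. The helper names (build_request, timed_call, reproduce) are made up for illustration, and the one-second head start given to get_outputs is an arbitrary choice.

```python
import base64
import json
import threading
import time
import urllib.request

NODE_URL = "http://127.0.0.1:3413/v2/foreign"  # endpoint used in the steps above
API_USER, API_SECRET = "grin", "secret"        # credentials used in the curl calls


def build_request(method, params, url=NODE_URL):
    """Build (but do not send) a JSON-RPC 2.0 POST request with basic auth."""
    body = json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": 1})
    token = base64.b64encode(f"{API_USER}:{API_SECRET}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body.encode(),
        # Assumption: the node accepts an explicit JSON content type
        # (the curl calls above do not set one).
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def timed_call(method, params, timeout=600):
    """Send the request and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(build_request(method, params), timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def reproduce():
    """Time get_block alone, then again while get_outputs is in flight."""
    block_params = [2824162, None, None]
    output_params = [
        ["08b7e57c448db5ef25aa119dde2312c64d7ff1b890c416c6dda5ec73cbfed2edea"],
        None, None, True, True,
    ]
    baseline = timed_call("get_block", block_params)      # step 1
    worker = threading.Thread(target=timed_call, args=("get_outputs", output_params))
    worker.start()                                         # step 2
    time.sleep(1)                                          # arbitrary head start
    contended = timed_call("get_block", block_params)      # step 4
    worker.join()
    print(f"get_block alone: {baseline:.2f}s, during get_outputs: {contended:.2f}s")
```

Nothing is sent until reproduce() is called against a running node. CPU consumption (step 3) still has to be watched separately, e.g. with top.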

yeastplume commented 4 weeks ago

Thanks for reporting. Below is a quick video of me running these commands against a docker image, using a monitoring utility I've been putting together:

https://github.com/mimblewimble/grin/assets/7074070/2945ae56-a86a-45af-b738-cced17bd19b7

As you can see, the call to get_outputs is taking a couple of seconds (which definitely could be looked into to see how to speed it up), but it doesn't seem anywhere close to DoS territory from a single request.

Is the node in the middle of syncing when you're performing this call? Also, could you give me as much detail as possible about the machine the node is running on, including processor specs, RAM, disk, etc.?

aglkm commented 4 weeks ago

If you want to re-create a longer wait on get_outputs, send the rpc to the grinnode public node API. I was able to re-create the same long wait against my archival node. My guess is that this depends on whether the node is archival or not: an archival node has to search much deeper through the outputs than a non-archival node.

Also, even when get_outputs returns more quickly, it still causes the issue by locking(?) the db, which makes the node struggle to sync or to handle other rpc requests, e.g. get_block. So, if someone keeps sending you get_outputs requests one after another (parallel requests are not even needed), your node cannot stay in sync with the network or handle certain other requests. Obviously this is much worse for an archival node.

Some other observations regarding the issue:

this is blocking the db:

curl -u grin:secret --data '{"jsonrpc":"2.0","method":"get_outputs","params":[["08b7e57c448db5ef25aa119dde2312c64d7ff1b890c416c6dda5ec73cbfed2edea"], null, null, true, true],"id":1}' 127.0.0.1:3413/v2/foreign

this is not blocking the db:

curl -u grin:secret --data '{"jsonrpc":"2.0","method":"get_outputs","params":[null, 1, 2833321, true, true],"id":1}' 127.0.0.1:3413/v2/foreign
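
To make the difference between the two calls easier to see, here is a small Python sketch that builds both params arrays from named arguments. The argument names (commits, start_height, end_height, include_proof, include_merkle_proof) are my assumption about what each position means and should be checked against the node API docs; the helper names are hypothetical. The sketch only builds the request bodies and does not say why one form blocks and the other does not.

```python
import json


def get_outputs_params(commits=None, start_height=None, end_height=None,
                       include_proof=True, include_merkle_proof=True):
    """Return the positional params array for get_outputs.

    The parameter names are assumed; only the positions come from the
    curl calls above.
    """
    return [commits, start_height, end_height, include_proof, include_merkle_proof]


def rpc_body(params):
    """Serialize a get_outputs JSON-RPC 2.0 request body."""
    return json.dumps({"jsonrpc": "2.0", "method": "get_outputs", "params": params, "id": 1})


# Lookup by commitment, no height range: the call reported as blocking the db.
by_commit = get_outputs_params(
    commits=["08b7e57c448db5ef25aa119dde2312c64d7ff1b890c416c6dda5ec73cbfed2edea"]
)

# Lookup by height range, no commitment: the call reported as not blocking.
by_height = get_outputs_params(start_height=1, end_height=2833321)

print(rpc_body(by_commit))
print(rpc_body(by_height))
```

The only difference between the two requests is which filter is set: a list of commitments with no height range, or a height range with no commitments.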

yeastplume commented 4 weeks ago

Okay, thank you for the extra information. I'll profile to see exactly what's going on with this call.

yeastplume commented 3 weeks ago

See the fix in #3792. Once again, thanks for bringing this to our attention.