subsquid / worker-rs

GNU Affero General Public License v3.0

P2P controller issues #25

Open eldargab opened 1 week ago

eldargab commented 1 week ago

1

https://github.com/subsquid/worker-rs/blob/d26c62668164033f82d31920f690581e09d6b3cf/src/controller/p2p.rs#L156

And the query is silently dropped? No good!

2

https://github.com/subsquid/worker-rs/blob/d26c62668164033f82d31920f690581e09d6b3cf/src/controller/p2p.rs#L234

It is strange to ignore back pressure and continue accepting queries while there is trouble sending the results back!

Similar thing happens here: https://github.com/subsquid/worker-rs/blob/d26c62668164033f82d31920f690581e09d6b3cf/src/controller/p2p.rs#L139

When the application is unable to process requests, it should convey that to the transport level and stop wasting resources on accepting and verifying packets that it is about to drop.
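To illustrate the kind of signal meant here, below is a minimal sketch using only the standard library. It is not the worker's actual code: `Query`, `Admission`, `offer` and `demo` are hypothetical names, and a bounded `std::sync::mpsc::sync_channel` stands in for whatever queue sits between the transport and the worker. The point is that a full queue is reported to the caller instead of being ignored, so the transport side can stop reading.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical query type; the real worker types are not shown here.
#[derive(Debug)]
struct Query(u32);

/// What the transport side should do after offering a query to the worker.
#[derive(Debug, PartialEq)]
enum Admission {
    /// The query was queued.
    Accepted,
    /// The queue is full: stop reading from the network until it drains.
    PauseReads,
    /// The worker side of the queue is gone.
    Closed,
}

/// Offers a query without blocking the event loop and reports back pressure
/// to the caller instead of silently dropping the query.
fn offer(tx: &SyncSender<Query>, query: Query) -> Admission {
    match tx.try_send(query) {
        Ok(()) => Admission::Accepted,
        Err(TrySendError::Full(_)) => Admission::PauseReads,
        Err(TrySendError::Disconnected(_)) => Admission::Closed,
    }
}

/// Fills a queue of capacity 2, observes the back-pressure signal,
/// then drains one item and offers again.
fn demo() -> Vec<Admission> {
    let (tx, rx) = sync_channel::<Query>(2);
    let mut seen = Vec::new();
    for i in 0..3 {
        seen.push(offer(&tx, Query(i)));
    }
    // The worker catches up by taking one query off the queue.
    let _ = rx.recv();
    seen.push(offer(&tx, Query(3)));
    seen
}
```

In this sketch the third offer returns `PauseReads` rather than being dropped, and the offer after the drain is accepted again. How the transport would actually pause reads depends on the transport library and is not covered here.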

However, the problem is not just about queue puts.

I would implement the request processing pipeline roughly as follows.

3

https://github.com/subsquid/worker-rs/blob/17bbf99c7cd7c5f6529d4d34f059cbf88842cfb2/src/controller/worker.rs#L104

No need for ownership.

4

The max query size limit in the currently linked version of the transport lib is set to 512 KB.

pub const MAX_QUERY_SIZE: u64 = 512 * 1024;

It should be less.

The limit for the query itself should be set exactly and explicitly to 256 KB.

Transport message size should be adjusted accordingly.

In the future, the allocation check should happen before the message arrives and is validated.
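As a rough sketch of what checking the size before allocating could look like: the framing below (a 4-byte big-endian length followed by the body) and the function name `read_query` are assumptions made for illustration only, not the transport library's actual wire format or API. The constant is set to the 256 KB proposed above.

```rust
use std::io::{self, Read};

/// Limit for the query itself, as proposed in this issue (256 * 1024 bytes).
pub const MAX_QUERY_SIZE: u64 = 256 * 1024;

/// Reads one length-prefixed query.
/// Hypothetical framing: 4-byte big-endian length, then the body.
/// The declared length is checked before any buffer for the body is allocated.
fn read_query<R: Read>(mut src: R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    src.read_exact(&mut len_buf)?;
    let len = u64::from(u32::from_be_bytes(len_buf));

    if len > MAX_QUERY_SIZE {
        // Reject based on the header alone: nothing is allocated or read.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("query of {len} bytes exceeds the limit of {MAX_QUERY_SIZE}"),
        ));
    }

    // The length is within the limit, so this allocation is bounded.
    let mut body = vec![0u8; len as usize];
    src.read_exact(&mut body)?;
    Ok(body)
}
```

An oversized message is rejected after reading only its 4-byte header, so the cost of a bad message stays constant regardless of the size it declares.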

kalabukdima commented 1 week ago
  1. This code is left over from the old logs collection approach. It's fixed in #23.
  2.
    • The first point is about the TransportHandle. Yes, it has a poor design and it caused a lot of trouble in the portal. I'm trying another approach in the logs collector, and if it works well, I'll do the same with all other actors and get back with the results. Just note that this queue only sends messages to an internal coroutine that puts them into another queue. So it's even worse: if we have trouble sending results back, the worker's code won't even know about it.
    • Regarding the event processing, I believe it was your suggestion not to block in the event handling procedure and to process events as fast as possible. Do you suggest blocking on sending an error response now? But then problems with queries will prevent other transport messages (like logs requests) from being processed.
  3. Good point!
  4. Restricting queries to 256 kB is something we've agreed on just recently. We're not even sure yet that it would be enough. Do you think allocating 2x space is an issue? The Vec implementation itself assumes it's fine to use 2x memory, so we should also go through all Vec usages and reserve the capacity in advance if this is the goal. For the query string itself, I'll add the explicit limit.
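For reference on the Vec point, here is a small sketch of the difference between reserving capacity up front and letting a `Vec` grow on demand. The function names are hypothetical and this is not code from the worker. `Vec::with_capacity(n)` guarantees room for at least `n` elements without reallocating, while the growth strategy of a `Vec` that starts empty is left unspecified by the standard library, so its final capacity may exceed its length.

```rust
/// Copies a payload of known size into a buffer reserved up front.
/// Returns the buffer together with the capacity it had before the copy.
fn copy_reserved(payload: &[u8]) -> (Vec<u8>, usize) {
    let mut buf: Vec<u8> = Vec::with_capacity(payload.len());
    let reserved = buf.capacity();
    // Fits within the reserved capacity, so no reallocation happens here.
    buf.extend_from_slice(payload);
    (buf, reserved)
}

/// Same copy, but starting from an empty Vec that grows as bytes are pushed.
/// The resulting capacity depends on the (unspecified) growth strategy.
fn copy_growing(payload: &[u8]) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::new();
    for &byte in payload {
        buf.push(byte);
    }
    buf
}
```

Comparing `capacity()` of the two results for a typical query payload would show how much slack on-demand growth leaves in practice, which is one way to judge whether going through all Vec usages is worth it.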