Alternator's handling of max_concurrent_requests_per_shard is different from CQL (and confusing)

A user complained that the max_concurrent_requests_per_shard parameter doesn't work in Alternator. When we tested it, we found that it does work - that's easiest to see by setting it to 0 and observing that all requests are rejected - but @nuivall discovered the following:

I think I figured out the difference between CQL and Alternator shedding. In CQL we pass around a deferred_action object until we finish writing the response (including flush), while in Alternator we decrease the counter before we even start sending the response. So to make it more correct (and also trigger faster) we need to bring it closer to the CQL implementation.
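
To make the difference described above concrete, here is a toy Python model of the two orderings. It is only a sketch and is not Scylla code: `Shard`, `admit`, `send_response`, `handle_early_release` and `handle_late_release` are hypothetical names, and the context manager merely stands in for the role the quote attributes to the deferred_action object.

```python
from contextlib import contextmanager


class Shard:
    """Toy model of a per-shard in-flight request counter (hypothetical)."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.in_flight = 0
        # Counter values observed at the moment each response is written.
        self.seen_while_sending = []

    @contextmanager
    def admit(self):
        # Shed the request if the shard is already at its limit.
        if self.in_flight >= self.max_concurrent:
            raise RuntimeError("request shed: too many concurrent requests")
        self.in_flight += 1
        try:
            yield
        finally:
            # Runs when the guard goes out of scope, like a deferred action.
            self.in_flight -= 1

    def send_response(self):
        # Stand-in for writing and flushing the response.
        self.seen_while_sending.append(self.in_flight)


def handle_early_release(shard):
    # Ordering the quote attributes to Alternator: the counter is
    # decreased before the response starts being sent.
    with shard.admit():
        pass  # execute the request
    shard.send_response()


def handle_late_release(shard):
    # Ordering the quote attributes to CQL: the guard is held until
    # the response has been written (including flush).
    with shard.admit():
        pass  # execute the request
        shard.send_response()
```

In this model, a request that is still sending its response is not counted under the early-release ordering, so the limit can admit more work than expected; under the late-release ordering it is still counted. With `max_concurrent` set to 0, both handlers reject every request, which matches the observation that the parameter does take effect.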

scylladb / scylladb

Alternator's handling of max_concurrent_requests_per_shard is different from CQL (and confusing) #19559