Acceptable for load balancing use case?

jamesmunns commented 3 weeks ago

Hey there! I'm thinking about using ARCache as the backing store for a concurrent reverse proxy rate limiter. The idea would be to put a leaky bucket implementation (something like the leaky-bucket crate - basically a time/rate limited FIFO queue) as the "Value", and to use a key such as the URI path for rate limiting.

As far as I can tell, I should do something like:

Make an ARCache
Use a Read transaction to see if the key exists, e.g. get("example-uri-path")
If I get a None back, do an insert("example/uri/path", Arc::new(LeakyBucket::new()))

However this language has me a little worried:

It is also invalid for you to insert a value that does not match the source-of-truth state, IE inserting a different value than another thread may perceive.

The situation I'm worried about is that two threads simultaneously do this operation, and both insert a new leaky bucket. I don't actually care which one wins (and becomes durable in the cache on serialized commit), there will be a slight chance this allows marginally more requests than it should, but I think this is generally acceptable for me, as long as it doesn't cause overly spooky behavior.

In my case, there's no actual "source of truth", instead I want LRU-cache-like behavior to limit the total number of bucket entries to avoid a denial of service by unbounded allocation (e.g. someone creates a silly large number of requests to different URI paths), and also avoid expensive locks to do reclamation/eviction of "stale" records (e.g. a specific URI hasn't been requested in a long time).

Is there anything else I should be careful if I want "get or insert" behavior in general, where the "insert" part might end up being concurrent?

Thanks!

Also by the way, I was looking around, found the hashlru crate, which linked to lru-rs, which linked to https://github.com/jeromefroe/lru-rs/issues/21 which is where I found concread.

Firstyear commented 3 weeks ago

tl;dr - In this case here, it's safe.

The "spooky language" is for when you have a database behind ARCache. Read transactions internally take a generation counter, so if two read txns have the same generation ID, but those transactions then submit content that is different ie Key:1 Data:A vs Data:B, then that will "taint" that one or the other data will survive. This is what I mean by the source of truth - whatever you commit needs to match the backend so that other threads of the same generation will get the same data as the database has. There are many more "spooky" cases, but generally this is about temporal consistency.

In your case, because you you want to just submit a new empty bucket, then provided you are okay with one or the other data being lost on the cache quiesce, then you're golden.

In otherwords - I think you're all good here. The warning is for caching of a database that is behind the cache, and that transactions need to be paired to the database itself.

jamesmunns commented 3 weeks ago

Perfect, thank you! Started working on it here: https://github.com/memorysafety/river/pull/67

Firstyear commented 3 weeks ago

Nice! Let us know if you have any comments or feedback, happy to always help out.

kanidm / concread

Acceptable for load balancing use case? #124