
OpenFGA Public Roadmap

Distributed Caching implementation #21

Open aaguiarz opened 1 year ago

aaguiarz commented 1 year ago

OpenFGA has a per-node cache (https://github.com/openfga/roadmap/issues/38).

It is possible to improve performance by taking advantage of a distributed cache or an external cache service.

lorenlew commented 11 months ago

Hi, on this page there is a statement "To further reduce latency, Auth0 FGA is configured to cache certain evaluations for a brief period of time."

Does it mean that only the SaaS OpenFGA has built-in caching, and that it is not part of the open source setup?

How do you see the priority of this particular roadmap ticket? Caching looks like an important part of a setup. What is the recommended way to use the OpenFGA API in this context: caching the results on the client side?
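As a point of reference for what client-side caching could look like, here is a minimal sketch of a TTL cache wrapped around a check call. `check_fn` is a placeholder for whatever function your client uses to call the OpenFGA Check API; all names here are illustrative, not part of any SDK.

```python
import time

class TTLCheckCache:
    """Client-side cache for check results, keyed by (user, relation, object).

    `check_fn` stands in for the real OpenFGA check call; this is a
    sketch of the caching pattern, not an SDK integration.
    """

    def __init__(self, check_fn, ttl_seconds=10.0):
        self._check_fn = check_fn
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (allowed, expires_at)

    def check(self, user, relation, obj):
        key = (user, relation, obj)
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # answer is still within its TTL; skip the API call
        allowed = self._check_fn(user, relation, obj)
        self._entries[key] = (allowed, now + self._ttl)
        return allowed
```

The obvious trade-off, discussed further down in this thread, is that any cached answer can be stale for up to `ttl_seconds` after a tuple write.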

aaguiarz commented 10 months ago

Hi @lorenlew

We are actually working on an initial iteration of a cache for OpenFGA as part of this work: https://github.com/openfga/roadmap/issues/38.

The SaaS product uses DynamoDB Accelerator for caching purposes, and we don't have that available in OpenFGA as it can use other database backends.

lorenlew commented 10 months ago

@aaguiarz thank you for the quick reply. Do you already have a rough idea of how the setup with caching might look? Would it support a distributed cache for a Kubernetes setup with multiple replicas of the OpenFGA service, or some kind of "watcher" mechanism to synchronize local caches between replicas?

aaguiarz commented 10 months ago

@lorenlew

The initial implementation will be a cache per node. Potentially, different nodes could have different cached values during the cache TTL.

The final implementation will be a distributed cache where each node caches a subset of the queries, and we dispatch each query to the node that owns the respective cache key.
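One common way to implement that "node owns the cache key" dispatch is a consistent hash ring, which keeps key ownership stable as replicas come and go. The sketch below is an assumption about how such routing could work, not OpenFGA's actual design; node names and the key format are made up.

```python
import bisect
import hashlib

class CacheKeyRing:
    """Consistent hash ring mapping each cache key to the node that owns it.

    Each node is placed on the ring at several virtual points (vnodes) so
    keys spread evenly; a key belongs to the first node clockwise from its
    hash. Illustrative sketch only.
    """

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def owner(self, cache_key):
        h = self._hash(cache_key)
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):  # wrap around past the last point
            idx = 0
        return self._ring[idx][1]
```

With this scheme, adding or removing a replica only remaps the keys adjacent to its ring positions instead of invalidating every node's cache.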

We've seen customers have good results with our current non-cached implementation. What kind of load do you expect the system to have?

lorenlew commented 10 months ago

@aaguiarz sounds good. I've noticed some people mention in GitHub issues that they implement the cache in their facade "Permissions" service (a consumer of the OpenFGA service), so if needed, that option is open. I was wondering if it could be achieved in the OpenFGA service at some point.

We are only experimenting with our models and setup; no reliable metrics so far, only preliminary thoughts about the potential risk of slow requests given the lack of a server-side batch check, several limitations of ListObjects (no contextual tuples support), and Read not working with more complex FGA syntax and usersets.

You mentioned DynamoDB Accelerator, but I could not find that option in the configuration (Helm chart schema) documentation, only in-memory, postgres, and mssql. Have I overlooked something? Have a great day!

aaguiarz commented 10 months ago

@lorenlew ListObjects does support contextual tuples. Read needs more filtering capabilities (https://github.com/openfga/roadmap/issues/33).

The Auth0 FGA product (hosted by Okta) uses DynamoDB as the database and DynamoDB Accelerator as its cache. The DynamoDB storage adapter was not open sourced. It's something we plan to do in the future.

lorenlew commented 10 months ago

@aaguiarz thank you for shedding light on this. I was relying on this documentation image; it seems it's outdated.

aaguiarz commented 10 months ago

Thanks for letting us know; I submitted a PR to fix that: https://github.com/openfga/openfga.dev/pull/495.

Marekgr7 commented 8 months ago

Hi,

We have implemented OpenFGA in our project. We have some issues with the performance of ListObjects.

What about implementing caching for ListObjects results? If a store is used across several applications, it would be great if ListObjects caching worked on the OpenFGA side instead of each consumer caching the ListObjects response itself.

aaguiarz commented 8 months ago

@Marekgr7 would you mind sharing more details about the performance issues you are seeing?

Feel free to email me directly at andres.aguiar at openfga.dev. If you can share the model, the tuples (or an idea of the number of tuples and their distribution), the call you are making, and how OpenFGA is configured (settings + database type/size + OpenFGA node configuration), that would be great.

alechenninger commented 3 months ago

@aaguiarz I know OpenFGA does not support the Zookie concept, but would you consider introducing some version of it as you expand caching support? Even setting aside the new-enemy problem, having some kind of version token to ask for a cache "at least as fresh" could also support cached read-after-write consistency: shortly after adding/updating relations, you can query knowing the new relation(s) will be considered.
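To make the "at least as fresh" idea concrete, here is a small sketch loosely inspired by Zanzibar's zookie: the store exposes a monotonically increasing version, cached entries remember the version they were computed at, and a caller holding a token from a recent write can reject staler entries. Every name here is hypothetical; this is a pattern sketch, not an OpenFGA API.

```python
class VersionedCache:
    """Cache entries tagged with the store version they were computed at.

    `write()` models a relation-tuple write and hands back a token; a
    `get()` with that token refuses any entry computed before the write,
    forcing a re-evaluation against the store. Illustrative only.
    """

    def __init__(self):
        self.version = 0    # bumped on every write
        self._entries = {}  # key -> (value, computed_at_version)

    def write(self):
        """Record a tuple write; returns a token for read-your-writes."""
        self.version += 1
        return self.version

    def put(self, key, value):
        self._entries[key] = (value, self.version)

    def get(self, key, at_least=0):
        """Return the cached value only if it is as fresh as `at_least`."""
        hit = self._entries.get(key)
        if hit is None or hit[1] < at_least:
            return None  # too stale: caller must re-check against the store
        return hit[0]
```

Callers that don't hold a token still get the fast, possibly stale answer, so the consistency cost is paid only on the requests that need it.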

aaguiarz commented 3 months ago

@alechenninger we are discussing options to do that; they are captured here: https://github.com/openfga/roadmap/issues/54.

Can you share more about the concrete scenario where you'd use it?

alechenninger commented 3 months ago

@aaguiarz Sure! I think it can be summarized as "responsive graph mutation experiences."

For example, imagine a Google Drive model, and that I am moving files around folders. If I am relying on the graph as the source of truth, and I am served cached results after the move operation, I would still see that the file is in the folder I just moved it from, or not yet in the folder I moved it to. That UX is at worst confusing and at best slow or "unresponsive".

It could probably be worked around by always using consistent queries, but that sacrifices scalability. You could also store the file-folder mapping in your own data and use that, but it would still be eventually consistent with the graph. I'm not sure you can really get around the need for consistency of the ACLs.

I could be wrong 🙂, so keep me honest.