stargate / data-api

JSON document API for Apache Cassandra (formerly known as JSON API)
https://stargate.io
Apache License 2.0

Extensible guardrail framework for JSON API #388

Open jeffreyscarpenter opened 1 year ago

jeffreyscarpenter commented 1 year ago

Guardrails are a helpful concept for limiting harmful usage of a database or data service API. See CEP-3 (Guardrails) from the Cassandra project for an example.

Since the JSON API shredding algorithm breaks documents up into many cell values and makes extensive use of Cassandra features including indexes, collections, LWTs, etc., we are establishing limits that apply at the JSON API level. Many are already implemented in DocumentLimitsConfig. (There are also a couple of values in OperationsConfig that seem like limits as well.)

We should provide a way to override this implementation. For example, in Astra DB deployments we will want an implementation that allows guardrails to have different values by tenant. It may make sense to design a more generalized guardrail framework that we can apply across multiple Stargate APIs in the future.
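To make the idea concrete, here is a minimal sketch of what such an override point might look like. Everything in it is hypothetical: `Guardrails`, `GuardrailProvider`, the two implementations, and the limit names and values are illustrative assumptions, not the existing DocumentLimitsConfig API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical bundle of limits; field names are illustrative only. */
record Guardrails(int maxDocumentSizeBytes, int maxDocumentDepth) {}

/** Hypothetical extension point: resolves the limits that apply to one request. */
interface GuardrailProvider {
    Guardrails forTenant(Optional<String> tenantId);
}

/** Default behavior: the same global limits for every tenant. */
final class GlobalGuardrailProvider implements GuardrailProvider {
    private final Guardrails global;

    GlobalGuardrailProvider(Guardrails global) {
        this.global = global;
    }

    @Override
    public Guardrails forTenant(Optional<String> tenantId) {
        return global;
    }
}

/** Alternative implementation: per-tenant overrides with a global fallback. */
final class PerTenantGuardrailProvider implements GuardrailProvider {
    private final Guardrails global;
    private final Map<String, Guardrails> overrides = new ConcurrentHashMap<>();

    PerTenantGuardrailProvider(Guardrails global) {
        this.global = global;
    }

    void setOverride(String tenantId, Guardrails guardrails) {
        overrides.put(tenantId, guardrails);
    }

    @Override
    public Guardrails forTenant(Optional<String> tenantId) {
        // Unknown or missing tenants get the global limits.
        return tenantId.map(id -> overrides.getOrDefault(id, global)).orElse(global);
    }
}
```

With this shape, the open-source build could keep the global provider, while a deployment such as Astra DB could plug in the per-tenant one without changes to the code that enforces the limits.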

tatu-at-datastax commented 1 year ago

Existing config limits are overridable via system properties / ENV variables, FWIW, using the standard Quarkus config override mechanism.
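For reference, a small sketch of how a config property name maps to an environment variable under the MicroProfile Config convention that Quarkus follows: every character that is not alphanumeric or `_` becomes `_`, and the result is upper-cased. The property name in the usage comment is a made-up placeholder, not an actual JSON API setting.

```java
import java.util.Locale;

final class EnvVarNames {
    private EnvVarNames() {}

    /** Converts a property name to its upper-case environment-variable form. */
    static String toEnvVarName(String propertyName) {
        return propertyName
                .replaceAll("[^A-Za-z0-9_]", "_")
                .toUpperCase(Locale.ROOT);
    }

    // Usage with a hypothetical property name:
    //   toEnvVarName("example.limits.max-size") gives "EXAMPLE_LIMITS_MAX_SIZE"
    // The same property could instead be set as a system property:
    //   -Dexample.limits.max-size=<value>
}
```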

The difference between DocumentLimitsConfig and OperationsConfig follows their names: the former is about the size/structure of "JSON" documents to insert (or as they are after modification), the latter about operation-level aspects (mostly the number of documents).

maheshrajamani commented 1 year ago

Are these guardrails to be supported at the tenant level? If so, system properties may not be the way to go. FYI, in serverless the guardrail overrides are currently maintained manually as a Kubernetes resource.

tatu-at-datastax commented 1 year ago

I would be strongly against per-tenant guardrails at this point; vast complexity for unclear benefits. Put another way: finding useful global settings seems like a better way to go at first. They can also be changed (if we must) for dedicated kubes. But trying to provide these for shared kubes on a per-tenant basis is orders of magnitude more complicated.

jeffreyscarpenter commented 1 year ago

We will consider this as a future post-GA improvement.

JeremiahDJordan commented 7 months ago

Now that the API is GA, we are immediately seeing cases where this would be very useful.