jube-home / jube

Jube is an open-source software designed for monitoring transactions and events. It offers a range of powerful features including real-time data wrangling, artificial intelligence, decision making, and case management. Jube's exceptional performance is particularly evident in its application to fraud prevention and abuse detection scenarios.
https://www.jube.io
GNU Affero General Public License v3.0

Implement Redis Cache #32

Closed: richard-churchman closed this issue 5 months ago

richard-churchman commented 9 months ago

PostgreSQL is currently used to cache transaction data. While it cannot be considered an in-memory database, its shared buffers mean that read performance, although slower than an in-memory cache, is not degraded to an extent that materially affects response time when traded off against the durability guarantees PostgreSQL provides. Read performance aside, in-memory databases are extremely expensive to run, a nightmare to administer and demand a degree of duplication, in Jube at least, given its key-value-pair access paths (where PostgreSQL queries are indexed on multiple keys, in Redis those keys would instead be duplicated, with transaction history stored in the value of an HSET).

On writes, however, there is no contest, and Jube response times are severely impacted. For example, reading Abstraction keys might take 3 ms to 6 ms overall, whereas writing might take 17 ms, which is hard to defend in a real-time system. Currently writes to the cache are made in the transaction flow, which is important, as serialisation across requests is required. Ideally all writes would be moved out to background threads performing bulk inserts, but this would not provide the serialisation guarantees from transaction to transaction (consider a velocity rule made up of counts on the PostgreSQL cache, which would miss any transaction still sitting in a bulk-insert queue). Turning asynchronous commit on gives some relief, but without moving to UNLOGGED tables (which attract their own problems) it still does not come close to the write performance that is desirable.

Redis will be implemented as an in-memory database as follows.

In respect to the Abstraction cache:

In respect to TTL Counter:

In respect to cached Abstraction values:

The functionality will be optional: in the absence of a Redis cache being available, the existing PostgreSQL functionality will prevail (a sketch of this selection follows).
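A minimal sketch of that optional selection, assuming hypothetical repository names (the real Jube class and method names will differ):

```csharp
// Hypothetical abstraction over the payload cache; all names here are
// illustrative, not the actual Jube types.
public interface ICachePayloadRepository
{
    void Insert(string key, byte[] payload);
}

public sealed class PostgresCachePayloadRepository : ICachePayloadRepository
{
    // Existing PostgreSQL cache path, retained as the default behaviour.
    public void Insert(string key, byte[] payload) { /* existing SQL insert */ }
}

public sealed class RedisCachePayloadRepository : ICachePayloadRepository
{
    // New Redis path, used only when a Redis connection string is configured.
    public void Insert(string key, byte[] payload) { /* HSET write */ }
}

public static class CachePayloadRepositoryFactory
{
    // In the absence of a Redis connection string, fall back to PostgreSQL.
    public static ICachePayloadRepository Create(string redisConnectionString) =>
        string.IsNullOrEmpty(redisConnectionString)
            ? new PostgresCachePayloadRepository()
            : new RedisCachePayloadRepository();
}
```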

Connection strings to Redis should be configured at the application level and fully support multiple endpoints, such that FAILOVER can be invoked to resize Redis instances.
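A minimal sketch of such a connection, assuming the StackExchange.Redis client; the endpoint names are placeholders:

```csharp
using StackExchange.Redis;

public static class RedisConnectionFactory
{
    public static ConnectionMultiplexer Connect()
    {
        var options = new ConfigurationOptions
        {
            AbortOnConnectFail = false, // keep retrying rather than failing at start-up
            AllowAdmin = true           // permits admin commands such as FAILOVER
        };
        // Multiple endpoints so the client can follow a failover, for example
        // while a Redis instance is being resized.
        options.EndPoints.Add("redis-primary.example.local", 6379);
        options.EndPoints.Add("redis-replica.example.local", 6379);
        return ConnectionMultiplexer.Connect(options);
    }
}
```

AllowAdmin is only needed if FAILOVER is to be issued through this same client; operational tooling could issue it instead.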

richard-churchman commented 7 months ago

#31 has largely prepared the ground for the development of this issue.

richard-churchman commented 7 months ago

To kick off the project, the first step is to document the key names and patterns (this may as well go into the markdown documentation). For example, for a payload entry:

KeyName:TenantId:EntityAnalysisModelId:Key:Value

Where the hash field key within the HSET is:

CreatedDateTimestamp:ReferenceDateTimestamp:EntityAnalysisModelInstanceEntryGuid:Bytes

So, selecting in an indexed manner might be:

Payload:1:1:AccountId:123456

And the HSET field key carries several helpful values, such that the HSET can be iterated and filtered for the most common cases without going to the MessagePack byte array:

1713689193:1713689193:dc77732f-c211-435e-82eb-5211d2631105:690
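To make the key model concrete, the following is a sketch of the write path, again assuming StackExchange.Redis; the parameter names and the caller-supplied MessagePack bytes are illustrative:

```csharp
using System;
using StackExchange.Redis;

public static class PayloadCacheWriter
{
    public static void Write(IDatabase db, int tenantId, int entityAnalysisModelId,
        string key, string value, Guid entryGuid, byte[] messagePackBytes)
    {
        // KeyName:TenantId:EntityAnalysisModelId:Key:Value
        var redisKey = $"Payload:{tenantId}:{entityAnalysisModelId}:{key}:{value}";

        // CreatedDateTimestamp:ReferenceDateTimestamp:EntityAnalysisModelInstanceEntryGuid:Bytes
        // The reference date is taken as now here; in practice it may differ.
        var unixNow = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        var hashField = $"{unixNow}:{unixNow}:{entryGuid}:{messagePackBytes.Length}";

        db.HashSet(redisKey, hashField, messagePackBytes);
    }
}
```

Calling `Write(db, 1, 1, "AccountId", "123456", entryGuid, bytes)` yields the example key and hash field shown above (with the created and reference timestamps equal, as in the example).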

The above values are promoted largely because subsequent filtering will take place by iteration. Bytes are held so that it is possible to iterate keys and identify the tenants and models using the most space, which might be helpful if we ever get around to creating a cloud offering, in which case the Redis and PostgreSQL overhead will need to be charged for.
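A sketch of that space accounting, summing the promoted Bytes element across Payload keys; `IServer.Keys` is SCAN-backed, so it avoids a blocking KEYS call:

```csharp
using System.Collections.Generic;
using System.Linq;
using StackExchange.Redis;

public static class CacheSpaceReport
{
    public static Dictionary<string, long> BytesPerKey(ConnectionMultiplexer muxer)
    {
        var server = muxer.GetServer(muxer.GetEndPoints().First());
        var db = muxer.GetDatabase();
        var totals = new Dictionary<string, long>();
        foreach (var key in server.Keys(pattern: "Payload:*"))
        {
            // Bytes is the fourth element of the hash field key.
            long bytes = db.HashGetAll(key)
                .Sum(entry => long.Parse(entry.Name.ToString().Split(':')[3]));
            // The key itself embeds TenantId and EntityAnalysisModelId,
            // so grouping per tenant or model is straightforward.
            totals[key.ToString()] = bytes;
        }
        return totals;
    }
}
```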

For every Cache class in the application, start by creating such a key model.

It is supposed that the use of Unix timestamps will be more efficient given the amount of filtering based on date, but this will be tested.
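For illustration, a sketch of filtering on the promoted ReferenceDateTimestamp without touching the MessagePack value:

```csharp
using System;
using System.Collections.Generic;
using StackExchange.Redis;

public static class PayloadCacheReader
{
    // Filter on the promoted ReferenceDateTimestamp (second element of the
    // hash field key) before paying any MessagePack deserialisation cost.
    public static IEnumerable<byte[]> EntriesSince(IDatabase db, string redisKey,
        long sinceUnixSeconds)
    {
        foreach (var entry in db.HashGetAll(redisKey))
        {
            var parts = entry.Name.ToString().Split(':');
            var referenceTimestamp = long.Parse(parts[1]);
            if (referenceTimestamp >= sinceUnixSeconds)
                yield return (byte[])entry.Value;
        }
    }
}
```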