The timeout service is used a scheduler by other services. Other services such as the Rule Engine and the Retry one depends on it.
The idea is that it's a standalone, distributed, fault-tolerant service communicating with external world through RabbitMQ. The goal of the service is to publish so called timer.Expired events at the specified point in time.
The API is (quoting the 1.0 doc):
Rwait(x, unit): wait x unit of time. Default unit is seconds, available: minutes, hours.
ScheduleTimerAt(timestamp): Wait until timestamp is reached. timestamp must be in seconds.
The 1.0 implementation uses Aerospike for fault-tolerance: all the timeout requests are stored there and are read in memory at the startup by scanning Aerospike.
the request KStream - basically it's a stream of requests from a Kafka topic
local persistent key-value store (backed by RocksDB by default, easily replaceable with open API) to store and sort the timer requests
Punctuator - the Kafka Stream abstraction for internal scheduling
The implementation is very simple: every request is stored in the local storage, Punctuator gets the oldest (up until now) requests from the store every, let's say, 100ms and publishes the timer.Expired events to the output topic while also deleting them from the local storage. Cancel requests are handled trivially by just deleting the records from the local storage.
Kafka Streams not only allows for easy and transparent implementation but also adds easily configurable delivery guarantees (including exactly once solution using Kafka transactions). The solution is automatically scalable because of consistent hashing approach used with Kafka topics (meaning that timeout requests for the same entity will be handled by the same node and resulting events will be published to the same partition).
The plan is to implement the same logic with Clojure, adding the Timeout Service a component of duct-tape/integrant. This is flexible enough to run it either as part of the Rule Engine process or a separate deployment.
scope and result
This is mostly just a port of the existing service with some improvements coming "for free" with Kafka Streams. In particular we don't aim to improve the granularity and maximum latency.
Expected deliveries:
Rwait(x, unit) and ScheduleTimerAt(timestamp) sites should be available for Orc programs
A Wiki page describing the service and the sites
Timeout service providing the sites should be implemented as part of the architecture of Clojure-based Rule Engine
Benchmarking and cost analytics results for the service should be included in the related report
Timeout service
introduction
Xray 1.0
As described by Xray 1.0 doc:
The idea is that it's a standalone, distributed, fault-tolerant service communicating with external world through RabbitMQ. The goal of the service is to publish so called
timer.Expired
events at the specified point in time. The API is (quoting the 1.0 doc):The 1.0 implementation uses Aerospike for fault-tolerance: all the timeout requests are stored there and are read in memory at the startup by scanning Aerospike.
Xray OAM
The proposed implementation is based on the prototype developed using Scala+Kafka Streams.
The required Kafka Streams components are:
KStream
- basically it's a stream of requests from a Kafka topicPunctuator
- the Kafka Stream abstraction for internal schedulingThe implementation is very simple: every request is stored in the local storage,
Punctuator
gets the oldest (up untilnow
) requests from the store every, let's say, 100ms and publishes thetimer.Expired
events to the output topic while also deleting them from the local storage. Cancel requests are handled trivially by just deleting the records from the local storage.Kafka Streams not only allows for easy and transparent implementation but also adds easily configurable delivery guarantees (including
exactly once
solution using Kafka transactions). The solution is automatically scalable because of consistent hashing approach used with Kafka topics (meaning that timeout requests for the same entity will be handled by the same node and resulting events will be published to the same partition).The plan is to implement the same logic with Clojure, adding the Timeout Service a component of duct-tape/integrant. This is flexible enough to run it either as part of the Rule Engine process or a separate deployment.
scope and result
This is mostly just a port of the existing service with some improvements coming "for free" with Kafka Streams. In particular we don't aim to improve the granularity and maximum latency.
Expected deliveries:
Rwait(x, unit)
andScheduleTimerAt(timestamp)
sites should be available for Orc programsA Wiki page describing the service and the sites
Timeout service providing the sites should be implemented as part of the architecture of Clojure-based Rule Engine
Benchmarking and cost analytics results for the service should be included in the related report
[ ] - Expose timeout service metrics