xray-tech / xorc-xray

Xray / OAM
Apache License 2.0

Timeout service #5

Open ojow opened 6 years ago

ojow commented 6 years ago

Timeout service

introduction

Xray 1.0

As described in the Xray 1.0 doc:

The timeout service is used as a scheduler by other services. Other services, such as the Rule Engine and the Retry service, depend on it.

The idea is that it is a standalone, distributed, fault-tolerant service that communicates with the external world through RabbitMQ. Its goal is to publish so-called timer.Expired events at the specified point in time. The API is (quoting the 1.0 doc):

- Rwait(x, unit): wait x units of time. The default unit is seconds; minutes and hours are also available.
- ScheduleTimerAt(timestamp): wait until timestamp is reached. timestamp must be in seconds.
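A minimal sketch, assuming both calls are normalized into a single absolute deadline in epoch seconds before being stored. The names `rwait_deadline`, `schedule_timer_at_deadline` and `UNIT_SECONDS` are hypothetical, not taken from the 1.0 code; only the unit names and the seconds default come from the doc.

```python
import time

# Hypothetical table: seconds per supported unit; "seconds" is the default, per the 1.0 doc.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def rwait_deadline(x, unit="seconds", now=None):
    """Absolute deadline (epoch seconds) for Rwait(x, unit)."""
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {unit!r}")
    if now is None:
        now = time.time()
    return now + x * UNIT_SECONDS[unit]


def schedule_timer_at_deadline(timestamp):
    """ScheduleTimerAt(timestamp): the timestamp (seconds) already is the deadline."""
    return timestamp


print(rwait_deadline(5, "minutes", now=1_000))  # 1300
print(schedule_timer_at_deadline(1_500))         # 1500
```

Normalizing early means the rest of the service only has to deal with one kind of request: "fire at time T".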

The 1.0 implementation uses Aerospike for fault tolerance: all timeout requests are stored there and are read into memory at startup by scanning Aerospike.
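To illustrate the startup recovery step, here is a hedged sketch assuming the full scan yields records with hypothetical `id` and `deadline` fields. It loads them into an in-memory min-heap so the earliest deadline is always on top. The heap is an assumption (the 1.0 doc only says requests are read into memory by scanning Aerospike), and no Aerospike client calls are shown.

```python
import heapq


def rebuild_queue(scanned_records):
    """Rebuild an in-memory min-heap of (deadline, request_id) pairs.

    `scanned_records` stands in for the result of a full scan of the
    persistent store (hypothetical record shape: {"id": ..., "deadline": ...}).
    """
    heap = [(rec["deadline"], rec["id"]) for rec in scanned_records]
    heapq.heapify(heap)  # O(n); earliest deadline ends up at heap[0]
    return heap


records = [{"id": "b", "deadline": 30}, {"id": "a", "deadline": 10}, {"id": "c", "deadline": 20}]
queue = rebuild_queue(records)
print(heapq.heappop(queue))  # (10, 'a')
```

The cost of this approach is that startup time grows with the number of pending timers, since everything has to be rescanned.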

Xray OAM

The proposed implementation is based on the prototype developed using Scala+Kafka Streams.

The required Kafka Streams components are:

The implementation is very simple: every request is stored in the local state store, and a Punctuator periodically (say, every 100 ms) fetches the requests whose time has already come (oldest first, up to the current time), publishes the corresponding timer.Expired events to the output topic, and deletes them from the local store. Cancel requests are handled trivially by deleting the corresponding records from the local store.
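The store/punctuate/cancel cycle can be sketched without Kafka at all. Below, a hypothetical `TimeoutStore` class plays the role of the local state store, keyed by `(deadline, request_id)` so that scanning from the start yields the oldest requests first (an assumed key layout), and `punctuate(now)` is what the periodic Punctuator would run. In Kafka Streams that periodic call would be registered with `ProcessorContext#schedule` using `PunctuationType.WALL_CLOCK_TIME`; the event shape here is an assumption.

```python
import bisect


class TimeoutStore:
    """Hypothetical in-memory stand-in for the Kafka Streams local state store.

    Entries are kept sorted by (deadline, request_id), mimicking a key-value
    store whose keys start with the deadline, so a range scan yields the
    oldest requests first.
    """

    def __init__(self):
        self._keys = []         # sorted list of (deadline, request_id)
        self._deadline_of = {}  # request_id -> deadline, needed for cancellation

    def schedule(self, request_id, deadline):
        # Re-scheduling an existing id replaces its previous deadline.
        self.cancel(request_id)
        bisect.insort(self._keys, (deadline, request_id))
        self._deadline_of[request_id] = deadline

    def cancel(self, request_id):
        # A cancel request just deletes the record; unknown ids are ignored.
        deadline = self._deadline_of.pop(request_id, None)
        if deadline is not None:
            self._keys.remove((deadline, request_id))

    def punctuate(self, now):
        """Collect every request whose deadline is <= now, delete it from the
        store and return the timer.Expired events to publish."""
        cut = 0
        while cut < len(self._keys) and self._keys[cut][0] <= now:
            cut += 1
        expired, self._keys = self._keys[:cut], self._keys[cut:]
        events = []
        for deadline, request_id in expired:
            del self._deadline_of[request_id]
            events.append({"type": "timer.Expired", "id": request_id, "deadline": deadline})
        return events


store = TimeoutStore()
store.schedule("a", 100)
store.schedule("b", 105)
store.schedule("c", 200)
store.cancel("b")
print([e["id"] for e in store.punctuate(now=150)])  # ['a']
print([e["id"] for e in store.punctuate(now=250)])  # ['c']
```

With a RocksDB-backed store, the deadline prefix would need a fixed-width, big-endian encoding so that byte order of the keys matches time order and a range scan from the smallest key returns the due timers first.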

Kafka Streams not only makes the implementation simple and transparent but also provides easily configurable delivery guarantees (including exactly-once processing using Kafka transactions). The solution scales automatically because Kafka topics are partitioned by a hash of the record key: timeout requests for the same entity are handled by the same node, and the resulting events are published to the same partition.
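Kafka routes a keyed record by hashing its key modulo the number of partitions, which is what keeps all requests for one entity on one node. The sketch below illustrates that routing; `partition_for` is hypothetical and uses `zlib.crc32` only as a stand-in, since Kafka's default partitioner applies murmur2 to the serialized key, so real partition numbers will differ.

```python
import zlib


def partition_for(entity_key: str, num_partitions: int) -> int:
    """Map an entity key to a partition index in [0, num_partitions).

    zlib.crc32 is only a stand-in hash: Kafka's default partitioner uses
    murmur2 on the serialized key bytes.
    """
    return zlib.crc32(entity_key.encode("utf-8")) % num_partitions


# Hypothetical timeout requests keyed by entity id.
requests = [("user-42", "t1"), ("user-7", "t2"), ("user-42", "t3")]
for key, timer_id in requests:
    print(key, timer_id, "->", partition_for(key, 12))
```

Because the partition is a pure function of the key, both requests for `user-42` (and any cancellation for it) land on the same partition, hence on the same Kafka Streams task and its local store. With plain modulo hashing, changing the partition count remaps most keys, so the input topic's partition count is best fixed up front.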

The plan is to implement the same logic in Clojure, adding the Timeout Service as a duct-tape/integrant component. This is flexible enough to run it either as part of the Rule Engine process or as a separate deployment.

scope and result

This is mostly a port of the existing service, with some improvements coming "for free" with Kafka Streams. In particular, we do not aim to improve timer granularity or maximum latency.

Expected deliveries:

ojow commented 6 years ago

Added a wiki page: https://github.com/xray-tech/xray/wiki/Timeout-Service

More details will be added later, once we have decided, together with Andrew, how to add sites and how to benchmark.