walmartlabs / bigben

BigBen - a generic, multi-tenant, time-based event scheduler and cron scheduling framework
Apache License 2.0
196 stars 53 forks source link
cassandra cron distributed events hazelcast

NOTICE:

This repository has been archived and is not supported.

No Maintenance Intended


NOTICE: SUPPORT FOR THIS PROJECT HAS ENDED

This projected was owned and maintained by Walmart. This project has reached its end of life and Walmart no longer supports this project.

We will no longer be monitoring the issues for this project or reviewing pull requests. You are free to continue using this project under the license terms or forks of this project at your own risk. This project is no longer subject to Walmart's bug bounty program or other security monitoring.

Actions you can take

We recommend you take the following action:

Forking and transition of ownership

For security reasons, Walmart does not transfer the ownership of our primary repos on Github or other platforms to other individuals/organizations. Further, we do not transfer ownership of packages for public package management systems.

If you would like to fork this package and continue development, you should choose a new name for the project and create your own packages, build automation, etc.

Please review the licensing terms of this project, which continue to be in effect even after decommission.

BigBen

BigBen is a generic, multi-tenant, time-based event scheduler and cron scheduling framework based on Cassandra and Hazelcast

It has following features:

Use cases

BigBen can be used for a variety of time based workloads, both single trigger based or repeating crons. Some of the use cases can be

Architectural Goals

BigBen was designed to achieve the following goals:

Design and architecture

See the blog published at Medium for a full description of various design elements of BigBen

Events Inflow

BigBen can receive events in two modes:

It is strongly recommended to use kafka for better scalability

Event Inflow diagram

inflow

Request and Response channels can be mixed. For example, the event requests can be sent through HTTP APIs but the event triggers (response) can be received through a Kafka Topic.

Event processing guarantees

BigBen has a robust event processing guarantees to survive various failures. However, event-processing is not same as event-acknowledgement. BigBen works in a no-acknowledgement mode (at least for now). Once an event is triggered, it is either published to Kafka or sent through an HTTP API. Once the Kafka producer returns success, or HTTP API returns non-500 status code, the event is assumed to be processed and marked as such in the system. However, for whatever reason if the event was not processed and resulted in an error (e.g. Kafka producer timing out, or HTTP API throwing 503), then the event will be retried multiple times as per the strategies discussed below

Event misfire strategy

Multiple scenarios can cause BigBen to be not able to trigger an event on time. Such scenarios are called misfires. Some of them are:

In any of these cases, the event is first retried in memory using an exponential back-off strategy.

Following parameters control the retry behavior:

If the event still is not processed, then the event is marked as ERROR. All the events marked ERROR are retried up to a configured limit called events.backlog.check.limit. This value can be an arbitrary amount of time, e.g. 1 day, 1 week, or even 1 year. E.g. if the the limit is set at 1 week then any event failures will be retried for 1 week after which, they will be permanently marked as ERROR and ignored. The events.backlog.check.limit can be changed at any time by changing the value in bigben.yaml file and bouncing the servers.

Event bucketing and shard size

BigBen shards events by minutes. However, since it's not known in advance how many events will be scheduled in a given minute, the buckets are further sharded by a pre defined shard size. The shard size is a design choice that needs to be made before deployment. Currently, it's not possible to change the shard size once defined.

An undersized shard value has minimal performance impact, however an oversized shard value may keep some machines idling. The default value of 1000 is good enough for most practical purposes as long as number of events to be scheduled per minute exceed 1000 x n, where n is the number of machines in the cluster. If the events to be scheduled are much less than 1000 then a smaller shard size may be chosen.

Multi shard parallel processing

Each bucket with all its shards is distributed across the cluster for execution with an algorithm that ensures a random and uniform distribution. The following diagram shows the execution flow.
shard design

Multi-tenancy

Multiple tenants can use BigBen in parallel. Each one can configure how the events will be delivered once triggered. Tenant 1 can configure the events to be delivered in kafka topic t1, where as tenant 2 can have them delivered via a specific http url. The usage of tenants will become more clearer with the below explanation of BigBen APIs

Docker support

BigBen is dockerized and image (bigben) is available on docker hub. The code also contains scripts, which start cassandra, hazelcast and app. To quickly set up the application for local dev testing, do the following steps:

  1. git clone $repo
  2. cd bigben/build/docker
  3. execute ./docker_build.sh
  4. start cassandra container by executing ./cassandra_run.sh
  5. start app by executing ./app_run.sh
  6. To run multiple app nodes export NUM_INSTANCES=3 && ./app_run.sh
  7. wait for application to start on port 8080
  8. verify that curl http://localhost:8080/ping returns 200
  9. Use ./cleanup.sh to stop and remove all BigBen related containers

Non-docker execution

BigBen can be run without docker as well. Following are the steps

  1. git clone $repo
  2. cd bigben/build/exec
  3. execute ./build.sh
  4. execute ./app_run.sh

Env properties

You can set the following environment properties

  1. APP_CONTAINER_NAME (default bigben_app)
  2. SERVER_PORT (default 8080)
  3. HZ_PORT (default 5701)
  4. NUM_INSTANCES (default 1)
  5. LOGS_DIR (default bigben/../bigben_logs)
  6. CASSANDRA_SEED_IPS (default $HOST_IP)
  7. HZ_MEMBER_IPS (default $HOST_IP)
  8. JAVA_OPTS

How to override default config values?

BigBen employs an extensive override system to allow someone to override the default properties. The order of priority is system properties > system env variables > overrides > defaults The overrides can be defined in config/overrides.yaml file. The log4j.xml can also be changed to change log behavior without recompiling binaries

How to setup Cassandra for BigBen?

Following are the steps to set up Cassandra:

  1. git clone the master branch
  2. Set up a Cassandra cluster
  3. create a keyspace bigben in Cassandra cluster with desired replication
  4. Open the file bigben-schema.cql and execute cqlsh -f bigben-schema.cql

APIs

cluster

GET /events/cluster

tenant registration

A tenant can be registered by calling the following API

POST /events/tenant/register

fetch all tenants:

GET /events/tenants

event scheduling

POST /events/schedule

Payload - List<EventRequest>

EventRequest schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "id": {
      "type": "string"
    },
    "eventTime": {
      "type": "string",
      "description": "An ISO-8601 formatted timestamp e.g. 2018-01-31T04:00.00Z"
    },
    "tenant": {
      "type": "string"
    },
    "payload": {
      "type": "string",
      "description": "an optional event payload, must NOT be null with deliveryOption = PAYLOAD_ONLY"
    },
    "mode": { 
      "type": "string",
      "enum": ["UPSERT", "REMOVE"],
      "default": "UPSERT",
      "description": "Use REMOVE to delete an event, UPSERT to add/update an event"
    },
    "deliveryOption": {
      "type": "string",
      "enum": ["FULL_EVENT", "PAYLOAD_ONLY"],
      "default": "FULL_EVENT",
      "description": "Use FULL_EVENT to have full event delivered via kafka/http, PAYLOAD_ONLY to have only the payload delivered"
    }
  },
  "required": [
    "id",
    "eventTime",
    "tenant"
  ]
}

find an event

GET /events/find?id=?&tenant=?

dry run

POST /events/dryrun?id=?&tenant=?

fires an event without changing its final status

cron APIs

coming up...