mantiumai / chirps

Discover sensitive/confidential information stored in a vector database
GNU General Public License v3.0

Scan Schedules #184

Open zimventures opened 1 year ago

zimventures commented 1 year ago

Scans are currently kicked off by a user manually clicking the "play" button in the dashboard. As a user, it would be useful to have a scan run automatically, either at a specified future time or at a set frequency.

Schedule Types

There will be three types of schedules available to the end user: one-shot, interval, and cron schedules.

One-Shot Schedule

It would be useful to allow a user to schedule a single scan for some point in the future; it will run only once. A Celery task can be deferred to a future time by passing the eta parameter to apply_async() (see "ETA and Countdown" in the Celery guide on calling tasks).
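A minimal sketch of how the eta value for a one-shot scan might be prepared. The helper name `one_shot_options` is hypothetical and uses only the standard library; the commented call assumes the `one_shot_scan` task and `one_shot_celery_id` field proposed under Backend. Only the `eta` argument of `apply_async()` is Celery's own.

```python
from datetime import datetime, timedelta, timezone


def one_shot_options(run_at, now=None):
    """Return keyword arguments for apply_async() that delay a task until run_at.

    Hypothetical helper: rejects naive or past datetimes so a one-shot scan
    is always queued for a well-defined moment in the future.
    """
    now = now or datetime.now(timezone.utc)
    if run_at.tzinfo is None:
        raise ValueError("run_at must be timezone-aware")
    if run_at <= now:
        raise ValueError("run_at must be in the future")
    return {"eta": run_at}


# With Celery available, the options would be passed straight through, e.g.:
#   result = one_shot_scan.apply_async(args=[scan_run_id], **one_shot_options(run_at))
#   scan_run.one_shot_celery_id = result.id
options = one_shot_options(datetime.now(timezone.utc) + timedelta(hours=1))
```

Validating the timestamp before queueing is a design choice in this sketch, not something Celery requires.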

Interval Schedule

An interval schedule is one that runs at a pre-defined time interval. This functionality is supported through the IntervalSchedule class from django_celery_beat.

Examples:
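For illustration, the sketch below maps a plain timedelta onto the `every` and `period` values that an IntervalSchedule stores. The helper `interval_fields` is hypothetical, and the period strings are assumed to match django_celery_beat's `IntervalSchedule.DAYS`, `HOURS`, `MINUTES`, and `SECONDS` constants.

```python
from datetime import timedelta

# Period names, largest first, with their length in seconds.
PERIODS = (("days", 86400), ("hours", 3600), ("minutes", 60), ("seconds", 1))


def interval_fields(delta):
    """Express delta in the largest unit that divides it evenly."""
    total = int(delta.total_seconds())
    if total <= 0 or delta != timedelta(seconds=total):
        raise ValueError("interval must be a positive whole number of seconds")
    for period, size in PERIODS:
        if total % size == 0:
            return {"every": total // size, "period": period}


# With django_celery_beat installed, the result could feed the model, e.g.:
#   IntervalSchedule.objects.get_or_create(**interval_fields(timedelta(hours=6)))
print(interval_fields(timedelta(hours=6)))     # {'every': 6, 'period': 'hours'}
print(interval_fields(timedelta(minutes=90)))  # {'every': 90, 'period': 'minutes'}
print(interval_fields(timedelta(days=1)))      # {'every': 1, 'period': 'days'}
```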

Cron Schedule

A cron schedule offers the user the most flexibility: it runs a scan repeatedly at specific dates and times. This functionality is provided via the CrontabSchedule class.

Examples:
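As a rough sketch, a standard five-field cron expression could be split into the keyword arguments that a CrontabSchedule takes (`minute`, `hour`, `day_of_week`, `day_of_month`, `month_of_year`). The helper `crontab_fields` is hypothetical and does no validation of the individual field values.

```python
def crontab_fields(expression):
    """Split a five-field cron expression into CrontabSchedule-style keyword arguments."""
    parts = expression.split()
    if len(parts) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    # Standard cron field order differs from the keyword order used below.
    minute, hour, day_of_month, month_of_year, day_of_week = parts
    return {
        "minute": minute,
        "hour": hour,
        "day_of_week": day_of_week,
        "day_of_month": day_of_month,
        "month_of_year": month_of_year,
    }


# With django_celery_beat installed, the result could feed the model, e.g.:
#   CrontabSchedule.objects.get_or_create(**crontab_fields("30 2 * * 1-5"))
weekday_nights = crontab_fields("30 2 * * 1-5")  # 02:30, Monday to Friday
first_of_month = crontab_fields("0 0 1 * *")     # midnight on the 1st of each month
```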

Scan idempotency

Only one copy of a configured scan can run at any given time. That means if a scheduled scan is currently running, the user cannot kick off a manual invocation of that scan. Likewise, a scheduled scan cannot run while a manual scan is running.

To support skipping a scheduled scan because a manual scan is running, a new state will be added to ScanRun: Skipped. If a scan is already running when scan_scheduler() is kicked off, a ScanRun object will still be created, but it will be marked as Skipped.
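The rule above can be sketched with in-memory stand-ins for the real models. Everything here is illustrative: the `RunStatus` names other than Skipped, the `scheduled` flag, and `request_run` are hypothetical, and the choice to leave no record for a blocked manual request is an assumption, since the proposal only says the user cannot start one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RunStatus(Enum):
    # Hypothetical names; only the Skipped state is named in this proposal.
    RUNNING = "Running"
    COMPLETE = "Complete"
    SKIPPED = "Skipped"


@dataclass
class ScanRun:
    scan_id: int
    status: RunStatus
    scheduled: bool  # True for schedule-triggered runs, False for manual ones


def request_run(scan_id: int, scheduled: bool, runs: List[ScanRun]) -> Optional[ScanRun]:
    """Apply the one-copy-at-a-time rule to a new run request."""
    busy = any(r.scan_id == scan_id and r.status is RunStatus.RUNNING for r in runs)
    if busy and not scheduled:
        return None  # assumption: a blocked manual request leaves no record
    run = ScanRun(scan_id, RunStatus.SKIPPED if busy else RunStatus.RUNNING, scheduled)
    runs.append(run)
    return run


history: List[ScanRun] = []
manual = request_run(1, scheduled=False, runs=history)   # starts running
skipped = request_run(1, scheduled=True, runs=history)   # recorded as Skipped
refused = request_run(1, scheduled=False, runs=history)  # None, scan still running
```

In the real application the check and the insert would need to happen atomically (for example inside a database transaction) so that two workers cannot both see the scan as idle.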

Implementation

A new application will be added to the project: schedule. This application will house all of the models and views for the schedule functionality. It's worth noting that any one-shot functionality will remain within the scan application.

User Interface

A new top-level menu item will be added: "Schedules". Clicking that will take the user to the schedules dashboard. The dashboard will present a paginated list of all the configured schedules. Each schedule item in the dashboard list will display the following:

There will be a new view for both creating and editing a schedule. This will display the options for setting up EITHER an interval schedule or cron schedule. The schedule will be persisted when the scan is created or saved.

One-Shot Scheduling

The ability to kick off a one-time scan at some point in the future will be exposed via a button on the "Scan History" page. Clicking the button will show a modal that allows the user to pick a future date and time at which to run the scan. The scheduled scan should show up in the scan history list so that it can be cancelled/deleted.

Manual vs. Scheduled Jobs

In the scan history, identify manual vs. scheduled jobs by displaying an icon.

Scan Dashboard & History Page

If a scan has a schedule applied to it, include a link to the schedule in the scan history page as well as in the dashboard list.

Backend

In order to make this integration smooth, the existing schema will largely be left alone and additional models will be created. Scheduled scans will simply create ScanRun instances when it's time to kick them off. Cron schedules will not create the ScanRun until the job is actually kicked off.

One-Shot scans

One-shot schedules will introduce a new field on the ScanRun model: one_shot_celery_id. This will store the ID of the Celery job that was queued to run at some time in the future. A new task will be added to ./scan/tasks.py to handle kicking off a one-shot scan: one_shot_scan(). This task will take in the ID of the ScanRun to execute.

If an existing scan is currently running, the ScanRun will be marked as Skipped and no scan will be started. Otherwise, the scan will be started as per the logic found in scan/views.py:vcr_start().

ScanSchedule model

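from django.db import models
from django_celery_beat.models import CrontabSchedule, IntervalSchedule, PeriodicTask

class ScanSchedule(models.Model):
    # ForeignKey requires on_delete since Django 2.0; the choices below are assumptions.
    scan = models.ForeignKey('ScanTemplate', on_delete=models.CASCADE)
    # Either interval or cron is set, never both.
    interval = models.ForeignKey(IntervalSchedule, null=True, default=None, on_delete=models.SET_NULL)
    cron = models.ForeignKey(CrontabSchedule, null=True, default=None, on_delete=models.SET_NULL)
    # The periodic task that triggers scan_scheduler for this schedule.
    scheduled_task = models.ForeignKey(PeriodicTask, on_delete=models.CASCADE)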

When a ScanSchedule instance is created, a Celery PeriodicTask is also created, pointing at the new scan_scheduler task.
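A hedged sketch of the keyword arguments such a PeriodicTask might be created with. The helper `periodic_task_kwargs`, the naming scheme, and the dotted task path `scan.tasks.scan_scheduler` are assumptions; `name`, `task`, `args`, `interval`, and `crontab` are fields of django_celery_beat's PeriodicTask, which stores `args` as JSON text.

```python
import json


def periodic_task_kwargs(schedule_id, interval=None, cron=None):
    """Build keyword arguments for a PeriodicTask that triggers scan_scheduler."""
    # A schedule is EITHER interval-based or cron-based, never both or neither.
    if (interval is None) == (cron is None):
        raise ValueError("set exactly one of interval or cron")
    kwargs = {
        "name": f"scan-schedule-{schedule_id}",  # hypothetical naming scheme; must be unique
        "task": "scan.tasks.scan_scheduler",     # assumed dotted path of the new task
        "args": json.dumps([schedule_id]),       # positional args serialized as JSON
    }
    if interval is not None:
        kwargs["interval"] = interval
    else:
        kwargs["crontab"] = cron
    return kwargs


# With django_celery_beat installed, this could be used as, e.g.:
#   PeriodicTask.objects.create(**periodic_task_kwargs(schedule.id, interval=schedule.interval))
```

Because scheduled_task is a required foreign key on ScanSchedule, the PeriodicTask may need to be created first and its args filled in once the ScanSchedule has an ID.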

scan_scheduler

Inside scan/tasks.py, a new scan_scheduler function will handle execution of a scheduled scan. It will be passed the ID of the ScanSchedule model that should be kicked off. The logic for kicking off a scan will be identical to the logic found in scan/views.py:vcr_start().

If a scan is already running, a ScanRun will be created and immediately set to the Skipped state, and no jobs will be kicked off.