wazuh / wazuh-indexer-plugins

GNU Affero General Public License v3.0

Implement the job-scheduler logic #87

Open AlexRuiz7 opened 1 month ago

AlexRuiz7 commented 1 month ago

Description

As part of the command manager plugin development and in continuation of #65, we are going to implement the job-scheduler logic to prioritize the commands and send them to the Wazuh Server's Management API.

Plan

Functional requirements

f-galland commented 3 weeks ago

Other plugins seem to interface with JobScheduler through its Service Provider Interface (SPI).

f-galland commented 3 weeks ago

It looks like the Plugin class (the main class inheriting from OpenSearch's Plugin) needs to implement JobSchedulerExtension.
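A minimal sketch of that shape follows, written in plain Java so it runs without OpenSearch on the classpath. The nested interfaces are simplified stand-ins that keep the method names of JobScheduler's SPI (getJobType(), getJobIndex(), getJobRunner(), getJobParser()). CommandManagerPlugin, the job type and the index name are hypothetical. In the real SPI the runner and parser work with OpenSearch types such as JobExecutionContext and XContentParser rather than strings and maps.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: simplified stand-ins for the JobScheduler SPI types. The real
// interfaces live in org.opensearch.jobscheduler.spi and have richer signatures.
final class CommandManagerPluginSketch {

    // Stand-in for ScheduledJobRunner (the real runJob() receives a
    // ScheduledJobParameter and a JobExecutionContext).
    interface ScheduledJobRunner {
        void runJob(String jobId);
    }

    // Stand-in for ScheduledJobParser (the real parser reads an XContentParser).
    interface ScheduledJobParser {
        Map<String, Object> parse(Map<String, Object> source, String id);
    }

    // Stand-in for JobSchedulerExtension: the four methods JobScheduler queries.
    interface JobSchedulerExtension {
        String getJobType();

        String getJobIndex();

        ScheduledJobRunner getJobRunner();

        ScheduledJobParser getJobParser();
    }

    // Stand-in for OpenSearch's Plugin base class.
    static class Plugin {
    }

    // Hypothetical main plugin class; the job type and index names are illustrative.
    static final class CommandManagerPlugin extends Plugin implements JobSchedulerExtension {
        static final String JOB_TYPE = "command_manager_scheduler";
        static final String JOB_INDEX = ".command_manager_jobs";

        @Override
        public String getJobType() {
            return JOB_TYPE;
        }

        @Override
        public String getJobIndex() {
            return JOB_INDEX;
        }

        @Override
        public ScheduledJobRunner getJobRunner() {
            return jobId -> System.out.println("Running job " + jobId);
        }

        @Override
        public ScheduledJobParser getJobParser() {
            // Copy the document source and attach the document id to it.
            return (source, id) -> {
                Map<String, Object> job = new HashMap<>(source);
                job.put("id", id);
                return job;
            };
        }
    }

    public static void main(String[] args) {
        JobSchedulerExtension extension = new CommandManagerPlugin();
        System.out.println(extension.getJobType() + " jobs are stored in " + extension.getJobIndex());
        extension.getJobRunner().runJob("job-1");
    }
}
```

In JobScheduler's sample extension plugin, the extension also appears to be discovered through a META-INF/services entry named after org.opensearch.jobscheduler.spi.JobSchedulerExtension; that detail is worth confirming when wiring up the command manager.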

f-galland commented 3 weeks ago

A separate class implements ScheduledJobRunner's runJob() method, which pushes the task onto its own thread.

A javadoc in this class reads as follows:

 * The job runner class for scheduling async query.
 *
 * <p>The job runner should be a singleton class if it uses OpenSearch client or other objects
 * passed from OpenSearch. Because when registering the job runner to JobScheduler plugin,
 * OpenSearch has not invoked plugins' createComponents() method. That is saying the plugin is not
 * completely initialized, and the OpenSearch {@link org.opensearch.client.Client}, {@link
 * ClusterService} and other objects are not available to plugin and this job runner.
 *
 * <p>So we have to move this job runner initialization to {@link Plugin} createComponents() method,
 * and using singleton job runner to ensure we register a usable job runner instance to JobScheduler
 * plugin.
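The singleton pattern that javadoc describes can be sketched as follows. This is a hedged, self-contained illustration, not the SQL plugin's code: Client is a stand-in for the OpenSearch client, a plain java.util.concurrent.Executor stands in for OpenSearch's ThreadPool, and CommandJobRunner, getInstance() and init() are hypothetical names. JobScheduler obtains the instance early, and createComponents() fills in its dependencies later, so both refer to the same object.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SingletonJobRunnerSketch {

    // Stand-in for the OpenSearch Client, which only exists after createComponents().
    interface Client {
        void send(String payload);
    }

    static final class CommandJobRunner {
        private static final CommandJobRunner INSTANCE = new CommandJobRunner();

        // Filled in later by init(); volatile so the job threads see the values.
        private volatile Client client;
        private volatile Executor executor;

        private CommandJobRunner() {
        }

        // Returned to JobScheduler from getJobRunner(), before createComponents() runs.
        static CommandJobRunner getInstance() {
            return INSTANCE;
        }

        // Called from the plugin's createComponents() once the dependencies exist.
        void init(Client client, Executor executor) {
            this.client = client;
            this.executor = executor;
        }

        // Mirrors runJob(): hand the work to another thread and return immediately.
        void runJob(String jobId) {
            Client c = client;
            Executor e = executor;
            if (c == null || e == null) {
                throw new IllegalStateException("Job runner used before createComponents()");
            }
            e.execute(() -> c.send("command for " + jobId));
        }
    }

    public static void main(String[] args) {
        // 1. JobScheduler registers the runner early.
        CommandJobRunner registered = CommandJobRunner.getInstance();

        // 2. createComponents() later injects the dependencies into the same instance.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CommandJobRunner.getInstance().init(payload -> System.out.println("Sent: " + payload), pool);

        // 3. The instance registered in step 1 is now usable.
        registered.runJob("job-1");
        pool.shutdown();
    }
}
```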

f-galland commented 3 weeks ago

The SQL plugin uses a model class for scheduled jobs that implements JobScheduler's ScheduledJobParameter interface.
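A minimal model class along those lines might look like the following sketch. The ScheduledJobParameter interface here is a simplified stand-in that keeps accessor names of the real one (getName(), getLastUpdateTime(), getEnabledTime(), isEnabled(), getLockDurationSeconds()) but replaces its Schedule type with a java.time.Duration and omits the ToXContentObject serialization. CommandJob and nextExecutionAfter() are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

final class CommandJobParameterSketch {

    // Simplified stand-in for ScheduledJobParameter. The real interface also extends
    // ToXContentObject and returns a Schedule object instead of a Duration.
    interface ScheduledJobParameter {
        String getName();

        Instant getLastUpdateTime();

        Instant getEnabledTime();

        Duration getInterval();

        boolean isEnabled();

        // Optional lock duration; null means no explicit value is set.
        default Long getLockDurationSeconds() {
            return null;
        }
    }

    // Hypothetical model of one job document stored in the job index.
    static final class CommandJob implements ScheduledJobParameter {
        private final String name;
        private final boolean enabled;
        private final Instant enabledTime;
        private final Instant lastUpdateTime;
        private final Duration interval;

        CommandJob(String name, boolean enabled, Instant enabledTime, Instant lastUpdateTime, Duration interval) {
            this.name = name;
            this.enabled = enabled;
            this.enabledTime = enabledTime;
            this.lastUpdateTime = lastUpdateTime;
            this.interval = interval;
        }

        @Override public String getName() { return name; }
        @Override public Instant getLastUpdateTime() { return lastUpdateTime; }
        @Override public Instant getEnabledTime() { return enabledTime; }
        @Override public Duration getInterval() { return interval; }
        @Override public boolean isEnabled() { return enabled; }

        // First run strictly after `now` on a fixed interval anchored at the enabled time.
        Instant nextExecutionAfter(Instant now) {
            if (now.isBefore(enabledTime)) {
                return enabledTime;
            }
            long periods = Duration.between(enabledTime, now).toMillis() / interval.toMillis() + 1;
            return enabledTime.plus(interval.multipliedBy(periods));
        }
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        CommandJob job = new CommandJob("send-commands", true, start, start, Duration.ofMinutes(1));
        System.out.println(job.getName() + " next runs at " + job.nextExecutionAfter(start.plusSeconds(90)));
    }
}
```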

AlexRuiz7 commented 3 weeks ago

That research was already performed in #65.

f-galland commented 2 weeks ago

#65's PR only added job-scheduler to the command manager's Gradle build; the JobScheduler classes are not actually used there.

f-galland commented 2 weeks ago

Relevant classes from JobScheduler's sample extension plugin: SampleExtensionRestHandler, SampleExtensionPlugin, SampleJobParameter and SampleJobRunner.

It seems the only supported way to schedule tasks with JobScheduler is to store them as documents in an index.

This is evidenced by the fact that the only call to runJob() comes from the reschedule() method of the JobScheduler class. The job parameters passed to that runJob() call can in turn be traced back to the sweep() method of the JobSweeper class, and sweep() appears to parse those job parameters from a provided index.
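The call chain described above (documents in the job index, then sweep(), then reschedule(), then runJob()) can be modeled in memory to make the data flow explicit. This is a simplified, hypothetical model, not JobScheduler's actual code: a Map plays the role of the job index, Sweeper and Scheduler are stand-ins for JobSweeper and JobScheduler, and timing, locking and cluster state are left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class SweepFlowSketch {

    // A parsed job, reduced to the fields this sketch needs.
    record Job(String id, boolean enabled) {
    }

    // Stand-in for JobScheduler: a real reschedule() would wait for the next
    // execution time before invoking the runner; here it runs the job at once.
    static final class Scheduler {
        private final Consumer<Job> runner;

        Scheduler(Consumer<Job> runner) {
            this.runner = runner;
        }

        void reschedule(Job job) {
            runner.accept(job); // the only place the runner is invoked
        }
    }

    // Stand-in for JobSweeper: parses every document of the job index and
    // hands the enabled jobs to the scheduler.
    static final class Sweeper {
        private final Map<String, Map<String, Object>> jobIndex; // doc id -> source
        private final BiFunction<Map<String, Object>, String, Job> parser;
        private final Scheduler scheduler;

        Sweeper(Map<String, Map<String, Object>> jobIndex,
                BiFunction<Map<String, Object>, String, Job> parser,
                Scheduler scheduler) {
            this.jobIndex = jobIndex;
            this.parser = parser;
            this.scheduler = scheduler;
        }

        void sweep() {
            for (Map.Entry<String, Map<String, Object>> doc : jobIndex.entrySet()) {
                Job job = parser.apply(doc.getValue(), doc.getKey());
                if (job.enabled()) {
                    scheduler.reschedule(job);
                }
            }
        }
    }

    // Runs one sweep over the given index and returns the ids of the jobs that ran.
    static List<String> sweepAndRun(Map<String, Map<String, Object>> jobIndex) {
        List<String> ran = new ArrayList<>();
        Scheduler scheduler = new Scheduler(job -> ran.add(job.id()));
        BiFunction<Map<String, Object>, String, Job> parser =
                (source, id) -> new Job(id, Boolean.TRUE.equals(source.get("enabled")));
        new Sweeper(jobIndex, parser, scheduler).sweep();
        return ran;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> jobIndex = new LinkedHashMap<>();
        jobIndex.put("job-1", Map.of("enabled", true));
        jobIndex.put("job-2", Map.of("enabled", false));
        System.out.println("Jobs run: " + sweepAndRun(jobIndex));
    }
}
```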

f-galland commented 1 week ago

Search results can be paginated in two distinct ways:

  1. Using SearchSourceBuilder's from() and size(), which appear to be meant for user-facing interfaces.
  2. Using Scroll and its related classes.

The second option seems more robust and is the one suggested for larger batches of data.
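The difference between the two options can be illustrated without a cluster. In this hedged in-memory sketch, page() stands in for a request built with SearchSourceBuilder's from() and size(), where each request is independent and re-specifies its offset. ScrollCursor stands in for a scroll context (opened with SearchRequest.scroll() and continued with SearchScrollRequest in OpenSearch) that keeps its own position over a fixed snapshot of the results. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PaginationSketch {

    // Option 1: offset pagination, the in-memory analogue of
    // SearchSourceBuilder.from(from).size(size). Each call is independent.
    static <T> List<T> page(List<T> hits, int from, int size) {
        int start = Math.min(from, hits.size());
        int end = Math.min(start + size, hits.size());
        return hits.subList(start, end);
    }

    // Option 2: a scroll-like cursor. It keeps a fixed snapshot of the results and
    // remembers its position, so each call simply asks for the next batch.
    static final class ScrollCursor<T> {
        private final List<T> snapshot;
        private final int batchSize;
        private int position = 0;

        ScrollCursor(List<T> hits, int batchSize) {
            this.snapshot = List.copyOf(hits);
            this.batchSize = batchSize;
        }

        // Returns the next batch, or an empty list once the results are exhausted.
        List<T> next() {
            List<T> batch = page(snapshot, position, batchSize);
            position += batch.size();
            return batch;
        }
    }

    // Consumes a cursor batch by batch until it returns an empty batch.
    static <T> List<T> drain(ScrollCursor<T> cursor) {
        List<T> all = new ArrayList<>();
        for (List<T> batch = cursor.next(); !batch.isEmpty(); batch = cursor.next()) {
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> hits = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println("Second page of size 3: " + page(hits, 3, 3));
        System.out.println("All hits via cursor: " + drain(new ScrollCursor<>(hits, 3)));
    }
}
```

One practical consequence: from/size pagination is bounded by the index.max_result_window setting (10,000 hits by default), whereas a scroll can walk through larger result sets batch by batch.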

I'm researching how official plugins iterate over search result pages without blocking execution.

We have used the provided ThreadPool for this in past tests, alongside simple while loops, but there seem to be more elegant solutions.
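One non-blocking alternative to a while loop is to chain each page request onto the completion of the previous one. The sketch below uses CompletableFuture to show the idea; in OpenSearch the analogous pattern would be issuing the next search or scroll request from inside an ActionListener's onResponse(). fetchPage(), processAll() and the in-memory result list are hypothetical stand-ins for real search calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

final class AsyncPagingSketch {

    // Hypothetical asynchronous page source, standing in for a search/scroll call.
    static <T> CompletableFuture<List<T>> fetchPage(List<T> hits, int from, int size, Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            int start = Math.min(from, hits.size());
            int end = Math.min(start + size, hits.size());
            List<T> page = new ArrayList<>(hits.subList(start, end));
            return page;
        }, executor);
    }

    // Handles every page in order and completes with the number of hits processed.
    static <T> CompletableFuture<Integer> processAll(List<T> hits, int size, Executor executor,
                                                     Consumer<List<T>> handler) {
        return processFrom(hits, 0, size, executor, handler);
    }

    private static <T> CompletableFuture<Integer> processFrom(List<T> hits, int from, int size,
                                                              Executor executor, Consumer<List<T>> handler) {
        return fetchPage(hits, from, size, executor).thenCompose(batch -> {
            if (batch.isEmpty()) {
                return CompletableFuture.completedFuture(from); // no more pages
            }
            handler.accept(batch);
            // The next request is only issued once this page has been handled.
            return processFrom(hits, from + batch.size(), size, executor, handler);
        });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<Integer> done = processAll(List.of("a", "b", "c", "d", "e"), 2, pool,
                batch -> System.out.println("Batch " + batch));
        // Only this demo blocks, so it can print the total before exiting.
        System.out.println("Processed " + done.join() + " hits");
        pool.shutdown();
    }
}
```

Because no thread sits in a loop waiting for the next page, the executor's threads stay free between requests; the trade-off is that error handling has to be attached to the chain (for example with exceptionally()) rather than wrapped around a loop.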