matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Plugin API for Scheduled Tasks #1184

Closed robocoder closed 14 years ago

robocoder commented 14 years ago

Use one crontab entry to trigger Piwik archiving, daily report generation, bots, etc.

This plugin:

Updates the UI Settings 'general settings'

This plugin is not #817.

julienmoumne commented 14 years ago

Attachment: piwik-dev1 (#1184).patch

mattab commented 14 years ago

See also #587, which could allow triggering these crontab-like tasks from piwik.php requests in case users don't set up automatic crontabs.

If an automatic crontab is set up (which can be automatically detected by Piwik), then crontab tasks are not triggered by piwik.php (see #587).

mattab commented 14 years ago

I believe we should update the documentation and have the crontab fire more regularly, i.e. every 15 minutes, in case some plugins need to run tasks more frequently. The standard archiving task would only trigger after the number of seconds set in config.ini.php > time_before_today_archive_considered_outdated.

mattab commented 14 years ago

#5 and #53 are feature candidates for this hook

mattab commented 14 years ago

We need to think about the current archive.sh script and how it would be changed to accommodate this new hook (either call this plugin specifically, or change the way archive.sh works so that it calls this plugin, which would then trigger archiving?). Note that it might be better to leave archive.sh with the current "looping over websites and periods" approach, archiving them separately, because triggering all archives at once would result in memory issues for Piwik installs with hundreds or thousands of websites.

mattab commented 14 years ago

Also, do we need a system to ensure that such a task cannot be run twice at the same time (a software-level or DB-level lock mechanism)?
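As an illustration of what a software-level lock could look like, here is a minimal sketch based on PHP's flock(). The helper name runExclusively and the lock file path are hypothetical and not part of Piwik; a DB-level lock would be an alternative and is not shown.

```php
<?php
// Hypothetical helper (not a Piwik API): run $task only if no other
// process currently holds the lock file. Returns true if the task ran.
function runExclusively(string $lockFile, callable $task): bool
{
    // 'c' creates the file if it is missing and never truncates it
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }

    // LOCK_NB makes flock() return immediately instead of waiting
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // another run is already in progress
    }

    try {
        $task();
    } finally {
        // always release the lock, even if the task throws
        flock($handle, LOCK_UN);
        fclose($handle);
    }

    return true;
}
```

A cron entry and a piwik.php request that both call runExclusively() with the same lock file would then not execute the task concurrently; the second caller simply skips its run.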

mattab commented 14 years ago

Sending email reports is also a candidate for this hook, see for example PDF plugin #71

mattab commented 14 years ago

Implementation proposal

// pseudo code of function hooking on runTasks
function runOptimizeTables($notification)
{
   // run once a week (e.g. every Monday at 2AM)
   if( TaskScheduler.shouldRunTask( 'my task ID name', 'weekly' ))
   { 
        // execute task
   }
}

Note that we don't have minutes, because the smallest possible granularity is the hour (crontabs are set up to run once per hour and probably should never run more often).

The difference between running scheduled tasks via cron and via piwik.php is that piwik.php might trigger them more than once per hour (even though not every request to piwik.php will trigger the scheduled tasks: for obvious optimization reasons, only one random request out of many will).

A solution to this issue is to plan schedules ahead of time (compute the time at which the task will run next). Then, when the task successfully runs, re-schedule it for the next time (e.g. next week for a weekly task).

pseudo code

function shouldRunTask( taskID, interval, [ minimumTimestamp ] )
{
  // never run before the requested minimum time
  if(minimumTimestamp > time()) return false;

  schedule = Piwik_GetOption('schedule')
  shouldRunTask = false;
  if(isset(schedule[taskID]))
  {
    // task already scheduled, run only once scheduled_time has been reached
    if(schedule[taskID]['scheduled_time'] <= time())
    {
       shouldRunTask = true;
    }
  }
  else
  {
     // new task, always run once the first time cron is run
     shouldRunTask = true;
  }

  if(shouldRunTask)
  {
    // compute the next time at which the task should run
    nextScheduleTime = time() + (if hourly then 3600 elseif daily then 86400 etc.);
    schedule[taskID]['scheduled_time'] = nextScheduleTime;

    // record updated schedule in DB
    Piwik_SetOption('schedule', schedule);
  }

  return shouldRunTask;
}
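
For reference, the scheduling idea above (run a task the first time it is seen, or once its scheduled time has been reached, then push the schedule forward by one interval) can be sketched as runnable PHP. This is only an illustration under assumptions: the schedule is kept in a plain array passed by reference instead of being persisted through Piwik_GetOption / Piwik_SetOption, and the current time is passed in as a parameter so the behaviour is easy to check.

```php
<?php
// Illustrative sketch only. $schedule maps task IDs to the Unix timestamp
// of their next allowed run (assumption: real code would load and store
// this array in the options table).
function shouldRunTask(array &$schedule, string $taskId, int $interval, int $now): bool
{
    $isNew = !isset($schedule[$taskId]);
    $isDue = !$isNew && $schedule[$taskId] <= $now;

    if (!$isNew && !$isDue) {
        // scheduled time not reached yet: leave the schedule untouched
        return false;
    }

    // the task is allowed to run now, so plan its next run
    $schedule[$taskId] = $now + $interval;
    return true;
}
```

Because the schedule is only updated when the task is allowed to run, extra triggers within the same hour (for example from piwik.php requests) return false and do not push the next run further into the future.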

minimumTimestamp can be used to define exactly at what time of day tasks should run.

For example, if you want to run a daily job at 2AM, you would write in your plugin

if( TaskScheduler.shouldRunTask( 'my task ID name', 'daily', mktime(2,0,0,date('m'),date('d'),date('Y'))  ))

What will happen is that the first time the cron triggers after 2AM, this scheduled task will be allowed to run. shouldRunTask will then compute the next time it should run, which is 2AM the next day.

Edge case: if the cron didn't run before 5AM (for some reason), it will trigger the 2AM task at 5AM. However, you wouldn't want to schedule tomorrow's task at 5AM but at 2AM. You can use code such as

 now = time();
 interval = 86400; // for example
 nextScheduleTime = now + interval - ((now - minimumTimestamp) % interval);
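
To check the formula on the 5AM example, here is a small PHP sketch. The function name nextAlignedRun is made up for illustration, and it assumes now is never earlier than minimumTimestamp (the early return on minimumTimestamp guarantees this in the proposal).

```php
<?php
// Illustrative helper (hypothetical name): compute the next run so that it
// stays aligned with $minimumTimestamp even when the current run is late.
// Assumes $now >= $minimumTimestamp.
function nextAlignedRun(int $now, int $minimumTimestamp, int $interval): int
{
    // how far past the intended slot the current run happened
    $lateBy = ($now - $minimumTimestamp) % $interval;

    return $now + $interval - $lateBy;
}

// Task planned for 2AM, cron only fired at 5AM (UTC used for clarity)
$planned = gmmktime(2, 0, 0, 6, 1, 2010);
$actual  = gmmktime(5, 0, 0, 6, 1, 2010);
$next    = nextAlignedRun($actual, $planned, 86400);

echo gmdate('Y-m-d H:i', $next), "\n"; // prints "2010-06-02 02:00"
```

Here the run is 3 hours late, so the next run is 5AM + 24h - 3h, which is 2AM the following day. Note that a fixed 86400-second interval does not account for daylight-saving shifts in local time.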

let me know if this makes sense, cheers

mattab commented 14 years ago

Note: inspired from WP implementation see http://phpxref.ftwr.co.uk/wordpress/nav.html?wp-includes/cron.php.html#wp_schedule_event

http://phpxref.ftwr.co.uk/wordpress/nav.html?wp-cron.php.html

While their implementation is overcomplicated, we can do the same thing in a few lines of code :)

julienmoumne commented 14 years ago

I'm ok with the proposal except for one bit.

I would like the implementation to be more object oriented.

There would be a Piwik_ScheduledTask, a Piwik_ScheduledTime.

Instead of having :

function getListHooksRegistered()
{
    return array(
        'TaskScheduler.getScheduledTasks' => 'runOptimizeTables',
    );
}

function runOptimizeTables($notification)
{
   // run once a week (e.g. every Monday at 2AM)
   if( TaskScheduler.shouldRunTask( 'my task ID name', 'weekly' ))
   { 
        // execute task
   }
}

it would be

function getListHooksRegistered()
{
    return array(
        'TaskScheduler.getScheduledTasks' => 'getScheduledTasks',
    );
}

function getScheduledTasks($notification)
{
    $scheduledTasks = &$notification->getNotificationObject();

    $tableOptimisationScheduledTime = Piwik_ScheduledTime::factory('weekly');
    $tableOptimisationScheduledTime->setDay('monday');
    $tableOptimisationScheduledTime->setHour(13);
    $tableOptimisationScheduledTime->setMinute(20);

    $scheduledTasks[] = new Piwik_ScheduledTask('runOptimizeTables', $tableOptimisationScheduledTime);

}

function runOptimizeTables()
{
    // execute task
}

mattab commented 14 years ago

proposal looks good to me!

julienmoumne commented 14 years ago

I have submitted a patch in which I decided to remove all modulo arithmetic in favor of computations that are easier to read and maintain.
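
To give an idea of what a modulo-free computation can look like, here is a minimal sketch that finds the next weekly run (for example Monday at 13:20) by stepping through calendar days. It is not the code from the patch: the function name nextWeeklyRun and its parameters are assumptions made for this example.

```php
<?php
// Illustrative sketch, not the submitted patch. $isoDay follows ISO-8601
// numbering: 1 = Monday ... 7 = Sunday.
function nextWeeklyRun(DateTimeImmutable $now, int $isoDay, int $hour, int $minute): DateTimeImmutable
{
    // start from today at the requested time of day
    $candidate = $now->setTime($hour, $minute, 0);

    // move forward one day at a time until the weekday matches
    // and the candidate lies strictly in the future
    while ((int) $candidate->format('N') !== $isoDay || $candidate <= $now) {
        $candidate = $candidate->modify('+1 day');
    }

    return $candidate;
}

$utc = new DateTimeZone('UTC');
$now = new DateTimeImmutable('2010-06-02 10:00:00', $utc); // a Wednesday
echo nextWeeklyRun($now, 1, 13, 20)->format('Y-m-d H:i'), "\n"; // prints "2010-06-07 13:20"
```

The loop runs at most eight times, and the intent (same time of day, next matching weekday) can be read directly from the code.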

mattab commented 14 years ago

(In [2648]) Fixes #1184 Great patch by Julien Moumne to add Scheduled Task API in Piwik

mattab commented 14 years ago

(In [2697]) Refs #5491

mattab commented 14 years ago

(In [2737]) Refs #5491

anonymous-matomo-user commented 14 years ago

Is there a possibility to schedule PDF reports without using the crontab? For me it would be nice to run reports e.g. when somebody logs in, because I am not able to create crontabs.

mattab commented 14 years ago

Beatgarantie, scheduled reports should work without crontab in 0.7. Requests to the Tracker will trigger scheduled tasks hourly. See #587 - let me know if it works for you

anonymous-matomo-user commented 14 years ago

@matt: OK, I will test.

It would be nice to see the PDF template after switching to another tracked page via the website dropdown.