meetings / gearsloth

Gearman job persistence and delayed execution system
MIT License

Create a file system adapter as the default adapter #67

Closed. amv closed 10 years ago

amv commented 10 years ago

The adapter should use the file system and file locking as the means to store delayed tasks.

The file system should be structured as a deep hierarchy so that no single directory ends up holding too many files (listing and lookup in such a directory slow down very quickly). The structure should also allow runners to easily detect the files that are due to be executed.

My proposition is that the tasks themselves should be stored by unique id in a deep directory tree where each level is named after two hex digits of the id, so that a task with id 0fe5ad75bb54c191aa52 would be stored like this:

/task/0f/e5/ad/75/bb/54/0fe5ad75bb54c191aa52
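A minimal sketch of how such a path could be derived from an id. The helper name `taskPath` and the `levels` parameter are hypothetical; six levels are assumed only because the example above shows six.

```javascript
const path = require('path');

// Hypothetical helper: maps a task id to its location in the "task" tree.
// `levels` is an assumed parameter; the example path above uses six levels.
function taskPath(root, id, levels = 6) {
  const pairs = [];
  for (let i = 0; i < levels; i++) {
    // Each directory level is named after the next two hex digits of the id.
    pairs.push(id.slice(i * 2, i * 2 + 2));
  }
  // path.posix keeps the separators as "/" regardless of platform.
  return path.posix.join(root, 'task', ...pairs, id);
}

console.log(taskPath('/', '0fe5ad75bb54c191aa52'));
// prints /task/0f/e5/ad/75/bb/54/0fe5ad75bb54c191aa52
```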

To make querying the file system easier, I also propose creating another directory structure that uses the UTC year, month, day, hour, minute and second as directories, with a few levels of two-hex-digit directories beneath them. This structure would act as an index of the tasks that need to be run at given times and would simply contain an empty file for each task, so that if the previously referenced task were due for execution after 23:52:00 on 29.06.2014, an empty file would be stored in the following location:

/due/2014/06/29/23/52/00/0f/e5/ad/0fe5ad75bb54c191aa52
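A sketch of how the index path could be built from a timestamp and an id. The names `duePath` and `pad` are hypothetical, and three hex levels are assumed only because the example above shows three. The same function could presumably rebuild the path later from a timestamp stored with the task.

```javascript
const path = require('path');

// Zero-pads a number to two digits, e.g. 6 -> "06".
function pad(n) {
  return String(n).padStart(2, '0');
}

// Hypothetical helper: maps a due time and a task id to the empty index file
// in the "due" tree. `levels` is an assumed parameter (three in the example).
function duePath(root, at, id, levels = 3) {
  const stamp = [
    String(at.getUTCFullYear()),
    pad(at.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(at.getUTCDate()),
    pad(at.getUTCHours()),
    pad(at.getUTCMinutes()),
    pad(at.getUTCSeconds()),
  ];
  const pairs = [];
  for (let i = 0; i < levels; i++) {
    pairs.push(id.slice(i * 2, i * 2 + 2));
  }
  return path.posix.join(root, 'due', ...stamp, ...pairs, id);
}

const at = new Date(Date.UTC(2014, 5, 29, 23, 52, 0));
console.log(duePath('/', at, '0fe5ad75bb54c191aa52'));
// prints /due/2014/06/29/23/52/00/0f/e5/ad/0fe5ad75bb54c191aa52
```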

The runner loop would then determine the current date, list the initial directory for years, ignore all years that are after the current year and spawn a depth-first recursion with the same logic into the contents of each year directory. When reaching a directory for the seconds, the whole directory structure below it would be traversed depth first in a random order to prevent lock clashes between multiple runners. Each found "due" file would then be locked one by one, and if a lock is acquired, a lock would also be acquired for the original task file in the "task" directory. The file in the task directory would be read and the given task would be submitted to a controller. After a successful submission the task file would be altered to contain the failsafe runner retry timestamp and the new runner retry count, and a new file in the "due" directory would be touched according to the new runner retry timestamp. The old file in the "due" directory would also be removed at this stage.
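The traversal part of this loop could look roughly like the sketch below. It only finds due files; locking, submission and rescheduling are left out. All function names are hypothetical. One assumption goes beyond the text: a directory strictly earlier than the current value (for example an earlier year) is treated as entirely due, so pruning by the current month, day and so on is applied only along the path that matches the current date exactly.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Fisher-Yates shuffle, so that concurrent runners visit entries in different orders.
function shuffle(items) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Collects every file below dir, depth first, in random order.
function collectAll(dir) {
  const found = [];
  for (const entry of shuffle(fs.readdirSync(dir, { withFileTypes: true }))) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) found.push(...collectAll(full));
    else found.push(full);
  }
  return found;
}

// Splits a date into [year, month, day, hour, minute, second] in UTC.
function utcParts(date) {
  return [
    date.getUTCFullYear(), date.getUTCMonth() + 1, date.getUTCDate(),
    date.getUTCHours(), date.getUTCMinutes(), date.getUTCSeconds(),
  ];
}

// Walks the time levels of the "due" tree and returns the paths of due files.
function findDue(dir, nowParts) {
  if (nowParts.length === 0) return collectAll(dir); // below the seconds level
  const [current, ...rest] = nowParts;
  const found = [];
  for (const name of fs.readdirSync(dir)) {
    const value = Number(name);
    const full = path.join(dir, name);
    if (value < current) found.push(...collectAll(full)); // entirely in the past
    else if (value === current) found.push(...findDue(full, rest)); // boundary
    // value > current: in the future, ignored
  }
  return found;
}

// Demo in a temporary directory standing in for the "due" root.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'due-'));
function touch(rel) {
  const full = path.join(root, rel);
  fs.mkdirSync(path.dirname(full), { recursive: true });
  fs.writeFileSync(full, '');
  return full;
}
const past = touch('2014/06/29/23/52/00/0f/e5/ad/0fe5ad75bb54c191aa52');
const future = touch('2014/06/29/23/53/00/aa/bb/cc/aabbcc75bb54c191aa52');
const now = new Date(Date.UTC(2014, 5, 29, 23, 52, 30));
const due = findDue(root, utcParts(now));
console.log(due.length === 1 && due[0] === past); // prints true
```

The synchronous `fs` calls keep the sketch short; a real runner would more likely use the asynchronous variants.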

When the task is ultimately ejected, we should remove the "task" directory file and also the "due" directory file. The name of the "due" file can be constructed using the "at" attribute stored in the task file.

As this adapter is mainly supposed to be used for initial installation and testing purposes (although some might consider it superior to other systems due to its radical simplicity), I propose that we do not introduce any additional dependencies to the system because of it, and instead use a child process with the native "flock" executable to do the file locking for us.

If someone wishes, they can write an alternative node-fs adapter that uses either node-fs-ext or node-unixlib, but for initial installation purposes these modules, which require a native compilation step, should be avoided.

amv commented 10 years ago

It seems that OSX does not ship with a flock binary... :( The system call does exist, however. I am torn between supporting only Linux for running initial tests, having to install a proper compile stack to get native bindings to flock, or just ditching this whole idea :D

Well... There is always the option to use a tiny Perl wrapper, since Perl has a built-in flock (and is installed on OSX by default), to implement flock on systems that don't ship the binary... I find it pretty hard to believe I am seriously contemplating this :D

jarnoharno commented 10 years ago

You could probably use fs.open() with the 'wx' flag and a dummy temporary lock file to implement exclusive access natively. See http://nodejs.org/api/fs.html#fs_fs_open_path_flags_mode_callback
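A minimal sketch of that idea, using the synchronous `fs.openSync` for brevity where the comment mentions `fs.open`. The helper names `tryLock` and `unlock` and the `.lock` suffix are hypothetical. The 'wx' flag makes the open fail with `EEXIST` if the file already exists, so only one process can create the lock file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: tries to take an exclusive lock by creating lockPath.
// Returns true if the lock was taken, false if someone else already holds it.
function tryLock(lockPath) {
  try {
    const fd = fs.openSync(lockPath, 'wx'); // throws EEXIST if the file exists
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err; // any other error is unexpected
  }
}

// Hypothetical helper: releases the lock by removing the lock file.
function unlock(lockPath) {
  fs.unlinkSync(lockPath);
}

// Demo in a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lock-'));
const lockPath = path.join(dir, '0fe5ad75bb54c191aa52.lock');
console.log(tryLock(lockPath)); // prints true
console.log(tryLock(lockPath)); // prints false
unlock(lockPath);
console.log(tryLock(lockPath)); // prints true
```

Unlike flock, a lock file of this kind is not released automatically if the holder crashes, so a real adapter would need some way to detect and clear stale locks.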

amv commented 10 years ago

@jlep That is a good idea. I began writing the adapter last night in pure node.js using base modules, async and underscore as dependencies, with 'wx' mode files as a locking scheme.

amv commented 10 years ago

This is starting to pass a lot of tests in the fs-adapter branch, so I'll just close this.