Open sylvlecl opened 4 years ago
If slurm is in local mode, we can simply register a WatchService on flagDir.
Yes, but the problem is that even in "local" mode, there is a good chance that the flag dir is actually on a shared filesystem, for instance an NFS mount, so that Slurm nodes can access it. In that case, the watch service will probably not work (or will be implemented with polling anyway).
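For reference, the WatchService idea discussed above could look roughly like the sketch below. The class name FlagDirWatcher, the method awaitFlag and the flag file names are hypothetical and only there for illustration; they are not taken from the existing code. It also assumes a genuinely local flag dir: on a shared mount such as NFS, creation events for files written by other hosts are generally not delivered, which is the limitation raised above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the existing code base.
public class FlagDirWatcher {

    /**
     * Blocks until a file named flagName appears in flagDir or the timeout elapses.
     * Returns true if the flag file was seen, false on timeout.
     */
    public static boolean awaitFlag(Path flagDir, String flagName, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        Path flag = flagDir.resolve(flagName);
        try (WatchService watcher = flagDir.getFileSystem().newWatchService()) {
            flagDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            // The flag may have been written before the watch was registered.
            if (Files.exists(flag)) {
                return true;
            }
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return false; // timed out without any event
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        // Events were lost: fall back to a direct check.
                        if (Files.exists(flag)) {
                            return true;
                        }
                        continue;
                    }
                    // For ENTRY_CREATE the context is the path relative to flagDir.
                    Path created = (Path) event.context();
                    if (created.toString().equals(flagName)) {
                        return true;
                    }
                }
                if (!key.reset()) {
                    return false; // the directory is no longer accessible
                }
            }
        }
    }
}
```

Note that on some platforms the JDK itself implements WatchService by polling, so the latency gain is not guaranteed even on a local directory.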
Feature
In order to monitor the completion of jobs submitted to Slurm, we use files and filesystem polling. Depending on the polling frequency, this introduces some performance cost (delay between the end of the task and the time when the computation manager identifies it as completed), and some load on the underlying filesystem, in particular when multiple processes using a computation manager are running.
We should be able to configure the way completion monitoring is performed. Polling would be one implementation of this functionality.
Other interesting implementations would be:
Improving perceived performance while relieving the filesystem.
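To make the proposal a bit more concrete, here is a minimal sketch of what a pluggable completion monitor could look like, with polling as one implementation. Everything here is an assumption for discussion: the names CompletionMonitor and PollingCompletionMonitor are hypothetical, and so is the convention that the flag file is named after the job id.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class CompletionMonitoring {

    /** Hypothetical strategy interface: how job completion is detected. */
    public interface CompletionMonitor extends AutoCloseable {
        /** Returns a future that completes once the given job is detected as finished. */
        CompletableFuture<Void> whenCompleted(String jobId);

        @Override
        void close();
    }

    /** Polling implementation: checks periodically for a flag file in flagDir. */
    public static class PollingCompletionMonitor implements CompletionMonitor {
        private final Path flagDir;
        private final long periodMillis;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "completion-poller");
                    t.setDaemon(true);
                    return t;
                });

        public PollingCompletionMonitor(Path flagDir, long periodMillis) {
            this.flagDir = flagDir;
            this.periodMillis = periodMillis;
        }

        @Override
        public CompletableFuture<Void> whenCompleted(String jobId) {
            CompletableFuture<Void> done = new CompletableFuture<>();
            // Assumption for this sketch: the flag file is named after the job id.
            Path flag = flagDir.resolve(jobId);
            ScheduledFuture<?> task = scheduler.scheduleWithFixedDelay(() -> {
                if (Files.exists(flag)) {
                    done.complete(null);
                }
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            // Stop polling once the job is completed (or the future is cancelled).
            done.whenComplete((v, e) -> task.cancel(false));
            return done;
        }

        @Override
        public void close() {
            scheduler.shutdownNow();
        }
    }
}
```

With an interface like this, the polling period becomes a parameter of one implementation, and other strategies can be swapped in through configuration without touching the callers.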