Open jakirkham opened 5 years ago
Hello @jakirkham,
The only parts tied to Slurm specifically are red-box and, to a lesser extent, the virtual-kubelet provider. The core logic lives in red-box: it implements the WorkloadManager interface, and the other components use that interface to communicate. So if anyone wants to extend this, a new WorkloadManager implementation (a new red-box) is the way to go :)
Potentially, the operator can work with any WLM. The only thing you need to do is implement a gRPC server that conforms to our workload.proto spec, and then use your implementation instead of red-box (which is actually just the workload.proto implementation for SLURM).
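To make the idea of swapping the backend concrete, here is a minimal sketch in Go. It is not the project's actual API: the `WorkloadManager` interface below is a simplified, hypothetical stand-in (the method names `SubmitJob`, `JobStatus` and `Partitions` are assumptions, not taken from workload.proto), and `fakeWLM` is an in-memory backend invented for illustration. A real replacement for red-box would expose the equivalent operations over gRPC as defined in workload.proto and shell out to, or link against, the target scheduler.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// WorkloadManager is a hypothetical, simplified stand-in for the service
// described by workload.proto. The real method names and types may differ.
type WorkloadManager interface {
	SubmitJob(script, partition string) (int64, error)
	JobStatus(id int64) (string, error)
	Partitions() ([]string, error)
}

// fakeWLM is an illustrative in-memory backend. A backend for SGE, LSF or
// another scheduler would call that scheduler's tools or API here instead.
type fakeWLM struct {
	partitions map[string]bool
	jobs       map[int64]string
	nextID     int64
}

func newFakeWLM(partitions ...string) *fakeWLM {
	f := &fakeWLM{
		partitions: map[string]bool{},
		jobs:       map[int64]string{},
		nextID:     1,
	}
	for _, p := range partitions {
		f.partitions[p] = true
	}
	return f
}

// SubmitJob records the job as PENDING and returns its ID.
func (f *fakeWLM) SubmitJob(script, partition string) (int64, error) {
	if script == "" {
		return 0, errors.New("empty job script")
	}
	if !f.partitions[partition] {
		return 0, fmt.Errorf("unknown partition %q", partition)
	}
	id := f.nextID
	f.nextID++
	f.jobs[id] = "PENDING"
	return id, nil
}

// JobStatus returns the recorded state of a previously submitted job.
func (f *fakeWLM) JobStatus(id int64) (string, error) {
	s, ok := f.jobs[id]
	if !ok {
		return "", fmt.Errorf("no such job %d", id)
	}
	return s, nil
}

// Partitions lists the known partitions (queues) in sorted order.
func (f *fakeWLM) Partitions() ([]string, error) {
	names := make([]string, 0, len(f.partitions))
	for p := range f.partitions {
		names = append(names, p)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Callers depend only on the interface, so the backend can be swapped.
	var wlm WorkloadManager = newFakeWLM("debug", "batch")
	id, err := wlm.SubmitJob("#!/bin/sh\nhostname\n", "debug")
	if err != nil {
		panic(err)
	}
	status, _ := wlm.JobStatus(id)
	parts, _ := wlm.Partitions()
	fmt.Println(id, status, parts)
}
```

Because the caller only sees the interface, supporting another scheduler in this sketch means writing one more type with the same three methods; nothing on the calling side changes.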
@jakirkham Thanks for stopping by! I just want to say that we'd be more than happy to work with the community and accept contributions that enable other WLMs in this architecture.
Great discussion. Just wondering whether a generic implementation on top of an open standard like DRMAA would be useful for that -> https://github.com/dgruber/drmaa
Actually, we have taken a look at DRMAA. The second version (DRMAA2) looks perfect for us, but it does not seem to be widely used. DRMAA v1 seems to be missing some features that are important for us. For example, we need to be able to get information about WLM partitions (queues) and the resources they have. At the moment I'm not sure that is possible with the first version.
Yeah, agreed. Adoption could be better. I started a generic implementation of DRMAA2 in Go (https://github.com/dgruber/drmaa2os). An initial CLI wrapper for Slurm exists (https://github.com/dgruber/drmaa2os/tree/master/pkg/jobtracker/slurmcli). It could serve as a starting point, and it certainly deserves more attention. Contributions welcome!
As there are many different job schedulers used in HPC, it would be interesting to know to what extent the work here could be generalized to other job schedulers to cover more use cases. For instance, what would it take to get this to work on SGE, LSF, or some other arbitrary job scheduler? Would it be possible to parameterize things a bit? To what extent is this tied to SLURM specifically? Thanks in advance for your thoughts. 🙂