sylabs / wlm-operator

Singularity implementation of k8s operator for interacting with SLURM.
Apache License 2.0
117 stars, 28 forks

Generalizing to other job schedulers #161

Open jakirkham opened 5 years ago

jakirkham commented 5 years ago

As there are many different job schedulers used in HPC, it would be interesting to know to what extent the work here could be generalized to other job schedulers to cover more use cases. For instance, what would it take to get this to work on SGE, LSF, or some other arbitrary job scheduler? Would it be possible to parameterize things a bit? To what extent is this tied to SLURM specifically? Thanks in advance for your thoughts. 🙂

sashayakovtseva commented 5 years ago

Hello @jakirkham,

The only parts tied to SLURM specifically are red-box and, to a lesser extent, the virtual-kubelet provider. The core logic lives in red-box: it implements the WorkloadManager interface, and the remaining components use that interface to communicate. So if anyone wants to extend this, a new WorkloadManager implementation (a new red-box) is the way to go :)
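To make the "everything talks through one interface" design concrete, here is a minimal Go sketch in which the calling code depends only on an interface, so a SLURM, SGE, or LSF backend could be swapped in without changing it. The interface name echoes WorkloadManager, but its methods (`Submit`, `State`), the `JobSpec` type, and the in-memory `fakeWLM` backend are all hypothetical and are not the actual red-box API.

```go
package main

import (
	"errors"
	"fmt"
)

// JobSpec is a hypothetical, scheduler-neutral description of a batch job.
type JobSpec struct {
	Script    string // batch script contents
	Partition string // SLURM partition, or SGE/LSF queue
}

// JobState is a simplified job status.
type JobState string

const (
	Pending   JobState = "PENDING"
	Running   JobState = "RUNNING"
	Completed JobState = "COMPLETED"
)

// WorkloadManager is a hypothetical stand-in for the interface described above;
// the real red-box interface may use different names and methods.
type WorkloadManager interface {
	Submit(spec JobSpec) (jobID int64, err error)
	State(jobID int64) (JobState, error)
}

// fakeWLM is an in-memory backend used only to exercise the interface.
type fakeWLM struct {
	nextID int64
	jobs   map[int64]JobState
}

func newFakeWLM() *fakeWLM {
	return &fakeWLM{nextID: 1, jobs: make(map[int64]JobState)}
}

func (f *fakeWLM) Submit(spec JobSpec) (int64, error) {
	if spec.Script == "" {
		return 0, errors.New("empty batch script")
	}
	id := f.nextID
	f.nextID++
	f.jobs[id] = Pending // a real backend would call sbatch/qsub/bsub here
	return id, nil
}

func (f *fakeWLM) State(jobID int64) (JobState, error) {
	s, ok := f.jobs[jobID]
	if !ok {
		return "", fmt.Errorf("unknown job %d", jobID)
	}
	return s, nil
}

// runJob depends only on the interface, so any scheduler backend can be passed in.
func runJob(wlm WorkloadManager, spec JobSpec) (int64, JobState, error) {
	id, err := wlm.Submit(spec)
	if err != nil {
		return 0, "", err
	}
	st, err := wlm.State(id)
	return id, st, err
}

func main() {
	id, st, err := runJob(newFakeWLM(), JobSpec{Script: "#!/bin/sh\nhostname\n", Partition: "debug"})
	fmt.Println(id, st, err) // 1 PENDING <nil>
}
```

Supporting another scheduler in this shape means writing one more type that satisfies `WorkloadManager`; `runJob` and everything above it stay unchanged.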

pisarukv commented 5 years ago

Potentially, the operator can work with any WLM. All you need to do is implement a gRPC server for our workload.proto spec and use your implementation instead of red-box (which is just the workload.proto implementation for SLURM).
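A real replacement for red-box would need the stubs generated from workload.proto plus the grpc-go library, neither of which is reproduced here. The Go sketch below covers only the scheduler-specific part such a server would likely contain: building the submit command and parsing the job ID each tool prints. The `submitter` type, its fields, and the sample outputs are illustrative assumptions. The commands (`sbatch`, `qsub`, `bsub`) and their partition/queue flags are the standard ones, but a production backend should check each scheduler's documentation for every output variant it must handle (array jobs, for example, are not covered).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// submitter holds the scheduler-specific details a non-SLURM backend would
// need when handling a "submit job" request. Hypothetical sketch only.
type submitter struct {
	name          string
	scriptOnStdin bool           // true: pipe the script on stdin instead of passing it as an argument
	jobIDRe       *regexp.Regexp // extracts the numeric job ID from the tool's stdout
	argvFor       func(queue, script string) []string
}

var submitters = map[string]submitter{
	"slurm": {
		name:    "slurm",
		jobIDRe: regexp.MustCompile(`Submitted batch job (\d+)`),
		argvFor: func(q, s string) []string { return []string{"sbatch", "--partition=" + q, s} },
	},
	"sge": {
		name:    "sge",
		jobIDRe: regexp.MustCompile(`Your job (\d+) `),
		argvFor: func(q, s string) []string { return []string{"qsub", "-q", q, s} },
	},
	"lsf": {
		name:          "lsf",
		scriptOnStdin: true, // run as: bsub -q <queue> < job.sh
		jobIDRe:       regexp.MustCompile(`Job <(\d+)> is submitted`),
		argvFor:       func(q, _ string) []string { return []string{"bsub", "-q", q} },
	},
}

// parseJobID pulls the job ID out of the submit command's output.
func (s submitter) parseJobID(out string) (int64, error) {
	m := s.jobIDRe.FindStringSubmatch(out)
	if m == nil {
		return 0, fmt.Errorf("%s: no job ID in %q", s.name, out)
	}
	return strconv.ParseInt(m[1], 10, 64)
}

func main() {
	// Sample stdout in each tool's usual format (illustrative, not captured from a real cluster).
	samples := map[string]string{
		"slurm": "Submitted batch job 4242",
		"sge":   `Your job 4242 ("job.sh") has been submitted`,
		"lsf":   "Job <4242> is submitted to queue <normal>.",
	}
	for _, name := range []string{"slurm", "sge", "lsf"} {
		s := submitters[name]
		id, err := s.parseJobID(samples[name])
		fmt.Println(s.argvFor("normal", "job.sh"), "stdin:", s.scriptOnStdin, "id:", id, err)
	}
}
```

The LSF entry pipes the script on stdin (`bsub < job.sh`), the form LSF documents for scripts that carry embedded `#BSUB` options. Keeping these details in one table-like map is one way to "parameterize things a bit", as asked in the opening comment.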

bauerm97 commented 5 years ago

@jakirkham Thanks for stopping by! I just want to say that we'd be more than happy to work with the community to accept contributions which are enabling other WLMs into this architecture.

dgruber commented 5 years ago

Great discussion. Just wondering whether a generic implementation on an open standard like DRMAA would be useful for that: https://github.com/dgruber/drmaa

pisarukv commented 5 years ago

Actually, we have taken a look at DRMAA. The second version (DRMAA2) looks perfect for us, but it doesn't seem to be widely used. DRMAA v1, on the other hand, seems to lack some features that are important to us. For example, it's very important for us to be able to get information about WLM partitions (queues) and the resources they have. At the moment I'm not sure whether that's possible with the first version.

dgruber commented 5 years ago

Yeah, agreed, adoption could be better. I started a generic implementation of DRMAA2 in Go (https://github.com/dgruber/drmaa2os). An initial CLI wrapper for SLURM exists (https://github.com/dgruber/drmaa2os/tree/master/pkg/jobtracker/slurmcli). It could serve as a starting point, though it certainly deserves more attention. Contributions welcome!