As of right now, a limitation of the toolbox is its lack of flexibility in adapting to other people's compute resources. Different clusters (SGE or Slurm) may use different syntax to query and request resources, for example the number of GPUs:
- SGE: `#$ -l cuda="{num_gpus}(RTX2080)"`
- Slurm: `#SBATCH --gres=gpu:tesla:{num_gpus}`
https://github.com/RobertTLange/mle-toolbox/blob/b659278184d68a21f9f212e8e93a9193719b3ef0/mle_toolbox/experiment/sge_job_management.py#L72
https://github.com/RobertTLange/mle-toolbox/blob/b659278184d68a21f9f212e8e93a9193719b3ef0/mle_toolbox/experiment/slurm_job_management.py#L69
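A minimal sketch of the generalisation, assuming each scheduler's GPU request can be written as a single format string with a `{num_gpus}` placeholder (as in the two directives above). The scheduler-specific lines linked above would then become a lookup into user-supplied templates. `DEFAULT_GPU_TEMPLATES` and `gpu_request_line` are hypothetical names, not existing mle-toolbox functions.

```python
from typing import Dict, Optional

# Hypothetical defaults mirroring the two directives quoted in this issue.
# The keys "sge"/"slurm" and all names here are illustrative, not mle-toolbox API.
DEFAULT_GPU_TEMPLATES: Dict[str, str] = {
    "sge": '#$ -l cuda="{num_gpus}(RTX2080)"',
    "slurm": "#SBATCH --gres=gpu:tesla:{num_gpus}",
}


def gpu_request_line(
    scheduler: str,
    num_gpus: int,
    templates: Optional[Dict[str, str]] = None,
) -> str:
    """Render the GPU-request directive for a scheduler from a template string."""
    if templates is None:
        templates = DEFAULT_GPU_TEMPLATES
    if num_gpus <= 0:
        return ""  # CPU-only job: emit no GPU directive at all.
    try:
        template = templates[scheduler]
    except KeyError:
        raise ValueError(
            f"No GPU template configured for scheduler '{scheduler}'"
        ) from None
    # Only the {num_gpus} placeholder is filled in; the rest is user syntax.
    return template.format(num_gpus=num_gpus)
```

With the defaults, `gpu_request_line("slurm", 2)` gives `#SBATCH --gres=gpu:tesla:2`, while a cluster without typed GPUs could pass `{"slurm": "#SBATCH --gpus={num_gpus}"}` instead. Keeping the template a plain string means users only supply syntax, not code.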
In order for this to be open-sourced, we need to make this much more general. The user should simply set the syntax at initialisation, and we could then store it in the `mle_config.toml` file. Afterwards, the toolbox knows how to construct the resource request, and the user can later change the setting if their resources change. This can/should be integrated into `mle-init` and issue #24. Potentially also have a look at how Ray and PyTorch Lightning handle this.