mpoquet closed 7 years ago
It seems that Batsim deadlocks under some conditions when jobs are killed.
YAML to reproduce (not all files are available on the repo):
    # If needed, the output directory of this script can be specified within this file
    base_output_directory: /tmp/batsim_tests/issue37

    base_variables:
      batsim_dir: ${base_working_directory}

    implicit_instances:
      implicit:
        sweep:
          platform :
            - {"name":"cluster", "filename":"${batsim_dir}/platforms/cluster_issue36.xml", "master_host":"master_host0"}
          workload :
            - {"name":"tiny", "filename": "${batsim_dir}/workload_profiles/one_delay_job.json"}
          algo:
            - {"name":"killer", "sched_name":"killer"}
        generic_instance:
          timeout: 60
          working_directory: ${base_working_directory}
          output_directory: ${base_output_directory}/results/${algo[name]}_${workload[name]}_${platform[name]}
          batsim_command: batsim -p ${platform[filename]} -w ${workload[filename]} -e ${output_directory}/out --config ${output_directory}/batsim.conf -m ${platform[master_host]}
          sched_command: batsched -v ${algo[sched_name]} --variant_options_filepath ${output_directory}/sched_input.json
          commands_before_execution:
            # Generate Batsim config file
            - |
              #!/usr/bin/env bash
              cat > ${output_directory}/batsim.conf << EOF
              {
                "job_submission": {
                  "forward_profiles": true,
                  "from_scheduler":{
                    "enabled": true,
                    "acknowledge": true
                  }
                }
              }
              EOF
            # Generate sched input
            - |
              #!/usr/bin/env bash
              cat > ${output_directory}/sched_input.json << EOF
              {
                "nb_kills_per_job": 1,
                "delay_before_kill": 10
              }
              EOF

    commands_before_instances:
      - ${batsim_dir}/test/is_batsim_dir.py ${base_working_directory}
      - ${batsim_dir}/test/clean_output_dir.py ${base_output_directory}
This does not seem to depend on the platform or the workload.
Ahem... Dynamic submissions should not be allowed with this scheduler. This is the reason why the deadlock occurs.
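For reference, a minimal sketch of what the config-generation step might look like with dynamic submissions turned off. It assumes that setting `from_scheduler.enabled` to `false` in the generated `batsim.conf` is what disables them; this is inferred from the config posted in this issue and has not been confirmed here.

```yaml
commands_before_execution:
  # Generate Batsim config file.
  # Assumption: "enabled": false disables dynamic (scheduler-side) submissions;
  # "acknowledge" is set to false too, since it should be irrelevant in that case.
  - |
    #!/usr/bin/env bash
    cat > ${output_directory}/batsim.conf << EOF
    {
      "job_submission": {
        "forward_profiles": true,
        "from_scheduler": {
          "enabled": false,
          "acknowledge": false
        }
      }
    }
    EOF
```

The rest of the instance description (platform, workload, `sched_input.json`) would stay unchanged.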