msimonin / vagrant-g5k

Hacking around vagrant and g5k
MIT License

Multiple nodes may be reserved for a single vm #11

Closed msimonin closed 7 years ago

msimonin commented 7 years ago

See this job: three cores were requested (core=3), but they were assigned on two different hosts:

Job_Id: 1279647
    job_array_id = 1279647
    job_array_index = 1
    name = test-5
    project = default
    owner = abrandon
    state = Terminated
    wanted_resources = -l "{(virtual != 'none') AND type = 'default'}/core=3,walltime=3:0:0"
    types =
    dependencies =
    assigned_resources = 2645+2659+2660
    assigned_hostnames = graphene-125.nancy.grid5000.fr+graphene-128.nancy.grid5000.fr
    queue = default
    command = .vagrant/test-vagrant-g5k/launch_vm.sh 3 10240 BRIDGE .vagrant/test-vagrant-g5k/subnet -drive file=/home/abrandon/public/centos_7.2_dcos.qcow2,if=virtio -snapshot
    exit_code = 0 (0,0,0)
    launchingDirectory = /home/abrandon
    stdout_file = OAR.test-5.1279647.stdout
    stderr_file = OAR.test-5.1279647.stderr
    jobType = PASSIVE
    properties = (maintenance = 'NO') AND production = 'NO'
    reservation = None
    walltime = 3:0:0
    submissionTime = 2017-06-15 11:26:01
    startTime = 2017-06-15 11:26:02
    stopTime = 2017-06-15 13:00:18
    cpuset_name = abrandon_1279647
    initial_request = oarsub --json -l {virtual != 'none'}/core=3, walltime=03:00:00 --name test-5 --checkpoint 60 --signal 12 .vagrant/test-vagrant-g5k/launch_vm.sh 3 10240 BRIDGE .vagrant/test-vagrant-g5k/subnet -drive file=/home/abrandon/public/centos_7.2_dcos.qcow2,if=virtio -snapshot
    message = FIFO scheduling OK
    scheduledStart = no prediction
    resubmit_job_id = 0
    events =
2017-06-15 13:00:20> SWITCH_INTO_TERMINATE_STATE:[bipbip 1279647] Ask to change the job state
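
A quick way to spot this situation is to count the distinct entries in `assigned_hostnames`. The following is a minimal sketch, not part of vagrant-g5k: the helper names `parse_oarstat` and `assigned_hosts` are hypothetical, and it assumes the `key = value` text layout of the `oarstat -f` output shown above, with hostnames joined by `+`.

```python
def parse_oarstat(text):
    """Collect the `key = value` lines of `oarstat -f` style text into a dict.

    Lines without an `=` (the `Job_Id:` header, event lines) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def assigned_hosts(fields):
    """Return the sorted, de-duplicated hostnames of a parsed job."""
    raw = fields.get("assigned_hostnames", "")
    return sorted({host for host in raw.split("+") if host})


# Excerpt of the job above (assumed layout: one field per line).
sample = """\
Job_Id: 1279647
    assigned_resources = 2645+2659+2660
    assigned_hostnames = graphene-125.nancy.grid5000.fr+graphene-128.nancy.grid5000.fr
"""

hosts = assigned_hosts(parse_oarstat(sample))
if len(hosts) > 1:
    print("job spans several hosts:", ", ".join(hosts))
```

For a VM job, more than one host in the result means the requested cores were not all reserved on the same machine.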

cc @Brandonage

Brandonage commented 7 years ago

Thanks for the update. Everything runs more smoothly now and far fewer machines go down. All jobs now run on a single machine instead of being spread over several. However, I still sometimes get the same error, even when the job runs on a single machine. Here is one example:

abrandon@fnancy:~$ oarstat -j 1282350 -f
Job_Id: 1282350
    job_array_id = 1282350
    job_array_index = 1
    name = test-2
    project = default
    owner = abrandon
    state = Terminated
    wanted_resources = -l "{(virtual != 'none') AND type = 'default'}/host=1/core=2,walltime=12:0:0"
    types =
    dependencies =
    assigned_resources = 2647+2648
    assigned_hostnames = graphene-125.nancy.grid5000.fr
    queue = default
    command = .vagrant/test-vagrant-g5k/launch_vm.sh 2 12288 BRIDGE .vagrant/test-vagrant-g5k/subnet -drive file=/home/abrandon/public/centos_7.2_dcos.qcow2,if=virtio -snapshot
    exit_code = 0 (0,0,0)
    launchingDirectory = /home/abrandon
    stdout_file = OAR.test-2.1282350.stdout
    stderr_file = OAR.test-2.1282350.stderr
    jobType = PASSIVE
    properties = (maintenance = 'NO') AND production = 'NO'
    reservation = None
    walltime = 12:0:0
    submissionTime = 2017-06-20 13:37:23
    startTime = 2017-06-20 13:37:28
    stopTime = 2017-06-20 18:28:43
    cpuset_name = abrandon_1282350
    initial_request = oarsub --json -l {virtual != 'none'}/nodes=1/core=2, walltime=12:00:00 --name test-2 --checkpoint 60 --signal 12 .vagrant/test-vagrant-g5k/launch_vm.sh 2 12288 BRIDGE .vagrant/test-vagrant-g5k/subnet -drive file=/home/abrandon/public/centos_7.2_dcos.qcow2,if=virtio -snapshot
    message = FIFO scheduling OK
    scheduledStart = no prediction
    resubmit_job_id = 0
    events =
2017-06-20 18:28:45> SWITCH_INTO_TERMINATE_STATE:[bipbip 1282350] Ask to change the job state
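
The two `initial_request` values in this thread differ only in the resource hierarchy passed to `-l`: the first asks for `/core=3`, the second for `/nodes=1/core=2`, which pins all cores under a single node. The sketch below only illustrates that difference. It is not the plugin's actual code, and the function name `resource_request` and its parameters are hypothetical.

```python
def resource_request(cores, walltime, single_host=True, prop="virtual != 'none'"):
    """Build the string given to `oarsub -l`, as seen in the logs of this thread.

    With single_host=True a `nodes=1` level is placed before `core=N`,
    so the cores are requested below one node.
    """
    levels = []
    if single_host:
        levels.append("nodes=1")
    levels.append(f"core={cores}")
    return "{%s}/%s, walltime=%s" % (prop, "/".join(levels), walltime)


# Request shape of the first job (cores may land on several hosts).
print(resource_request(3, "03:00:00", single_host=False))
# Request shape of the second job (cores grouped under one node).
print(resource_request(2, "12:00:00"))
```

The second form matches `wanted_resources` in the job above, where OAR reports the node level as `host=1`.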

2017-06-18 23:52 GMT+02:00 Matthieu Simonin notifications@github.com:

Closed #11 https://github.com/msimonin/vagrant-g5k/issues/11 via c90d61e https://github.com/msimonin/vagrant-g5k/commit/c90d61ed7cdcf90dd5c903736c5153168e35a1f0 .
