multiscale-cosim / EBRAINS-cosim

1.1 Testing Co-Simulator TVB-NEST/launcher on LOCAL (VM) infrastructure #74

Closed ringleschavez closed 2 years ago

ringleschavez commented 3 years ago
Aspect Detail
Summary Deploy the current version of the Co-Simulator
Task Area
Assignee
Information
Prerequisites
Dependencies

Summary

Tasks

Requirements

Acceptance criteria

NOTE

Running the test_co_sim.sh use case mentioned above on HPC systems, specifically on FZJ/JSC infrastructure, is followed up in issue #93.

ringleschavez commented 3 years ago

Reference: C.E.C.I.'s Slurm F.A.Q., taken from "HPC cluster: select the number of CPUs and threads in SLURM sbatch".

ringleschavez commented 3 years ago

After executing ${MULTISCALETVBNEST}/launcher/tests/plans/simple_plan_on_cluster.xml on JUWELS, some ERROR messages appeared in the log. These are not actual failures: Slurm commands such as srun write informational messages to stderr, and the Co-Simulator reports every message read from the stderr buffer as ERROR.

2021-06-21 14:20:54,834 - INFO - common.cosimulator - [Spawner-2:20460] - action_004: PPID=24823,PID=24829,MPI.COMM_WORLD.size=2,MPI.COMM_WORLD.rank=0,MPI.processor_name=jwc00n014.juwels

2021-06-21 14:20:54,834 - ERROR - common.cosimulator - [Spawner-2:20460] - action_004: srun: job 3884042 queued and waiting for resources

2021-06-21 14:20:54,834 - INFO - common.cosimulator - [Spawner-2:20460] - action_004: PPID=24824,PID=24830,MPI.COMM_WORLD.size=2,MPI.COMM_WORLD.rank=1,MPI.processor_name=jwc00n014.juwels

2021-06-21 14:20:54,834 - ERROR - common.cosimulator - [Spawner-2:20460] - action_004: srun: job 3884042 has been allocated resources

2021-06-21 14:20:55,037 - INFO - common.cosimulator - [Spawner-1:20459] - action_006: PPID=16355,PID=16360,MPI.COMM_WORLD.size=2,MPI.COMM_WORLD.rank=0,MPI.processor_name=jwc00n004.juwels

2021-06-21 14:20:55,037 - ERROR - common.cosimulator - [Spawner-1:20459] - action_006: srun: job 3884043 queued and waiting for resources

2021-06-21 14:20:55,037 - INFO - common.cosimulator - [Spawner-1:20459] - action_006: PPID=16356,PID=16362,MPI.COMM_WORLD.size=2,MPI.COMM_WORLD.rank=1,MPI.processor_name=jwc00n004.juwels

2021-06-21 14:20:55,037 - ERROR - common.cosimulator - [Spawner-1:20459] - action_006: srun: job 3884043 has been allocated resources

2021-06-21 14:20:55,039 - INFO - common.cosimulator - [Spawner-2:20460] - action_004: PPID=24824, PID=24830, Cosimulation_outputs/ingleschavez1_outputs_2021-06-21_142021/results/simple_test/24830.output has been generated
2021-06-21 14:20:55,039 - INFO - common.cosimulator - [Spawner-2:20460] - action_004: PPID=24823, PID=24829, Cosimulation_outputs/ingleschavez1_outputs_2021-06-21_142021/results/simple_test/24829.output has been generated
2021-06-21 14:20:55,039 - INFO - common.cosimulator - [Spawner-2:20460] - Action <action_004> finished properly.
2021-06-21 14:20:55,039 - INFO - common.cosimulator - [Spawner-2:20460] - PPID=20458,PID=20460,Spawner-2: the <action_004> action has finished
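
One possible way to keep such srun status messages from being reported as ERROR is to classify each stderr line before logging it. The following is a minimal sketch, not the Co-Simulator's actual code: the names `SRUN_INFO_PATTERNS`, `classify_stderr_line` and `log_stderr_line` are hypothetical, and the two patterns are simply the srun messages seen in the log above, so the list is an assumption and is not exhaustive.

```python
import logging
import re

# Assumption: these two patterns mirror the srun messages in the log above.
# Other informational srun messages would need their own entries.
SRUN_INFO_PATTERNS = (
    re.compile(r"^srun: job \d+ queued and waiting for resources$"),
    re.compile(r"^srun: job \d+ has been allocated resources$"),
)


def classify_stderr_line(line: str) -> int:
    """Return logging.INFO for known informational srun messages, else logging.ERROR."""
    text = line.strip()
    if any(pattern.match(text) for pattern in SRUN_INFO_PATTERNS):
        return logging.INFO
    return logging.ERROR


def log_stderr_line(logger: logging.Logger, action_id: str, line: str) -> None:
    """Log one stderr line of an action at the level chosen by classify_stderr_line."""
    logger.log(classify_stderr_line(line), "%s: %s", action_id, line.strip())
```

With this sketch, `log_stderr_line(logger, "action_004", "srun: job 3884042 queued and waiting for resources")` would be logged at INFO level, while any stderr line that does not match a known pattern would still be logged as ERROR.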

ringleschavez commented 2 years ago

Issue #112 has been created based on the already dropped task: