radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Executable MPI tasks corrupted profiles #3104

Open AymenFJA opened 9 months ago

AymenFJA commented 9 months ago
1702383126.7820190,exec_start,,MainThread,task.000089,AGENT_EXECUTING,
02383126.7822950,exec_pre,,MainThread,task.000089,AGENT_EXECUTING, <<<------------
1702383126.7824100,exec_pre,,MainThread,task.000089,AGENT_EXECUTING,
1702383126.7827590,exec_pre,,MainThread,task.000089,AGENT_EXECUTING,
,                                                                              <<<------------
1702383126.7826840,exec_pre,,MainThread,task.000089,AGENT_EXECUTING,
,                                                                              <<<------------
1702383126.7824930,exec_start,,MainThread,task.000089,AGENT_EXECUTING,
andre-merzky commented 9 months ago

What was the difference between the two sample executions above? Also, are there lines in the latter profile without a final comma, or are those single-comma lines in addition to the correct profile?
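One way to answer the question above is to scan the profile mechanically. The following is a minimal sketch, assuming the line shape visible in the sample (seven comma-separated fields, the last one possibly empty, and a leading epoch timestamp). `check_prof` is a hypothetical helper name and is not part of RADICAL-Pilot.

```sh
#!/bin/sh
# check_prof: hypothetical helper, not part of RADICAL-Pilot.
# Prints every line of the given profile that does not have seven
# comma-separated fields or whose first field is not a plain decimal
# timestamp without a leading zero. Returns non-zero if any line is flagged.
# Limitation: a timestamp truncated to a shorter but still valid-looking
# number is not detected.
check_prof() {
    awk -F',' '
        NF != 7 || $1 !~ /^[1-9][0-9]*\.[0-9]+$/ {
            printf "line %d malformed: [%s]\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}
```

Calling `check_prof task.000089.prof` would list the single-comma lines and the line with the clipped timestamp, which shows whether they replace or accompany the intact lines.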

I am quite surprised by this. The profile lines are written by something like printf "%.6f,x,y,z,\n" $now >> task.prof. Shell I/O redirection should be atomic by default. I will read up on this a bit. On what system did that happen? Were the ranks located on the same node? What is the underlying shared file system type?
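To probe the atomicity assumption on a given file system, a small stress test can help. This is only a sketch under stated assumptions: `append_race` is a hypothetical name, the line layout imitates the sample profile, and the timestamp comes from `date +%s` rather than from the `gtod` helper that RADICAL-Pilot uses. It runs on one node, so it does not reproduce multi-node behaviour on a shared file system.

```sh
#!/bin/sh
# append_race: hypothetical stress test, not RADICAL-Pilot code.
# Starts $1 background writers that each append $2 profile-like lines to
# file $3 via ">>", then prints the number of lines that are not intact.
append_race() {
    writers=$1; lines=$2; out=$3
    : > "$out"                      # start from an empty file
    w=0
    while [ "$w" -lt "$writers" ]; do
        (
            i=0
            while [ "$i" -lt "$lines" ]; do
                # one printf call per line, as in the quoted command
                printf '%s,exec_pre,,MainThread,task.%06d,AGENT_EXECUTING,\n' \
                       "$(date +%s)" "$w" >> "$out"
                i=$((i + 1))
            done
        ) &
        w=$((w + 1))
    done
    wait                            # let all writers finish
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    bad=$(grep -cvE '^[0-9]+,exec_pre,,MainThread,task\.[0-9]{6},AGENT_EXECUTING,$' "$out" || true)
    echo "$bad"
}
```

Running `append_race 60 100 /path/on/target/fs/test.prof` from the sandbox directory and from a node-local directory would allow a side-by-side comparison of the two file systems.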

Thanks @AymenFJA !

AymenFJA commented 9 months ago

Hey @andre-merzky, to answer your questions:

andre-merzky commented 9 months ago

Thanks a lot @AymenFJA, that helps. Can you please attach a task.x.exec.sh?

AymenFJA commented 9 months ago

@andre-merzky Here it is:

#!/bin/sh

# ------------------------------------------------------------------------------

export RP_TASK_ID="task.000089"
export RP_TASK_NAME="task.000089"
export RP_PILOT_ID="pilot.0000"
export RP_SESSION_ID="rp.session.udc-aw32-7c0.vaf8uz.019704.0000"
export RP_RESOURCE="uva.rivanna"
export RP_RESOURCE_SANDBOX="/scratch/vaf8uz/radical.pilot.sandbox"
export RP_SESSION_SANDBOX="$RP_RESOURCE_SANDBOX/$RP_SESSION_ID/"
export RP_PILOT_SANDBOX="$RP_SESSION_SANDBOX/pilot.0000/"
export RP_TASK_SANDBOX="$RP_PILOT_SANDBOX/task.000089"
export RP_REGISTRY_ADDRESS="tcp://10.153.50.62:10002"
export RP_CORES_PER_RANK=1
export RP_GPUS_PER_RANK=0
export RP_GTOD="$RP_PILOT_SANDBOX/gtod"
export RP_PROF="$RP_PILOT_SANDBOX/prof"
export RP_PROF_TGT="$RP_PILOT_SANDBOX/task.000089/task.000089.prof"

rp_error() {
    echo "$1 failed" 1>&2
    exit 1
}

# ------------------------------------------------------------------------------
# rank ID
export RP_RANKS=60
test -z "$SLURM_PROCID" || export RP_RANK=$SLURM_PROCID
test -z "$MPI_RANK"     || export RP_RANK=$MPI_RANK
test -z "$PMIX_RANK"    || export RP_RANK=$PMIX_RANK

rp_sync_ranks() {
    sig=$1
    echo $RP_RANK >> $sig.sig
    while test $(cat $sig.sig | wc -l) -lt $RP_RANKS; do
        sleep 1
    done
}

# ------------------------------------------------------------------------------
$RP_PROF exec_start

# ------------------------------------------------------------------------------
# pre-exec commands
$RP_PROF exec_pre
export OMPI_MCA_memory_ptmalloc2_disable=1 || rp_error pre_exec
source /home/vaf8uz/scratch/Cylon/cylon/cy-rp-env/bin/activate || rp_error pre_exec
export LD_LIBRARY_PATH=/home/vaf8uz/scratch/Cylon/cylon/build/arrow/install/lib64:/home/vaf8uz/scratch/Cylon/cylon/build/glog/install/lib64:/home/vaf8uz/scratch/Cylon/cylon/build/lib64:/home/vaf8uz/scratch/Cylon/cylon/build/lib:$LD_LIBRARY_PATH || rp_error pre_exec

# ------------------------------------------------------------------------------
# execute rank
$RP_PROF rank_start
python "cylon_scaling.py" "-n" "100000000" "-i" "4" "-s" "s"
RP_RET=$?
$RP_PROF rank_stop

# ------------------------------------------------------------------------------
# post-exec commands
$RP_PROF exec_post

# ------------------------------------------------------------------------------
$RP_PROF exec_stop
exit $RP_RET

# ------------------------------------------------------------------------------
andre-merzky commented 9 months ago

What is the output of mount | grep scratch on a compute node, please?

The script looks as expected. My assumption would be that the shared FS is not doing atomic writes for multi-node runs. The above command should tell us the file system type so we can have a look at the documentation.
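Since grepping the mount table matches any mount whose name contains "scratch", it may be less ambiguous to ask for the file system that backs one specific path. A minimal sketch, assuming a `df` that supports the `-T` extension (GNU coreutils and BusyBox do; it is not POSIX). `fs_type` is a hypothetical helper name.

```sh
#!/bin/sh
# fs_type: hypothetical helper. Prints the type of the file system that
# backs the given path instead of matching mount points by name.
fs_type() {
    # -P keeps each file system on a single output line,
    # -T adds the type as the second column
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}
```

On a compute node, `fs_type "$RP_TASK_SANDBOX"` would then report the type for the directory the profile is actually written to.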

AymenFJA commented 8 months ago

@andre-merzky this is what I got when running mount | grep scratch:

bash-4.4$mount | grep scratch
/dev/sda on /localscratch type ext4 (rw,relatime)
bash-4.4$
andre-merzky commented 8 months ago

Woah, we get data reshuffled on an ext4??? What the heck?? Let me read up a bit more, that I did not expect at all...