Closed: lukeconibear closed this issue 3 years ago
Not sure how helpful this is, but I wrote a Python script ages ago that parallelises the postprocessing. I can't remember whether I shared it with anyone at the time. It works in a fairly crude way: it splits the for loop in `post.bash` into several for loops, which are each submitted to the queue. The number of times the for loop is split can be set in the script. It's probably not the best solution, but it did save me a lot of time:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 19 10:41:36 2019

@author: eebjs
"""
import re
from random import randint
import subprocess

# choose how many times to split the post.bash loop
splits = 8

# read post.bash
with open('post.bash', 'r') as nl:
    lines = nl.read().split('\n')

# extract the line containing the outer for loop
fl = [l for l in lines if l.startswith('for hour in')][0]
# index of that line within 'lines'
fl_i = lines.index(fl)

# get the number of iterations in the loop (the loop's end value)
its = [int(w) for w in re.findall(r"[\w']+", fl) if w.isdigit()][1]

# create a list of iteration counts, one per split
divisor = round(its / splits)
its_list = [divisor] * splits
# distribute any remainder randomly across the splits
while sum(its_list) < its:
    idx = randint(0, splits - 1)
    its_list[idx] += 1

# build a replacement for-loop line for each split
split_fl_list = []
count = 0
for it in its_list:
    nit1 = count + it
    if count == 0:
        nit0 = count
    else:
        nit0 = count + 1
    split_fl_list.append('for hour in $(seq -w '
                         + str(int(nit0)) + ' ' + str(int(nit1)) + ')')
    count += it

# write the split post.bash files
for i, split_fl in enumerate(split_fl_list):
    # copy the original lines
    new_lines = lines[:]
    # replace the original for loop with the split version
    new_lines[fl_i] = split_fl
    with open('post_split_' + str(i) + '.bash', 'w') as file:
        file.write('\n'.join(new_lines))

# submit the post_split_*.bash files to the queue
for i in range(len(split_fl_list)):
    subprocess.call(['qsub', 'post_split_' + str(i) + '.bash'])
```
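For reference, a hypothetical way to run it (the filename `split_post.py` is just an assumed name; the script expects `post.bash` in the current directory):

```bash
# hypothetical invocation: run from the run directory containing post.bash
python3 split_post.py
# with splits = 8 this writes post_split_0.bash ... post_split_7.bash
# and submits each one to the queue with qsub
```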
I haven't tested this with the latest WRFotron.
I think your solution here (duplicating `post.bash`) has a similar outcome to Helen's solution (parallelising `post.bash` over multiple processes) for the old NCL postprocessing script (`pp.ncl`). However, now that we use its replacement (`postprocessing.py`), Helen's parallelisation solution (using `GOMP_CPU_AFFINITY`) gets overridden by `KMP_AFFINITY` (more info). This is the cause of the first warning message (`OMP: Warning #181: GOMP_CPU_AFFINITY: ignored because KMP_AFFINITY has been defined`). Then the next 3 cores get invalid process assignments (`OMP: Warning #123: Ignoring invalid OS proc ID XX`).
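For context, a minimal sketch of the clash being described, assuming per-process pinning along the lines of Helen's approach (the core range is illustrative, not the actual WRFotron setting):

```bash
# GNU OpenMP pinning per postprocessing process (illustrative core range)
export GOMP_CPU_AFFINITY="0-3"
# If the Intel OpenMP runtime also sees KMP_AFFINITY (e.g. set by a loaded module),
# the GNU variable is ignored and the runtime picks its own thread placement:
#   OMP: Warning #181: GOMP_CPU_AFFINITY: ignored because KMP_AFFINITY has been defined
```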
It would be nice to stick with Helen's parallelisation approach if we can.
I think one solution may be (something like) specifying the OpenMP runtime environment and the process assignment at the top of `post.bash` (more info), though I haven't got this working successfully yet. Maybe @cemachelen knows more here, or the answer is in these docs somewhere.

```bash
export OMP_NUM_THREADS=__nprocPost__
export KMP_AFFINITY=granularity=fine,compact
```
The idea I had above about parallelising wrf-python independently isn't what we need, as we want to parallelise a bash function (which calls a Python script along with multiple NCO commands).
If these parallelisation issues can't be overcome easily, then we could go to your duplication method as a workaround.
I think `KMP_AFFINITY` is set by loading any MPI module, so add the following to `post.bash`:

```bash
module purge
module load licenses sge intel nco wrfchemconda
export OMP_NUM_THREADS=__nprocPost__
```

I've been running some other postprocessing scripts that explicitly purge modules, and the `KMP_AFFINITY` errors are no longer there.
Great, thanks very much Helen. That works.
I've committed that in, with the module versions specified for now to avoid any clashes.
Issue solved.
**What happened**: `postprocessing.py` does not run in parallel due to `KMP_AFFINITY` disabling multi-threading.

**Minimal Complete Verifiable Example**: `post.bash` returns the OMP affinity warnings discussed above.

**Potential solution?**: Utilising OpenMP within wrf-python. Within `postprocessing.py`, this may be something like the sketch below, and within `post.bash` a 4th positional argument would be added. However, parallelisation within `post.bash` is currently set up to loop over the whole `postproc` function, so this solution will need rethinking.
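For reference, a minimal sketch of that idea, assuming the thread count is passed as a new 4th positional argument (the argument position is an assumption, and wrf-python only honours these calls if it was built with OpenMP support):

```python
# postprocessing.py (hypothetical sketch): take the thread count from a 4th
# positional argument and pass it to wrf-python's OpenMP runtime
import sys
from wrf import omp_enabled, omp_set_num_threads

if len(sys.argv) > 4 and omp_enabled():
    omp_set_num_threads(int(sys.argv[4]))

# post.bash would then call it with the extra argument, e.g.:
#   python postprocessing.py <existing args> __nprocPost__
```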