wrfchem-leeds / WRFotron

Tools to automatise WRF-Chem runs with re-initialised meteorology
https://wrfchem-leeds.github.io/WRFotron/
GNU Affero General Public License v3.0

Parallelise postprocessing.py #13

Closed lukeconibear closed 3 years ago

lukeconibear commented 4 years ago

What happened: postprocessing.py does not run in parallel due to KMP_AFFINITY disabling multi-threading.

Minimal Complete Verifiable Example: post.bash returns:

OMP: Warning #181: GOMP_CPU_AFFINITY: ignored because KMP_AFFINITY has been defined
OMP: Warning #123: Ignoring invalid OS proc ID 10.
OMP: Warning #123: Ignoring invalid OS proc ID 12.
OMP: Warning #123: Ignoring invalid OS proc ID 14.

Potential solution: utilising OpenMP within wrf-python.

Within postprocessing.py, this may be something similar to:

import sys

from wrf import omp_set_num_threads, omp_get_max_threads
from wrf import omp_set_schedule, omp_get_schedule, OMP_SCHED_GUIDED

omp_set_num_threads(int(sys.argv[4]))  # thread count from the 4th positional argument
omp_set_schedule(OMP_SCHED_GUIDED, 0)  # guided loop schedule, default chunk size
sched, modifier = omp_get_schedule()   # confirm the schedule that was set
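
For context, once the thread count and schedule are set, wrf-python's compiled diagnostic routines run multi-threaded. A minimal sketch of the idea (the Dataset handling and the "slp" diagnostic are illustrative placeholders, not necessarily what postprocessing.py computes):

import sys

from netCDF4 import Dataset
from wrf import getvar, omp_set_num_threads, omp_set_schedule, OMP_SCHED_GUIDED

omp_set_num_threads(int(sys.argv[4]))  # e.g. ${nprocPost} passed from post.bash
omp_set_schedule(OMP_SCHED_GUIDED, 0)

wrfin = Dataset(sys.argv[1])           # placeholder: the wrfout file (${inFile})
slp = getvar(wrfin, "slp")             # sea-level pressure, computed with OpenMP threads
wrfin.close()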

And within post.bash, add a 4th positional argument:

python ${pyPpScript} ${inFile} tmp_${outFile} ${WRFdir} ${nprocPost}
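
For reference, that call maps the positional arguments to sys.argv in postprocessing.py as follows:

# sys.argv[1] -> ${inFile}
# sys.argv[2] -> tmp_${outFile}
# sys.argv[3] -> ${WRFdir}
# sys.argv[4] -> ${nprocPost}  (the thread count used by omp_set_num_threads above)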

However, parallelisation within post.bash is currently set up to loop over the whole postproc function, so this solution will need rethinking.

bjsilver commented 3 years ago

Not sure how helpful this is, but I wrote a Python script a while ago that parallelises the postprocessing. I can't remember whether I shared it with anyone at the time. It works in a fairly crude way: it splits the for loop in post.bash into several for loops, which are each submitted to the queue separately. The number of times the loop is split can be set in the script. It's probably not the best solution, but it did save me a lot of time:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 19 10:41:36 2019

@author: eebjs
"""

import re
from random import randint
import subprocess

# choose how many times to split post.bash loop
splits = 8

# read post.bash into a list of lines
with open('post.bash', 'r') as f:
    lines = f.read().split('\n')

# extract line of outer for loop
fl = [l for l in lines if l.startswith('for hour in')][0]

# extract index of line of outer for loop in 'lines'
fl_i = lines.index(fl)

# get the last hour of the loop (the second integer in e.g. 'seq -w 0 24')
its = [int(w) for w in re.findall(r"[\w']+", fl) if w.isdigit()][1]

# create list of loop iterators (floor division, so the while loop below
# can distribute the remainder)
divisor = its // splits
its_list = [divisor] * splits

# add the remainder to the list
while sum(its_list) < its:
    idx = randint(0, splits-1)
    its_list[idx] += 1

# build the replacement 'for hour in' lines, each covering a sub-range of hours;
# zero-pad the bounds so 'seq -w' keeps the same width as the original loop
width = len(str(its))
split_fl_list = []
count = 0
for it in its_list:
    nit1 = count + it
    # the first range starts at 0; later ranges start one past the previous end
    if count == 0:
        nit0 = count
    else:
        nit0 = count + 1

    split_fl_list.append('for hour in $(seq -w '
                         + str(nit0).zfill(width) + ' '
                         + str(nit1).zfill(width) + ')')
    count += it

# write split post.bash files
for i, split_fl in enumerate(split_fl_list):
    # deep copy original lines
    new_lines = lines[:]
    # replace old fl with split
    new_lines[fl_i] = split_fl

    with open('post_split_'+str(i)+'.bash', 'w') as file:
        file.write('\n'.join(new_lines))

# qsub the post_split_*.bash files
for i in range(len(split_fl_list)):
    subprocess.call(['qsub', 'post_split_'+str(i)+'.bash'])

I haven't tested this with the latest WRFotron.

lukeconibear commented 3 years ago

I think your solution here (duplicating post.bash) has a similar outcome to Helen's solution (parallelising post.bash over multiple processes) for the old NCL postprocessing script (pp.ncl). However, when using its replacement (postprocessing.py), Helen's parallelisation solution (using GOMP_CPU_AFFINITY) gets overridden by KMP_AFFINITY (more info). This is the cause of the first warning message (OMP: Warning #181: GOMP_CPU_AFFINITY: ignored because KMP_AFFINITY has been defined). The next three warnings then show invalid process assignments for those cores (OMP: Warning #123: Ignoring invalid OS proc ID XX).
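
As an aside, a quick way to see which OS proc IDs the job is actually allowed to use (which is what Warning #123 is complaining about) is Python's os.sched_getaffinity; a minimal sketch:

import os

# logical CPU IDs available to this process (Linux only); affinity entries
# outside this set are what trigger the 'Ignoring invalid OS proc ID' warnings
print(sorted(os.sched_getaffinity(0)))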

It would be nice to stick with Helen's parallelisation approach if we can.

I think one solution may be (something like) specifying the OpenMP runtime environment and the process assignment at the top of post.bash (more info). Though I haven't managed to get this working yet. Maybe @cemachelen knows more here, or the answer is somewhere in these docs.

export OMP_NUM_THREADS=__nprocPost__            # thread count (placeholder substituted by WRFotron)
export KMP_AFFINITY=granularity=fine,compact    # bind each thread to a single logical core, packed close together

The idea I had above about parallelising wrf-python independently isn't what we need, as we want to parallelise a bash function (which calls a Python script along with multiple NCO commands).
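
Purely for illustration, the shape of what needs parallelising is roughly the following, where postproc_hour and the file names are hypothetical stand-ins for post.bash's per-hour work (a sketch, not a proposal to rewrite post.bash in Python):

import subprocess
from concurrent.futures import ProcessPoolExecutor

def postproc_hour(hour):
    # hypothetical stand-in for one iteration of post.bash's loop:
    # the python postprocessing script plus several NCO commands
    subprocess.run(['python', 'postprocessing.py',
                    f'in_{hour:02d}.nc', f'tmp_out_{hour:02d}.nc'], check=True)
    subprocess.run(['ncks', '-O', f'tmp_out_{hour:02d}.nc',
                    f'out_{hour:02d}.nc'], check=True)

# run the per-hour work across multiple worker processes
with ProcessPoolExecutor(max_workers=4) as pool:
    list(pool.map(postproc_hour, range(25)))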

If these parallelisation issues can't be overcome easily, then we could go to your duplication method as a workaround.

cemachelen commented 3 years ago

I think KMP_AFFINITY is set by loading any MPI module, so add to post.bash:

module purge                                      # clear inherited modules (and the KMP_AFFINITY they set)
module load licenses sge intel nco wrfchemconda   # reload only what post.bash needs
export OMP_NUM_THREADS=__nprocPost__

I've been running some other post-processing scripts with an explicit module purge, and the KMP_AFFINITY errors are no longer there.
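
A quick way to confirm from inside the job that the purge has taken effect (a sketch):

import os

# after 'module purge', KMP_AFFINITY should no longer be set, so this prints None
print(os.environ.get('KMP_AFFINITY'))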

lukeconibear commented 3 years ago

Great, thanks very much Helen. That works.

I've committed that in, with the module versions specified for now to avoid any clashes.

Issue solved.