Running pysplit in parallel

nicrie commented 1 year ago

Dear Mellissa, first off, great package that really facilitates my analysis! Thank you :) I'm currently working on an analysis for which I need back trajectories for a whole region rather than for a single fixed point. That is, I want to run HySPLIT/pySPLIT for many different starting locations within that region. Since this takes considerable computation time, are you aware of any smart trick for running pySPLIT/HySPLIT in parallel?

I tried to spawn multiple processes in Python using the multiprocessing module, but there seems to be an issue with HySPLIT (not so much pySPLIT, I guess) that prevents several processes from running in parallel. Although the worker processes seem to start, I only ever see one of my CPUs busy at a time, and after a few seconds the script stalls and asks for user input (e.g. the start location), which looks like a prompt coming directly from HySPLIT. I admit I have no idea why these prompts appear, but it may be that HySPLIT simply cannot be run in parallel.

Minimal (almost) working example

To run it, the paths to the HySPLIT installation, the meteorological files, etc. have to be adjusted.

import os

import numpy as np
import pysplit

from multiprocessing import Pool

# Define paths (hysplit_dir points at the hyts_std executable)
hysplit_dir = r'/home/nrieger/hysplit.v5.2.3/exec/hyts_std'
working_dir = r'/home/nrieger/hysplit.v5.2.3/working'
meteo_dir = r'/media/nrieger/extern/data/hysplit/met'
storage_dir = r'/home/nrieger/Projects/paleo/ArabianSea/paper_3mon/data/hysplit/makran/'

os.chdir(working_dir)

years = [2009] 
months = [3]
hours = [15]
altitudes = [4500]
runtime = -120

# Define 9 different lon/lat pairs which should be processed in parallel
lons = np.arange(60, 71, 5)
lats = np.arange(26, 31, 2)
xx, yy = np.meshgrid(lons, lats)
locations = list(zip(xx.flatten(),yy.flatten()))

def gen_traj_at_loc(lonlat):
    lon, lat = lonlat
    # All longitudes used here are east of Greenwich, hence 'E'
    basename = 'E{:d}N{:d}'.format(lon, lat)
    pysplit.generate_bulktraj(
        basename, working_dir, storage_dir, meteo_dir,
        # pysplit documents coordinates as (latitude, longitude)
        years, months, hours, altitudes, (lat, lon), runtime,
        meteoyr_2digits=False, outputyr_2digits=False,
        monthslice=slice(1, 32, 2), get_reverse=False,
        get_clipped=False, hysplit=hysplit_dir
    )

processes = 4

# Guard the pool so that worker processes can import this script safely
if __name__ == '__main__':
    with Pool(processes) as pool:
        pool.map(gen_traj_at_loc, locations)

rmoore67 commented 12 months ago

Hi there, just wondering if you had any luck running Hysplit/PySplit in parallel?

nicrie commented 12 months ago

Hi Ruth, unfortunately not. In the end I fell back on good old sequential processing.
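
A possible explanation for the input prompts, offered as an assumption rather than something confirmed in this thread: pysplit.generate_bulktraj appears to change into the HYSPLIT working directory and write a file named CONTROL there before calling the executable. Workers that share one working directory could then overwrite or delete each other's CONTROL file, leaving HYSPLIT to ask for the start location interactively. The sketch below gives every location its own working directory. The helper make_private_workdir, the run_parallel wrapper, the settings dictionary and its keys, and the idea that copying the top-level *.CFG files is sufficient are all hypothetical; the generate_bulktraj arguments mirror the example above.

```python
import os
import shutil
from multiprocessing import Pool


def make_private_workdir(template_dir, tag):
    """Create a sibling copy of the HYSPLIT working directory for one task.

    Only top-level *.CFG files (e.g. ASCDATA.CFG) are copied. That this is
    all HYSPLIT needs in its working directory is an assumption.
    """
    template_dir = os.path.abspath(template_dir)
    private_dir = os.path.join(os.path.dirname(template_dir),
                               'working_{}'.format(tag))
    os.makedirs(private_dir, exist_ok=True)
    for name in os.listdir(template_dir):
        src = os.path.join(template_dir, name)
        if os.path.isfile(src) and name.upper().endswith('.CFG'):
            shutil.copy2(src, private_dir)
    return private_dir


def gen_traj_at_loc(task):
    """Generate trajectories for one location (runs inside a worker)."""
    # Imported here so that the helper above can be used without pysplit.
    import pysplit

    (lon, lat), settings = task
    tag = 'E{:d}N{:d}'.format(int(lon), int(lat))
    # Each task gets its own directory, so no CONTROL file is shared.
    workdir = make_private_workdir(settings['working_dir'], tag)
    pysplit.generate_bulktraj(
        tag, workdir, settings['storage_dir'], settings['meteo_dir'],
        settings['years'], settings['months'], settings['hours'],
        # pysplit documents coordinates as (latitude, longitude)
        settings['altitudes'], (lat, lon), settings['runtime'],
        meteoyr_2digits=False, outputyr_2digits=False,
        monthslice=slice(1, 32, 2), get_reverse=False,
        get_clipped=False, hysplit=settings['hysplit']
    )


def run_parallel(locations, settings, processes=4):
    """Map all (lon, lat) locations onto a pool of worker processes."""
    tasks = [(loc, settings) for loc in locations]
    with Pool(processes) as pool:
        pool.map(gen_traj_at_loc, tasks)
```

It could be called as run_parallel(locations, settings), with settings holding the paths and lists from the example above, ideally under an if __name__ == '__main__': guard so that it also works where multiprocessing uses the spawn start method. The private directories are created next to the original working directory so that any relative paths in the copied configuration files keep resolving. Whether this removes the prompts has not been tested against a HYSPLIT installation here.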

mscross / pysplit

Running pysplit in parallel #95
