Dear Mellissa,
First off, great package that really facilitates my analysis! Thank you :)
I'm currently doing an analysis for which I need to estimate back trajectories within a given region (in contrast to a fixed point). That is, I want to run HySPLIT / pySPLIT for many different locations within that region. As this requires considerable computation time, are you aware of any smart trick to run pySPLIT / HySPLIT in parallel?
I tried to spawn multiple processes in Python using the multiprocessing module, but there seems to be an issue with HySPLIT (not so much pySPLIT, I guess) that prevents the processes from actually running in parallel. Although the worker processes seem to start, I only observe one of my CPUs busy at a time. After a few seconds the script stops and asks for user input (e.g. the start location), which looks like a prompt coming directly from HySPLIT. I admit I have no idea why these prompts appear, but it may be that HySPLIT simply cannot be run in parallel.
Minimal (almost) working example
The directories of the hysplit installation/meteorological files etc. have to be modified in order to run.
# %%
import os
import numpy as np
import pysplit
from multiprocessing import Pool

# Define paths
hysplit_dir = r'/home/nrieger/hysplit.v5.2.3/exec/hyts_std'
working_dir = r'/home/nrieger/hysplit.v5.2.3/working'
meteo_dir = r'/media/nrieger/extern/data/hysplit/met'
storage_dir = r'/home/nrieger/Projects/paleo/ArabianSea/paper_3mon/data/hysplit/makran/'
os.chdir(working_dir)

# Trajectory settings
years = [2009]
months = [3]
hours = [15]
altitudes = [4500]
runtime = -120  # negative runtime: back trajectories of 120 hours

# Define 9 different lon/lat pairs which should be processed in parallel
lons = np.arange(60, 71, 5)
lats = np.arange(26, 31, 2)
xx, yy = np.meshgrid(lons, lats)
locations = list(zip(xx.flatten(), yy.flatten()))


def gen_traj_at_loc(lonlat):
    """Generate the bulk trajectories for a single (lon, lat) start location."""
    lon, lat = lonlat
    basename = 'W{:d}N{:d}'.format(lon, lat)
    # pySPLIT documents `coordinates` as (latitude, longitude)
    pysplit.generate_bulktraj(
        basename, working_dir, storage_dir, meteo_dir,
        years, months, hours, altitudes, (lat, lon), runtime,
        meteoyr_2digits=False, outputyr_2digits=False,
        monthslice=slice(1, 32, 2), get_reverse=False,
        get_clipped=False, hysplit=hysplit_dir
    )


# Guard the entry point so that worker processes can import this file safely
if __name__ == '__main__':
    processes = 4
    with Pool(processes) as pool:
        processed = pool.map(gen_traj_at_loc, locations)
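A possible cause (an assumption, not confirmed here) is that all workers share the same working_dir: HySPLIT reads its CONTROL file from the working directory, so parallel runs could overwrite or remove each other's files and leave HySPLIT prompting for input. The sketch below only illustrates the idea of giving every task its own directory. `run_in_private_dir` and `fake_trajectory_job` are hypothetical names, and the fake job merely writes a CONTROL-like file so the example runs without HySPLIT installed.

```python
import os
import tempfile
from multiprocessing import Pool


def run_in_private_dir(task):
    """Call func(arg, private_dir) with a directory that no other task uses."""
    func, arg, parent_dir = task
    # mkdtemp creates a uniquely named directory, so tasks never share one
    private_dir = tempfile.mkdtemp(prefix='worker_', dir=parent_dir)
    return func(arg, private_dir)


def fake_trajectory_job(lonlat, working_dir):
    # Hypothetical stand-in for the pysplit.generate_bulktraj call:
    # it only writes a CONTROL-like file into its own working directory.
    control_path = os.path.join(working_dir, 'CONTROL')
    with open(control_path, 'w') as f:
        f.write('{} {}\n'.format(*lonlat))
    return control_path


if __name__ == '__main__':
    parent = tempfile.mkdtemp(prefix='hysplit_parallel_')
    locations = [(60, 26), (65, 28), (70, 30)]
    tasks = [(fake_trajectory_job, loc, parent) for loc in locations]
    with Pool(2) as pool:
        control_files = pool.map(run_in_private_dir, tasks)
    print(control_files)
```

In the real script, the job function would pass its private directory to pysplit.generate_bulktraj in place of working_dir. HySPLIT may also expect configuration files such as ASCDATA.CFG in each working directory, so those might need to be copied there first. Whether this actually resolves the prompts is untested.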