vlas-sokolov / pyspecnest

Nested sampling of pyspeckit models with PyMultiNest.
MIT License

Opening spectral cubes is slow when running threads in parallel #6

Open vlas-sokolov opened 6 years ago

vlas-sokolov commented 6 years ago

Scripts running on large FITS files suffer from sluggish execution speeds.

Here is the time it takes to open the NGC1333 GAS DR1 data when no other process is running; the process monitor shows only one CPU in use:

In [7]: %timeit pyspeckit.Cube(fitsfile)
1 loop, best of 3: 480 ms per loop

And this is what happens when 30 processes (all sampling spectra) are running in the background. In principle, with 10 cores still free, you wouldn't expect the spectral cube initialization to be much slower, but %timeit shows otherwise:

In [9]: %timeit pyspeckit.Cube(fitsfile)
1 loop, best of 3: 10.2 s per loop

Could it be that there's a traffic jam when reading the file?

vlas-sokolov commented 6 years ago

Definitely not a file lock issue: other FITS files on the same machine also load up to 15 times slower.

vlas-sokolov commented 6 years ago

The issue seems to be in pyspeckit. SpectralCube doesn't slow down as much:

In [41]: fitsfile
Out[41]: 'NGC1333_NH3_11_base_DR1.fits'

In [42]: %timeit SpectralCube.read(fitsfile)
100 loops, best of 3: 8.6 ms per loop

In [43]: %timeit pyspeckit.Cube(fitsfile)
1 loop, best of 3: 22.1 s per loop

In [44]: # now re-run without 30/40 CPUs busy with background jobs

In [45]: %timeit SpectralCube.read(fitsfile)
100 loops, best of 3: 6.66 ms per loop

In [46]: %timeit pyspeckit.Cube(fitsfile)
1 loop, best of 3: 484 ms per loop

In [47]: pyspeckit.__version__
Out[47]: '0.1.21.dev2691'

vlas-sokolov commented 6 years ago

Tried generating artificial CPU/IO load with both stress and dd (SO link here), but cannot reproduce the slowdown. It looks like the pyspeckit jobs have to be running in parallel for it to show up.
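
For reference, a rough sketch of this kind of synthetic load test, using pure-Python CPU burners instead of the actual stress/dd invocations (the function names here are made up):

import time
import pyspeckit
from multiprocessing import Pool

def burn_cpu(seconds=60):
    # keep one core busy with arithmetic for a while
    t0 = time.time()
    x = 0.0
    while time.time() - t0 < seconds:
        x += 1.0
    return x

def time_read_under_load(fitsfile, n_burners=30):
    # spin up CPU burners in the background, then time a single cube read
    pool = Pool(n_burners)
    for _ in range(n_burners):
        pool.apply_async(burn_cpu)
    t0 = time.time()
    pyspeckit.Cube(fitsfile)
    elapsed = time.time() - t0
    pool.terminate()
    pool.join()
    return elapsed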

vlas-sokolov commented 6 years ago

A relevant example (though this isn't exactly how I start the processes):

import pyspeckit
from multiprocessing import Pool

def read(fname='NGC1333_NH3_11_base_DR1.fits'):
    # open the cube and discard it (nothing is returned to the parent)
    spc = pyspeckit.Cube(fname)

def multiread(n=6):
    # open the same cube in n child processes at once
    fname = 'NGC1333_NH3_11_base_DR1.fits'
    p = Pool(n)
    results = p.map(read, [fname] * n)
    p.close()
    p.join()

In [14]: %timeit read()
1 loop, best of 3: 683 ms per loop

In [15]: %timeit multiread(n=30)
1 loop, best of 3: 16.9 s per loop

vlas-sokolov commented 6 years ago

Adam tells me that spectral_cube does lazy I/O on cubes, so the overhead comes from pyspeckit loading the full cube into memory.

Solution? Switch to passing single spectra. Opening cubes in every child process was silly anyway.
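
Roughly what I have in mind, as a sketch: open the cube once in the parent and pass plain arrays (not cubes) to the workers. The sample_spectrum body is a placeholder, and it assumes spc.errorcube is populated:

import pyspeckit
from multiprocessing import Pool

def sample_spectrum(task):
    # worker: receives plain numpy arrays, so no cube I/O happens here
    xvals, data, error = task
    # ... build a Spectrum and run the nested sampling on it ...
    return data.sum()  # placeholder

def run_all(fitsfile='NGC1333_NH3_11_base_DR1.fits'):
    # open the cube once, in the parent process only
    spc = pyspeckit.Cube(fitsfile)
    pixels = [(140, 140), (141, 140), (140, 141)]  # example pixels
    tasks = [(spc.xarr.value, spc.cube[:, y, x], spc.errorcube[:, y, x])
             for (x, y) in pixels]
    with Pool(6) as pool:
        results = pool.map(sample_spectrum, tasks)
    return results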

vlas-sokolov commented 6 years ago

Writing CubeStack-like spectra is possible as sp.write('foo.fits', tolerance=np.inf).

EDIT: this distorts the xaxis, which is forced to be uniform. The spectral data and errors are saved well enough, though; the only issue is preserving the correct SpectroscopicAxis instance. Since the axis is the same for the entire cube, I should consider just saving it as an array along with the essential xarrkwargs.
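
A possible sketch of that, assuming a SpectroscopicAxis can be rebuilt from its raw values plus a few kwargs such as unit, refX, and velocity_convention (which kwargs are actually essential would need checking):

import numpy as np
from pyspeckit.spectrum.units import SpectroscopicAxis

def save_xarr(xarr, prefix='xarr'):
    # store the raw axis values plus the kwargs needed to rebuild the axis
    np.save(prefix + '.npy', np.asarray(xarr.value))
    xarrkwargs = dict(unit=str(xarr.unit), refX=xarr.refX,
                      velocity_convention=xarr.velocity_convention)
    np.save(prefix + '_kwargs.npy', xarrkwargs)

def load_xarr(prefix='xarr'):
    # rebuild a SpectroscopicAxis from the saved array and kwargs
    values = np.load(prefix + '.npy')
    xarrkwargs = np.load(prefix + '_kwargs.npy', allow_pickle=True).item()
    return SpectroscopicAxis(values, **xarrkwargs)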

vlas-sokolov commented 6 years ago

Next step: store the xarr, spc.cube, spc.errorcube, and spc.header on disk with a one-time function:

In [618]: %timeit save_datacube(pspc, target_dir, target_cubefile, target_errfile, target_header)
1 loop, best of 3: 2.33 s per loop
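
The save_datacube used above isn't reproduced here, but a minimal sketch of such a one-time dump might look something like this (the file names and layout are assumptions):

import os
import numpy as np

def save_datacube(spc, target_dir, cubefile='cube.npy',
                  errfile='errcube.npy', headerfile='header.txt'):
    # one-time dump of the cube contents to plain files on disk
    np.save(os.path.join(target_dir, cubefile), spc.cube)
    np.save(os.path.join(target_dir, errfile), spc.errorcube)
    spc.header.totextfile(os.path.join(target_dir, headerfile),
                          overwrite=True)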

Followed by a fast-ish read function that operates on the products above:

In [619]: %timeit spsp = get_spectrum(140, 140, target_dir, target_xarr, target_xarrkwargs, target_cubefile, target_errfile)
10 loops, best of 3: 25.5 ms per loop
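
The read side can be quick because numpy can memory-map the saved cube and slice out a single line of sight without loading the whole array; a rough sketch of what get_spectrum might look like, loosely mirroring the call above under the assumed file layout:

import os
import numpy as np
import pyspeckit
from pyspeckit.spectrum.units import SpectroscopicAxis

def get_spectrum(x, y, target_dir, xarr_values, xarrkwargs,
                 cubefile='cube.npy', errfile='errcube.npy'):
    # memory-map the cubes so only the (x, y) line of sight is read from disk
    cube = np.load(os.path.join(target_dir, cubefile), mmap_mode='r')
    errcube = np.load(os.path.join(target_dir, errfile), mmap_mode='r')
    xarr = SpectroscopicAxis(xarr_values, **xarrkwargs)
    return pyspeckit.Spectrum(xarr=xarr,
                              data=np.array(cube[:, y, x]),
                              error=np.array(errcube[:, y, x]))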

The ~25 ms per-spectrum read definitely compares rather well to the previously used method, which benchmarked at

In [622]: %timeit spc = make_cube()
1 loop, best of 3: 9 s per loop

vlas-sokolov commented 6 years ago

The speedup feels great. What now?