Reading masked data is slow

AlecThomson commented 4 years ago

Hi there, I've been hitting a problem of massive time differences between reading in masked vs. unmasked vs. subcube data.

For example, if I have a SpectralCube object (cube) and I want to read in a plane from that cube, reading the unmasked data is very fast:

%timeit plane = cube.unmasked_data[i]                                                                                                                                                                                                          
5.93 µs ± 7.11 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But if you want the mask as well:

%timeit plane = cube.filled_data[i]                                                                                                                                                                                                            
1.46 s ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Is there a potentially faster method that I could use?

Similarly, making a subcube is also quite slow:

%timeit plane = cube[i]                                                                
1.47 s ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Would I get a speedup using the cube.subcube method?

Thanks!

keflavich commented 4 years ago

is i an integer index?

cube.filled_data[i] should take longer because it's reading the data into memory and applying a mask, while cube.unmasked_data[i] is returning a view onto memory. You should test the speed of doing an operation, e.g. cube.unmasked_data[i] * 1. If the bottleneck is reading from disk, that should then take as long as the other operations.

The last one, cube[i], is not creating a subcube, it's creating a plane - so it's the same as the second. If you actually want subcubes, you need slices that return cubes, e.g. cube[0:3,:,:], which should be very fast.

AlecThomson commented 4 years ago

Thanks for the tips!

Yes, sorry I actually had i=0 before!

Here is a little more detail and testing.

My data is:

cube.shape                                                                                                                                                                                                                                    
(288, 11800, 11799)

Now, testing with a random int to index the spectral axis:

%timeit i=np.random.randint(low = 0, high = cube.shape[0]); plane = cube.unmasked_data[i]                                                                                                                                                      
23.9 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

And, making a subcube:

%timeit i=np.random.randint(low = 0, high = cube.shape[0]); plane = cube.subcube(zlo=i, zhi=i+1)                                                                                                                                              
5.94 ms ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit i=np.random.randint(low = 0, high = cube.shape[0]); plane = cube[i:i+1,:,:]                                                                                                                                                           
5.92 ms ± 3.01 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Fast, as you suggested.

Now, following your suggestion of *1:

%timeit i=np.random.randint(low = 0, high = cube.shape[0]); plane = cube.unmasked_data[i] * 1                                                                                                                                                 
The slowest run took 5.33 times longer than the fastest. This could mean that an intermediate result is being cached.
1.45 s ± 513 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Slower, but not a whole lot more, given the data is being read into memory.

Today, if I try to get the filled_data:

i=np.random.randint(low = 0, high = cube.shape[0]);plane = cube.filled_data[i]

This line now hangs. I've experienced this before when trying to read in the filled data. I get the same results when creating a plane. Checking top, the process is sitting in a D uninterruptible sleep state, with the memory usage slowly climbing. Is this expected? Or, would you expect making a plane and an operation like cube.unmasked_data[i] * 1 to take the same amount of time?

keflavich commented 4 years ago

"Slower, but not a whole lot more, given the data is being read into memory." It looks like it was about 200x slower! cube.unmasked_data[i] took 2.3x10^-5s, cube.subcube and cube[i:i+1,...] took 5x10^-3s, and cube.unmasked_data[i]*1 took 1.5s.

cube.filled_data[i] and cube.unmasked_data[i]*1 should typically take the same amount of time.

Since each slice is somewhere between 0.5-1 GB, you might have run out of memory entirely during your %timeit run, though... I wouldn't expect that...

radio-astro-tools / spectral-cube

Reading masked data is slow #610