knossos-project / knossos_utils

Python library for interacting with KNOSSOS data sets and annotation files
GNU General Public License v2.0

Qs: parallel reading #5

Open raacampbell opened 8 years ago

raacampbell commented 8 years ago

I think knossos_cuber reads files serially. With data on a RAID volume, reading can be sped up substantially by doing it in parallel, so it might be worth adding an option for this.
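For illustration, a minimal sketch of what an opt-in parallel read could look like with a plain thread pool; the file pattern, worker count and the choice of Pillow as the reader are placeholder assumptions, not the cuber's actual code:

```python
# Sketch only: read a TIFF stack with a thread pool instead of a serial loop.
from concurrent.futures import ThreadPoolExecutor
from glob import glob

import numpy as np
from PIL import Image


def read_slice(path):
    # File reads release the GIL while waiting on the disk, so several
    # threads can keep a RAID volume busy at once.
    return np.asarray(Image.open(path))


paths = sorted(glob("stack/*.tif"))  # placeholder location
with ThreadPoolExecutor(max_workers=8) as pool:
    slices = list(pool.map(read_slice, paths))  # keeps input order
volume = np.stack(slices)
```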

jmrk84 commented 8 years ago

Reading the tif stack is indeed serial, I think, but for a more or less good reason: for reasonably large tif/image source files there should be no real advantage in reading in parallel, because that step should be mainly I/O-limited rather than CPU-limited. If the image files use some expensive compression, this might actually be different. Writing the cubes is already heavily parallelized, as are all further operations of the cuber. Do you use compressed input image files?

raacampbell commented 8 years ago

I use uncompressed TIFFs of about 100 to 200 MB each. I find reading in parallel is much faster if I'm working from a RAID volume. I use btrfs RAID1 and read with about 1 to 2 threads per drive. I've seen similar results with hardware RAID 1+0, but in that case the optimum speed was at 1 thread per drive.

Fresh benchmarks:

Hardware: 8x 4 TB btrfs RAID 1; Intel i7 with 8 cores; 64 GB RAM

I read 484 uncompressed TIFFs, each 201 MB in size. The system cache was cleared before each run.

  1. Serial read - 1041 seconds
  2. 8 threads - 281 seconds (3.7x)
  3. 16 threads - 231 seconds (4.5x)

jmrk84 commented 8 years ago

Thanks for looking into that; it does look like this could be optimized! Could you make a pull request with your modifications that use many threads for reading into the numpy array?
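For reference, a rough sketch of what such a change might look like: worker threads each read one image and write it into a preallocated numpy volume, so the result is identical to the serial path. The shape handling, dtype and Pillow reader are assumptions for illustration:

```python
# Sketch only: fill a preallocated numpy volume from many threads.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from PIL import Image


def load_stack(paths, num_threads=16):
    first = np.asarray(Image.open(paths[0]))
    volume = np.empty((len(paths),) + first.shape, dtype=first.dtype)
    volume[0] = first

    def fill(i):
        # Each worker writes only its own z-slice, so no locking is needed.
        volume[i] = np.asarray(Image.open(paths[i]))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(fill, range(1, len(paths))))  # list() surfaces errors
    return volume
```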

raacampbell commented 8 years ago

Sorry, I wasn't modifying your code to generate those numbers. I just did a quick benchmark in MATLAB, reading TIFF files with different numbers of workers using the Parallel Computing Toolbox.

The benchmark was done with very low-level code, i.e. I'm not even using MATLAB's TIFF reader; I just use the basic fread command to read the raw binary and then reshape it into the image. So it really is as simple as possible. I don't see why the speedup shouldn't hold for your code as well, since I wasn't doing any decompression or other CPU-intensive steps: it's just I/O.
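In Python terms, the equivalent of that fread-plus-reshape benchmark would be something like the sketch below; the image geometry, dtype and header offset are made-up placeholders, the point being that the per-file work is pure I/O with no decoding:

```python
# Sketch only: raw-binary analogue of the MATLAB fread benchmark.
from concurrent.futures import ThreadPoolExecutor
from glob import glob

import numpy as np

ROWS, COLS, OFFSET = 10000, 10000, 8  # placeholder geometry and header size


def raw_read(path):
    # Read the pixel data as raw bytes and reshape; no decompression happens.
    data = np.fromfile(path, dtype=np.uint16, count=ROWS * COLS, offset=OFFSET)
    return data.reshape(ROWS, COLS)


paths = sorted(glob("stack/*.tif"))  # placeholder location
with ThreadPoolExecutor(max_workers=16) as pool:
    slices = list(pool.map(raw_read, paths))
```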