wavescholar / klMatrixCore

Core Matrix Functionality. This repository is under active development.

Implement Fast IO For Matrices and Vectors #34

Closed wavescholar closed 10 years ago

wavescholar commented 11 years ago

Saving a binary file is the fastest option. Not only can you read it directly into an array with a raw istream::read in a single operation (which is very fast), but you can even map the file in memory if your OS supports it; you can use open/mmap on POSIX systems, CreateFile/CreateFileMapping/MapViewOfFile on Windows, or even the Boost cross-platform solution (thanks @Cory Nelson for pointing it out).

Quick & dirty examples, assuming the file contains the raw representation of some floats:

"Normal" read:

```cpp
#include <fstream>
#include <vector>

// ...

// Open the stream in binary mode
std::ifstream is("input.dat", std::ios_base::binary);
// Determine the file length
is.seekg(0, std::ios_base::end);
std::size_t size = is.tellg();
is.seekg(0, std::ios_base::beg);
// Create a vector to store the data
std::vector<float> v(size / sizeof(float));
// Load the data
is.read((char*) &v[0], size);
// Close the file
is.close();
```

With shared memory:

```cpp
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

using namespace boost::interprocess;

// ...

// Create the file mapping
file_mapping fm("input.dat", read_only);
// Map the file in memory
mapped_region region(fm, read_only);
// Get the address where the file has been mapped
float* addr = (float*) region.get_address();
std::size_t elements = region.get_size() / sizeof(float);
```

Your bottleneck is the I/O. You want the program to read as much data into memory as possible in the fewest I/O calls. For example, reading 256 numbers with one fread is faster than 256 freads of one number each.
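A rough sketch of the two access patterns (the file name and element count are placeholders); the single block read is the one you want:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 256;
    std::vector<float> v(n);

    std::FILE* f = std::fopen("input.dat", "rb");
    if (!f) return 1;

    // Fast: one call moves all 256 floats in a single I/O request.
    std::fread(v.data(), sizeof(float), n, f);

    // Slow: 256 separate calls, each paying library/syscall overhead.
    std::rewind(f);
    for (std::size_t i = 0; i < n; ++i)
        std::fread(&v[i], sizeof(float), 1, f);

    std::fclose(f);
    return 0;
}
```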

If you can, format the data file to match the target platform's internal floating point representation, or at least your program's representation. This removes the overhead of translating a textual representation into the internal one.
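For instance, a writer along these lines stores the raw IEEE-754 bytes so the reader never has to parse text. This is only a sketch: the path is a placeholder, and it assumes the writer and reader share the same float representation and endianness.

```cpp
#include <fstream>
#include <vector>

// Writes the raw in-memory bytes of the floats; the matching reader is the
// "Normal" read example above.
void save_raw(const char* path, const std::vector<float>& v) {
    std::ofstream os(path, std::ios_base::binary);
    os.write(reinterpret_cast<const char*>(v.data()),
             v.size() * sizeof(float));
}
```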

Bypass the OS and use the DMA controller to read in the file data, if possible. The DMA chip takes the burden of reading data into memory off the shoulders of the processor.
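User code cannot normally drive the DMA controller itself; the closest approximation on Linux is unbuffered I/O with O_DIRECT, which lets the kernel DMA file data straight into your buffer and skip the page cache. A sketch only, assuming Linux, a 4096-byte alignment requirement, and a placeholder file name:

```cpp
#include <fcntl.h>      // open, O_DIRECT (needs _GNU_SOURCE on glibc)
#include <unistd.h>     // read, close
#include <cstdlib>      // posix_memalign, free

int main() {
    int fd = open("input.dat", O_RDONLY | O_DIRECT);
    if (fd < 0) return 1;

    // O_DIRECT requires the buffer, offset, and length to be block-aligned.
    void* buf = nullptr;
    const size_t len = 1 << 20;                 // 1 MiB, a multiple of 4096
    if (posix_memalign(&buf, 4096, len) != 0) return 1;

    ssize_t got = read(fd, buf, len);           // kernel DMAs into buf
    (void) got;                                 // process the data here

    free(buf);
    close(fd);
    return 0;
}
```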

Compact your data file. The data file should occupy one contiguous set of sectors on the disk. This reduces the amount of time spent seeking to different areas of the physical platters.
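One way to encourage this (a sketch; POSIX only, path and size are placeholders) is to preallocate the whole file in a single call, which gives the filesystem the best chance to hand out contiguous extents:

```cpp
#include <fcntl.h>      // open, posix_fallocate
#include <unistd.h>     // close

bool preallocate(const char* path, off_t bytes) {
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return false;
    int rc = posix_fallocate(fd, 0, bytes);   // reserve all the space up front
    close(fd);
    return rc == 0;
}
```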

Have your program demand exclusive control over the disk and the processor. Block all other unimportant tasks and raise the priority of your program's execution.
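The priority half of this is straightforward on POSIX; a sketch (negative nice values usually need elevated privileges, and -10 is an arbitrary example):

```cpp
#include <sys/resource.h>   // setpriority
#include <cstdio>

int main() {
    // 0 as the second argument means "this process".
    if (setpriority(PRIO_PROCESS, 0, -10) != 0)
        std::perror("setpriority");
    // ... run the I/O-heavy work here ...
    return 0;
}
```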

Use multiple buffers to keep the disk drive spinning. A large portion of time is spent waiting for the hard drive to accelerate and decelerate. Your program can be processing the data while something else is storing the data into a buffer, which leads to ...

Multi-thread. Create one thread to read in the data and alert the processing task when the buffer is not empty.
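A minimal sketch of that producer/consumer arrangement, assuming a placeholder file name and chunk size, with the reader thread signalling the consumer through a condition variable whenever a buffer is ready:

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const std::size_t chunk = 1 << 20;           // floats per buffer
    std::vector<float> buf;                      // the handed-off buffer
    bool ready = false, done = false;
    std::mutex m;
    std::condition_variable cv;

    // Reader thread: fills a buffer from disk, hands it over, repeats.
    std::thread reader([&] {
        std::ifstream is("input.dat", std::ios_base::binary);
        while (is) {
            std::vector<float> tmp(chunk);
            is.read(reinterpret_cast<char*>(tmp.data()), chunk * sizeof(float));
            tmp.resize(is.gcount() / sizeof(float));
            if (tmp.empty()) break;
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !ready; });     // wait for an empty slot
            buf = std::move(tmp);
            ready = true;
            cv.notify_one();                         // alert the consumer
        }
        std::lock_guard<std::mutex> lk(m);
        done = true;
        cv.notify_one();
    });

    // Consumer: processes one buffer while the reader fills the next.
    for (;;) {
        std::vector<float> work;
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return ready || done; });
            if (!ready && done) break;
            work = std::move(buf);
            ready = false;
            cv.notify_one();                         // the slot is free again
        }
        // process(work);   // placeholder for the actual computation
    }
    reader.join();
    return 0;
}
```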

These should keep you busy for a while. All other optimizations will result in negligible performance gains. (Such as accessing the hard drive controller directly to transfer into one of your buffers.)