tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

MATIO Performance compared to MATLAB libs #65

Open JonatanTingstrom opened 7 years ago

JonatanTingstrom commented 7 years ago

Hi, I am wondering how the performance of MATIO compares to the standard MATLAB libraries. I need to read a lot of data from multiple .mat files and also write a lot of data, so it is very important that this can be done quickly. I therefore compared MATIO with the MATLAB 2015b libraries. Unfortunately, MATIO was much slower: where the MATLAB libs took 60 s to read a bunch of files, MATIO took almost 400 s.

But I don't know whether I compiled MATIO with settings that made it much slower, or whether my benchmarking software has some bugs in it. Is there really a big performance difference, or have I just done something wrong on my end?

tbeu commented 7 years ago

I am very interested in such a performance benchmark. Do you think you can share your code and the MAT file you were using such that I can try to reproduce?

JonatanTingstrom commented 7 years ago

Sure thing. No work has been put into making the code clean, however; it was just meant to be a quick comparison between the two libraries. The code was written to work only in a Windows environment with Visual Studio. It is also good to know that I had the option "Character Set" set to "Use Multi-Byte Character Set".

I built two separate .exe files in Visual Studio: one that includes the MATIO libs and headers and one that includes the MATLAB libs and headers.

The libs added to the project settings were libmx.lib and libmat.lib for the MATLAB project and libmatio.lib for the MATIO project.

Include files used in both projects:

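#include <iostream>
#include <string>    // std::string
#include <cstring>   // memset
#include <windows.h>
#include <ctime>

using namespace std; // the snippets use cout, endl and string unqualified

And then also #include "mat.h" for the MATLAB exe and #include "matio.h" for the MATIO exe.

Then I had the following code in the main function, which chooses the folder, starts the timer, and calls the ReadMatFile function. The main function was identical for the two executables.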

int main()
{
    string pathToFiles = "c:\\rerun\\TwoLogs\\";
    string fileExtension = "*.mat";
    WIN32_FIND_DATA search_data;

    memset(&search_data, 0, sizeof(WIN32_FIND_DATA));

    HANDLE handle = FindFirstFile((pathToFiles+fileExtension).c_str(), &search_data);
    cout << "Tick!" << endl;

    clock_t startTime = clock();
    while (handle != INVALID_HANDLE_VALUE)
    {
        ReadMatFile((pathToFiles+search_data.cFileName).c_str());
        if (FindNextFile(handle, &search_data) == FALSE)
            break;
    }
    clock_t endTime = clock();
    cout << "Tock!" << endl;

    if (handle != INVALID_HANDLE_VALUE)
        FindClose(handle); // only close a valid search handle
    cout << "Done in: " << ((float)(endTime - startTime) / CLOCKS_PER_SEC) << endl;
    system("pause");
    return 0;
}
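
For readers who are not on Windows, the same measurement can be sketched portably. The following is a minimal sketch, assuming C++17; `timeMatFiles` and `BenchResult` are hypothetical names that do not appear in the original code, and the callback stands in for either ReadMatFile variant. It uses `std::chrono::steady_clock`, which measures wall-clock time. Note that `clock()` reports elapsed wall-clock time with the Microsoft runtime but processor time on POSIX systems, so the two are not interchangeable across platforms.

```cpp
#include <chrono>
#include <filesystem>
#include <functional>
#include <string>

namespace fs = std::filesystem;

// Result of one benchmark run: number of files visited and elapsed seconds.
struct BenchResult {
    int files;
    double seconds;
};

// Calls readFile on every regular file with the extension ".mat" in dir and
// times the whole loop. The extension check is case-sensitive, unlike the
// "*.mat" pattern passed to FindFirstFile on Windows.
// readFile is a placeholder for either ReadMatFile variant from this thread.
BenchResult timeMatFiles(const fs::path& dir,
                         const std::function<void(const std::string&)>& readFile)
{
    BenchResult result{0, 0.0};
    const auto start = std::chrono::steady_clock::now();
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".mat") {
            readFile(entry.path().string());
            ++result.files;
        }
    }
    const auto end = std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(end - start).count();
    return result;
}
```

It could be called as `timeMatFiles("data", [](const std::string& f) { ReadMatFile(f.c_str()); })`, so that both executables share the exact same loop and timer and differ only in the reader.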

Then I had two different versions of ReadMatFile: one that uses the MATLAB functions and one that uses the MATIO functions.

MATLAB:

void ReadMatFile(const char* file)
{
    MATFile *pmat;
    mxArray *pa;
    const char *name;
    int varCnt = 0;

    cout << "Try to read all variables in: " << file << endl;

    pmat = matOpen(file, "r");
    if (pmat == NULL) 
    {
        cout << "Failed to open!" << endl;
        return;
    }

    while ((pa = matGetNextVariable(pmat, &name)) != NULL) 
    {
        varCnt++;
        mxDestroyArray(pa);
    }

    if (matClose(pmat) != 0) 
    {
        cout << "Failed to close! " << endl;
        return;
    }

    cout << varCnt << " variables found and read in file..." << endl;
    return;
}

And MATIO:

void ReadMatFile(const char* file)
{
    mat_t *pmat;
    matvar_t *pa;
    int varCnt = 0;

    cout << "Try to read all variables in: " << file << endl;

    pmat = Mat_Open(file, MAT_ACC_RDONLY);
    if (pmat == NULL) 
    {
        cout << "Failed to open!" << endl;
        return;
    }

    while ((pa = Mat_VarReadNext(pmat)) != NULL)
    {
        varCnt++;
        Mat_VarFree(pa);
    }

    if (Mat_Close(pmat) != 0) 
    {
        cout << "Failed to close! " << endl;
        return;
    }
    cout << varCnt << " variables found and read in file..." << endl;
    return;
}

tbeu commented 7 years ago

Thanks for the code snippets. Based on them I've created https://github.com/tbeu/matioPerformance which compiles with VS 2012 - the same VS version that the MATLAB R2015b libraries were built with. libmatio.dll was built from current master and tweaked to link with hdf5.lib v1.8.12 (the version required by MATLAB R2015b) and zlib1.lib v1.2.11.

I observe that matPerf.exe crawls the MAT-files in the data folder in about 1.2 seconds whereas matioPerf.exe needs about 8.4 seconds.
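
To relate these numbers to the original report, the slowdown factor can be computed from both pairs of timings. This is only a small arithmetic sketch; `slowdown` and `printReportedSlowdowns` are hypothetical helper names, and "almost 400 s" is taken as 400 s for the estimate.

```cpp
#include <iostream>

// Ratio of candidate runtime to baseline runtime (values > 1 mean slower).
double slowdown(double baselineSeconds, double candidateSeconds)
{
    return candidateSeconds / baselineSeconds;
}

// Prints the slowdown factors for the two measurements quoted in this thread.
void printReportedSlowdowns()
{
    std::cout << "Original report (60 s vs. almost 400 s): about "
              << slowdown(60.0, 400.0) << "x\n";
    std::cout << "matioPerformance (1.2 s vs. 8.4 s): about "
              << slowdown(1.2, 8.4) << "x\n";
}
```

Both measurements give a factor of roughly 7, which suggests that the reproduction shows the same effect as the original report despite the different data sets.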

emmenlau commented 7 years ago

Very interesting! Please let me know if I can do something to help. We have files with relatively complex structures inside; is that relevant for the performance difference?

tbeu commented 7 years ago

I am expecting the three performance bottle-necks

tbeu commented 7 years ago

@emmenlau Well, you could run matPerf/matioPerf on your files and try to narrow it down to a single struct.

On the other hand, I am not sure whether matGetNextVariable and Mat_VarReadNext are really comparable. We could also try matGetNextVariableInfo and Mat_VarReadNextInfo to leave the data I/O out of the measurement.
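
One way to keep such a comparison fair is to parameterize the counting loop on the read function, so that the header-only and full-read variants run through identical code. The following is a minimal sketch; `countVariables` is a hypothetical helper that is not part of matio or the MATLAB libraries, and the usage shown in the comments assumes an already opened `mat_t*`.

```cpp
// Counts variables by calling next() until it returns a null pointer and
// releasing every returned handle with release().
// With matio, next could wrap Mat_VarReadNext (header and data) or
// Mat_VarReadNextInfo (header only), and release could be Mat_VarFree.
template <typename Next, typename Release>
int countVariables(Next next, Release release)
{
    int count = 0;
    while (auto* var = next()) {
        ++count;
        release(var);
    }
    return count;
}

// Assumed usage with an already opened mat_t* mat (requires matio.h):
//   int n = countVariables([mat] { return Mat_VarReadNextInfo(mat); },
//                          Mat_VarFree);
```

The MATLAB side could be wrapped in the same way, with a lambda around matGetNextVariableInfo and mxDestroyArray as the release function. If the gap between the two libraries shrinks in the header-only run, the data I/O is the likely bottleneck; if it stays, the time is more likely spent parsing the file structure.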

tbeu commented 7 years ago

https://github.com/tbeu/matioPerformance was updated

tbeu commented 6 years ago

@emmenlau Is there anything you figured out?

tbeu commented 10 months ago

@emmenlau FYI I updated https://github.com/tbeu/matioPerformance to the upcoming libmatio v1.5.24.

test_suites.zip from https://github.com/tbeu/matio/issues/157#issue-709222187 is still a performance bottleneck.