shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Error freeing memory LibSVM when exiting sample application #5115

Open spiovesan opened 4 years ago

spiovesan commented 4 years ago

I built shogun master on Windows 10 x64 with Visual Studio 2019. I built the sample classifier_minimal_svm; it works, but I get this error when exiting the application:

Critical error detected c0000374
classifier_minimal_svm.exe has triggered a breakpoint.

Exception thrown at 0x00007FFC395DB0B9 (ntdll.dll) in classifier_minimal_svm.exe: 0xC0000374: A heap has been corrupted (parameters: 0x00007FFC396427F0).
Unhandled exception at 0x00007FFC395DB0B9 (ntdll.dll) in classifier_minimal_svm.exe: 0xC0000374: A heap has been corrupted (parameters: 0x00007FFC396427F0).

This is the stack trace:

ntdll.dll!00007ffc395db0b9()    Unknown
ntdll.dll!00007ffc395db083()    Unknown
ntdll.dll!00007ffc395e390e()    Unknown
ntdll.dll!00007ffc395e3c1a()    Unknown
ntdll.dll!00007ffc3957ecb1()    Unknown
ntdll.dll!00007ffc3958ce62()    Unknown
ucrtbase.dll!00007ffc357ec7eb() Unknown
classifier_minimal_svm.exe!shogun::sg_free(void * ptr) Line 186 C++
classifier_minimal_svm.exe!shogun::sg_generic_free<int,0>(int * ptr) Line 124   C++
classifier_minimal_svm.exe!shogun::SGVector<int>::free_data() Line 405  C++
classifier_minimal_svm.exe!shogun::SGReferencedData::unref() Line 102   C++
classifier_minimal_svm.exe!shogun::SGVector<int>::~SGVector<int>() Line 173 C++
classifier_minimal_svm.exe!shogun::KernelMachine::~KernelMachine() Line 79  C++
classifier_minimal_svm.exe!shogun::SVM::~SVM() Line 40  C++
classifier_minimal_svm.exe!shogun::LibSVM::~LibSVM() Line 37    C++
classifier_minimal_svm.exe!shogun::LibSVM::`scalar deleting destructor'(unsigned int)   C++
classifier_minimal_svm.exe!std::_Destroy_in_place<shogun::LibSVM>(shogun::LibSVM & _Obj) Line 269   C++
classifier_minimal_svm.exe!std::_Ref_count_obj2<shogun::LibSVM>::_Destroy() Line 1446   C++
classifier_minimal_svm.exe!std::_Ref_count_base::_Decref() Line 542 C++
classifier_minimal_svm.exe!std::_Ptr_base<shogun::LibSVM>::_Decref() Line 776   C++
classifier_minimal_svm.exe!std::shared_ptr<shogun::LibSVM>::~shared_ptr<shogun::LibSVM>() Line 1034 C++
classifier_minimal_svm.exe!main(int argc, char * * argv) Line 41    C++
[Inline Frame] classifier_minimal_svm.exe!invoke_main() Line 78 C++
classifier_minimal_svm.exe!__scrt_common_main_seh() Line 288    C++

I see that the previous release had this line of code, which has now been removed:

// free up memory
SG_UNREF(svm);

spiovesan commented 4 years ago

I found that it is not an SVM issue: every time an SGVector goes out of scope, this error occurs:

{
  SGVector<float64_t> y_values(100); 
}

linreg_shogun.exe!shogun::sg_free(void * ptr) Line 186  C++
linreg_shogun.exe!shogun::sg_generic_free<double,0>(double * ptr) Line 124  C++
linreg_shogun.exe!shogun::SGVector<double>::free_data() Line 405    C++
linreg_shogun.exe!shogun::SGReferencedData::unref() Line 102    C++
linreg_shogun.exe!shogun::SGVector<double>::~SGVector<double>() Line 173    C++

vigsterkr commented 4 years ago

sorry for the late reply! first of all what is the version you are trying to use? because in develop all the SG_UNREF calls are gone since we switched to smart pointers, so no need for those...

i had problems in the past running shogun on windows, could you please share more info about your development environment?

SGVector and SGMatrix are autoreferenced (reference counted), so if one goes out of scope and nobody else holds a reference to it, it'll be destructed.
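
The reference-counting behaviour described above can be illustrated with a small sketch. The class below is a hypothetical, simplified stand-in (the name RefCountedBuffer and its members are invented for illustration); it is not Shogun's SGVector and it is not thread-safe. It only shows that copies share one buffer and that the buffer is released when the last owner goes out of scope.

```cpp
#include <cstddef>

// Hypothetical, simplified reference-counted buffer (NOT Shogun's SGVector).
class RefCountedBuffer
{
public:
    // Allocate n zero-initialised doubles and start with one owner.
    explicit RefCountedBuffer(std::size_t n)
        : data_(new double[n]()), refs_(new int(1))
    {
    }

    // Copying shares the same buffer and bumps the counter.
    RefCountedBuffer(const RefCountedBuffer& other)
        : data_(other.data_), refs_(other.refs_)
    {
        ++*refs_;
    }

    // Assignment is left out to keep the sketch short.
    RefCountedBuffer& operator=(const RefCountedBuffer&) = delete;

    // The last owner to go out of scope releases the buffer.
    ~RefCountedBuffer()
    {
        if (--*refs_ == 0)
        {
            delete[] data_;
            delete refs_;
        }
    }

    int ref_count() const { return *refs_; }
    double* data() const { return data_; }

private:
    double* data_;
    int* refs_;
};
```

In this sketch the release in the destructor uses delete[] because the buffer came from new[]; the crash reported in this thread is what happens when that pairing between allocation and release is broken.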

btw: if you are around irc/gitter/matrix you can ping us there and have a sync discussion about this

spiovesan commented 4 years ago

I am using Microsoft Visual Studio Enterprise 2019 Version 16.7.1 x64, Windows SDK 10.0.18362.0, with CMake-generated projects. The Shogun toolbox is the master branch, downloaded on 19 August.

Shogun project preprocessor defines are: EIGEN_NO_STATIC_ASSERT;DEBUG;CMAKE_INTDIR="Debug";%(PreprocessorDefinitions)

Sample project: WIN32;_WINDOWS;DEBUG;_ENABLE_EXTENDED_ALIGNED_STORAGE;CMAKE_INTDIR="Debug";%(PreprocessorDefinitions)

The same issue happens with the SGMatrix class too. I suspect something is wrong in memory.cpp. Maybe it would help to have a look at the config.h file.

karlnapf commented 4 years ago

try the develop branch instead! :)

vigsterkr commented 4 years ago

@karlnapf unfortunately that will not work....

vigsterkr commented 4 years ago

@spiovesan i'm a bit puzzled... if you use master, how come you don't have SG_REF/UNREF calls in your code?

spiovesan commented 4 years ago

Sorry, my fault: it was the develop branch (I got it with git clone https://github.com/shogun-toolbox/shogun.git)

vigsterkr commented 4 years ago

@spiovesan oh ok that is then rather interesting, coz as said i had trouble running code itself... the classifier_minimal_svm exit code error i might have an idea why that actually happens...

in case of the example you've given:

{
 SGVector<float64_t> y_values(100); 
}

this is basically a simple function you've created and called? and once the function returns you get that error?

spiovesan commented 4 years ago

I deleted everything from that sample and wrote these few lines to verify that the issue is not related to anything else:

#include <shogun/lib/SGVector.h>
#include <shogun/lib/SGMatrix.h>

using namespace shogun;

int main(int argc, char** argv)
{
    {
        SGMatrix<float64_t> x_values(1, 64);
    }

    {
        SGVector<float64_t> v_values1(64);
    }
    return 0;
}

When the variables go out of scope at the closing braces, you get the error in the free(ptr) call.

vigsterkr commented 4 years ago

ooooh 🐙 thnx for this! it's super helpful! are you sure that it's calling free(ptr)? because then i know why that is happening :)

spiovesan commented 4 years ago

This is the stack trace

    ntdll.dll!00007fff3931b042()    Unknown
    ntdll.dll!00007fff3932390e()    Unknown
    ntdll.dll!00007fff39323c1a()    Unknown
    ntdll.dll!00007fff392becb1()    Unknown
    ntdll.dll!00007fff392cce62()    Unknown
    ucrtbase.dll!00007fff353ac7eb() Unknown
>   sgvector_minimal.exe!shogun::sg_free(void * ptr) Line 186   C++
    sgvector_minimal.exe!shogun::sg_generic_free<double,0>(double * ptr) Line 124   C++
    sgvector_minimal.exe!shogun::SGMatrix<double>::free_data() Line 959 C++
    sgvector_minimal.exe!shogun::SGReferencedData::unref() Line 102 C++
    sgvector_minimal.exe!shogun::SGMatrix<double>::~SGMatrix<double>() Line 182 C++
    sgvector_minimal.exe!main(int argc, char * * argv) Line 13  C++
    [Inline Frame] sgvector_minimal.exe!invoke_main() Line 78   C++
    sgvector_minimal.exe!__scrt_common_main_seh() Line 288  C++
    kernel32.dll!00007fff36ad7974() Unknown
    ntdll.dll!00007fff3928a271()    Unknown

and the debugger stops here, where line 186 is the last brace of sg_free in src\shogun\lib\memory.cpp:

void sg_free(void* ptr)
{
#if defined(USE_JEMALLOC)
    je_free(ptr);
#elif defined(USE_TCMALLOC)
    tc_free(ptr);
#else
    free(ptr);
#endif
}

USE_JEMALLOC and USE_TCMALLOC are undefined, so free(ptr) is used.

vigsterkr commented 4 years ago

so the solution is here: https://github.com/shogun-toolbox/shogun/pull/4927 basically the story is that free is not the right call to deallocate aligned memory on windows
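
As a rough illustration of that point (a sketch, not the code from the pull request): on Windows, memory obtained from the C runtime's _aligned_malloc has to be released with _aligned_free, whereas on POSIX systems memory from posix_memalign is released with plain free. The helper names example_aligned_malloc, example_aligned_free and is_aligned below are hypothetical and are not Shogun's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h> // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: allocate `size` bytes aligned to `alignment`
// (a power of two that is a multiple of sizeof(void*)).
void* example_aligned_malloc(std::size_t size, std::size_t alignment)
{
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void* ptr = nullptr;
    if (posix_memalign(&ptr, alignment, size) != 0)
        return nullptr;
    return ptr;
#endif
}

// Hypothetical helper: release memory from example_aligned_malloc.
void example_aligned_free(void* ptr)
{
#if defined(_WIN32)
    // Calling free() on this pointer is invalid and can corrupt the heap.
    _aligned_free(ptr);
#else
    // Memory from posix_memalign is released with free().
    free(ptr);
#endif
}

// Small check used to confirm the alignment of a pointer.
bool is_aligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

A caller would request, for example, example_aligned_malloc(64 * sizeof(double), 64) and later hand the same pointer to example_aligned_free. Routing every release through one function keeps the platform difference in a single place.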

spiovesan commented 4 years ago

I manually applied your 868a7ab commit to my cloned develop branch, and it works. Waiting for the merge into the toolbox.

vigsterkr commented 4 years ago

@spiovesan that's great news! thnx a lot for this feedback!

spiovesan commented 4 years ago

I found a new issue in the classifier_mklmulticlass.cpp sample. It gets some weights at line 294:

float64_t* weights=tsvm->getsubkernelweights(numweights);

but that function allocates the data in mkl\MKLMulticlass.cpp at line 436 with new[]:

float64_t* res=new float64_t[numweights];

then, with the change we made in SGVector.cpp,

void SGVector<T>::free_data()
{
    SG_ALIGNED_FREE(vector);
}

the destructor fails to free the new-allocated data. I could replace the new here with SG_ALIGNED_MALLOC, but somewhere else someone could call delete on such memory, and that would fail. I suspect this is not the only issue of this kind. How can it be fixed safely?
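
One general way to avoid this kind of allocator mismatch is to never hand out a raw pointer at all, and instead return an owning type whose deleter is fixed at the point of allocation. The sketch below shows the idea in plain C++ under stated assumptions: AlignedDeleter, AlignedDoubles, make_aligned_doubles and example_get_weights are hypothetical names, the 64-byte alignment is arbitrary, and none of this is Shogun code. In Shogun itself the analogous move would presumably be for the function to build and return an SGVector<float64_t>, so that allocation and release both go through the same SG_* macros.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#if defined(_WIN32)
#include <malloc.h> // _aligned_malloc, _aligned_free
#endif

// Hypothetical deleter that matches the aligned allocation below.
struct AlignedDeleter
{
    void operator()(void* ptr) const noexcept
    {
#if defined(_WIN32)
        _aligned_free(ptr);
#else
        std::free(ptr);
#endif
    }
};

// Owning handle: whoever receives it cannot release it the wrong way.
using AlignedDoubles = std::unique_ptr<double[], AlignedDeleter>;

// Hypothetical factory: the only place where the allocation happens.
// `count` is expected to be greater than zero in this sketch.
AlignedDoubles make_aligned_doubles(std::size_t count, std::size_t alignment = 64)
{
    void* raw = nullptr;
#if defined(_WIN32)
    raw = _aligned_malloc(count * sizeof(double), alignment);
#else
    if (posix_memalign(&raw, alignment, count * sizeof(double)) != 0)
        raw = nullptr;
#endif
    if (raw == nullptr)
        throw std::bad_alloc();
    return AlignedDoubles(static_cast<double*>(raw));
}

// Hypothetical stand-in for a function that returns freshly computed weights.
AlignedDoubles example_get_weights(std::size_t numweights)
{
    AlignedDoubles res = make_aligned_doubles(numweights);
    for (std::size_t i = 0; i < numweights; ++i)
        res[i] = 1.0 / static_cast<double>(numweights); // uniform weights
    return res;
}
```

With this shape a caller writes auto weights = example_get_weights(n); and the buffer is released by the matching deleter when weights goes out of scope, so a stray delete or free on a raw pointer never appears in calling code.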