intel / pcm

Intel® Performance Counter Monitor (Intel® PCM)
BSD 3-Clause "New" or "Revised" License
2.82k stars 476 forks source link

Double free #790

Closed artiomn closed 4 months ago

artiomn commented 4 months ago

I have a class to get power, consuming by core:

/**
 * @brief Power getter via Intel PCM.
 */
class KNP_DECLSPEC CpuPower
{
public:
    CpuPower();
    float get_power();

private:
//    std::chrono::time_point<std::chrono::steady_clock> time_start_;
//    pcm::PCM *pcm_instance_ = nullptr;
//    std::vector<pcm::SocketCounterState> sktstate1_, sktstate2_;
//    std::vector<pcm::CoreCounterState> /* cstates1, */ cstates2_;
    pcm::SystemCounterState sstate1_, sstate2_;
};

All code in methods are commented, all attributes, except sstates are commented. This class is a part of the library, statically linked to the bare gtest tester. Tester code:

int main(int argc, char *argv[])
{
return 0;
//    spdlog::set_level(spdlog::level::trace);

//    testing::InitGoogleTest(&argc, argv);
    try
    {
        //return RUN_ALL_TESTS();
    }
    catch (const std::runtime_error &e)
    {
        std::cerr << e.what() << std::endl;
        throw;
    }

    return EXIT_SUCCESS;
}

Running tester gives double free:

➭ ./build/bin/knp-tester --gtest_filter=BuildTest.\*
free(): double free detected in tcache 2
[1]    2883115 IOT instruction (core dumped)  ./build/bin/knp-tester --gtest_filter=BuildTest.\*

After commenting sstates (even if other attributes are uncommented), all works:

➭ ./build/bin/knp-tester --gtest_filter=BuildTest.\*

This error exists only on Linux, not on Windows.

Valgrind output:

➭ valgrind ./build/bin/knp-tester --gtest_filter=BuildTest.\*
==2902827== Memcheck, a memory error detector
==2902827== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==2902827== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==2902827== Command: ./build/bin/knp-tester --gtest_filter=BuildTest.*
==2902827== 
==2902827== Invalid free() / delete / delete[] / realloc()
==2902827==    at 0x4846404: operator delete(void*) (vg_replace_malloc.c:1131)
==2902827==    by 0x92536D: std::__new_allocator<char const*>::deallocate(char const**, unsigned long) (new_allocator.h:172)
==2902827==    by 0x922957: deallocate (alloc_traits.h:513)
==2902827==    by 0x922957: std::_Vector_base<char const*, std::allocator<char const*> >::_M_deallocate(char const**, unsigned long) (stl_vector.h:389)
==2902827==    by 0x921463: std::_Vector_base<char const*, std::allocator<char const*> >::~_Vector_base() (stl_vector.h:368)
==2902827==    by 0x920142: std::vector<char const*, std::allocator<char const*> >::~vector() (stl_vector.h:738)
==2902827==    by 0x5AF1A5F: __cxa_finalize (cxa_finalize.c:82)
==2902827==    by 0x5F2C267: ??? (in /home/artiom/projects/KNP/build/bin/libknp-devices-cpu.so.1.0.0)
==2902827==    by 0x40010F1: _dl_call_fini (dl-call_fini.c:43)
==2902827==    by 0x400511D: _dl_fini (dl-fini.c:114)
==2902827==    by 0x5AF1FA0: __run_exit_handlers (exit.c:108)
==2902827==    by 0x5AF206D: exit (exit.c:138)
==2902827==    by 0x5AD8C8E: (below main) (libc_start_call_main.h:74)
==2902827==  Address 0x664e1a0 is 0 bytes inside a block of size 80 free'd
==2902827==    at 0x4846404: operator delete(void*) (vg_replace_malloc.c:1131)
==2902827==    by 0x92536D: std::__new_allocator<char const*>::deallocate(char const**, unsigned long) (new_allocator.h:172)
==2902827==    by 0x922957: deallocate (alloc_traits.h:513)
==2902827==    by 0x922957: std::_Vector_base<char const*, std::allocator<char const*> >::_M_deallocate(char const**, unsigned long) (stl_vector.h:389)
==2902827==    by 0x921463: std::_Vector_base<char const*, std::allocator<char const*> >::~_Vector_base() (stl_vector.h:368)
==2902827==    by 0x920142: std::vector<char const*, std::allocator<char const*> >::~vector() (stl_vector.h:738)
==2902827==    by 0x5AF1FA0: __run_exit_handlers (exit.c:108)
==2902827==    by 0x5AF206D: exit (exit.c:138)
==2902827==    by 0x5AD8C8E: (below main) (libc_start_call_main.h:74)
==2902827==  Block was alloc'd at
==2902827==    at 0x4842F73: operator new(unsigned long) (vg_replace_malloc.c:487)
==2902827==    by 0x9253D5: std::__new_allocator<char const*>::allocate(unsigned long, void const*) (new_allocator.h:151)
==2902827==    by 0x922A02: allocate (alloc_traits.h:478)
==2902827==    by 0x922A02: std::_Vector_base<char const*, std::allocator<char const*> >::_M_allocate(unsigned long) (stl_vector.h:380)
==2902827==    by 0x9214E7: void std::vector<char const*, std::allocator<char const*> >::_M_range_initialize<char const* const*>(char const* const*, char const* const*, std::forward_iterator_tag) (stl_vector.h:1694)
==2902827==    by 0x9200CC: std::vector<char const*, std::allocator<char const*> >::vector(std::initializer_list<char const*>, std::allocator<char const*> const&) (stl_vector.h:682)
==2902827==    by 0x91FBAE: __static_initialization_and_destruction_0() (utils.cpp:139)
==2902827==    by 0x91FC0A: _GLOBAL__sub_I_utils.cpp (utils.cpp:1455)
==2902827==    by 0x5AD8DC3: call_init (libc-start.c:145)
==2902827==    by 0x5AD8DC3: __libc_start_main@@GLIBC_2.34 (libc-start.c:347)
==2902827==    by 0x34E2B4: (below main) (in /home/artiom/projects/KNP/build/bin/knp-tester)
==2902827== 
==2902827== 
==2902827== HEAP SUMMARY:
==2902827==     in use at exit: 80 bytes in 1 blocks
==2902827==   total heap usage: 2,250 allocs, 2,250 frees, 270,826 bytes allocated
==2902827== 
==2902827== LEAK SUMMARY:
==2902827==    definitely lost: 80 bytes in 1 blocks
==2902827==    indirectly lost: 0 bytes in 0 blocks
==2902827==      possibly lost: 0 bytes in 0 blocks
==2902827==    still reachable: 0 bytes in 0 blocks
==2902827==         suppressed: 0 bytes in 0 blocks
==2902827== Rerun with --leak-check=full to see details of leaked memory
==2902827== 
==2902827== For lists of detected and suppressed errors, rerun with: -s
==2902827== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

This problem doesn't exist in PCM 202403.

rdementi commented 4 months ago

In your code I don't see that any instance of CpuPower/SystemCounterState was ever created.

==2902827==    by 0x91FBAE: __static_initialization_and_destruction_0() (utils.cpp:139)

This allocation is here: https://github.com/intel/pcm/blob/38aaf6c730345df195430bc0082a16787da2caf2/src/utils.cpp#L139 Not really related to SystemCounterState

Could be a valgrind issue?

artiomn commented 4 months ago

Yes. Because even an object of this class has not been created. But the problem only happened when I linked the PCM to the library (and in my library even memory is not allocated or freed explicitly, and nothing called from it (temporarily commented)).

I tried to uncomment other attributes and comment sstates. And sometimes it repeats.

Probably this is happening in some PCM finalization code that is called when the library is unloaded?

Could be a valgrind issue?

Definitely not, because I see double free without Valgrind.

Stack:


Thread 1 (Thread 0x7fc8ea43a500 (LWP 3645867) "knp-tester"):
#0  0x00007fc8ea590e44 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007fc8ea538a30 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007fc8ea5204c3 in abort () from /usr/lib/libc.so.6
No symbol table info available.
#3  0x00007fc8ea521354 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#4  0x00007fc8ea59b085 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007fc8ea59d66f in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#6  0x00007fc8ea59fdae in free () from /usr/lib/libc.so.6
No symbol table info available.
#7  0x000055bfe327d426 in std::__new_allocator<char const*>::deallocate (this=0x55bfe34c8a40 <pcm::colorTable>, __p=0x55bfe4e026d0, __n=10) at /usr/include/c++/14.1.1/bits/new_allocator.h:172
No locals.
#8  0x000055bfe327aa10 in std::allocator_traits<std::allocator<char const*> >::deallocate (__a=..., __p=0x55bfe4e026d0, __n=10) at /usr/include/c++/14.1.1/bits/alloc_traits.h:513
No locals.
#9  std::_Vector_base<char const*, std::allocator<char const*> >::_M_deallocate (this=0x55bfe34c8a40 <pcm::colorTable>, __p=0x55bfe4e026d0, __n=10) at /usr/include/c++/14.1.1/bits/stl_vector.h:389
No locals.
#10 0x000055bfe327951c in std::_Vector_base<char const*, std::allocator<char const*> >::~_Vector_base (this=0x55bfe34c8a40 <pcm::colorTable>, __in_chrg=<optimized out>) at /usr/include/c++/14.1.1/bits/stl_vector.h:368
No locals.
#11 0x000055bfe32781fb in std::vector<char const*, std::allocator<char const*> >::~vector (this=0x55bfe34c8a40 <pcm::colorTable>, __in_chrg=<optimized out>) at /usr/include/c++/14.1.1/bits/stl_vector.h:738
No locals.
#12 0x00007fc8ea53aa60 in __cxa_finalize () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007fc8ea258298 in ?? () from /home/artiom/projects/KNP/build/linux-default/bin/libknp-devices-cpu.so.1
No symbol table info available.
#14 0x00007ffeb32dc540 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Compiler: gcc-14.1.1, C++17.

artiomn commented 4 months ago

Aha, got it!

utils.cpp:128:

std::vector<const char *> colorTable = {
        ASCII_GREEN,
        ASCII_YELLOW,
        ASCII_MAGENTA,
        ASCII_CYAN,
        ASCII_BRIGHT_GREEN,
        ASCII_BRIGHT_YELLOW,
        ASCII_BRIGHT_BLUE,
        ASCII_BRIGHT_MAGENTA,
        ASCII_BRIGHT_CYAN,
        ASCII_BRIGHT_WHITE
};

Problem happened, when the vector destructor called. Probably destructor trying to delete constexpr global string by the pointer. But that's just a guess.

artiomn commented 4 months ago

It can be fixed by replacing vector with array:

constexpr static std::array<const char *, 10> colorTable = {
        ASCII_GREEN,
        ASCII_YELLOW,
        ASCII_MAGENTA,
        ASCII_CYAN,
        ASCII_BRIGHT_GREEN,
        ASCII_BRIGHT_YELLOW,
        ASCII_BRIGHT_BLUE,
        ASCII_BRIGHT_MAGENTA,
        ASCII_BRIGHT_CYAN,
        ASCII_BRIGHT_WHITE
};
rdementi commented 4 months ago

thanks for the valgrind W/A.

Definitely not, because I see double free without Valgrind.

Unfortunately I still don't see why colorTable destructor needs to be called twice by __run_exit_handlers. What is wrong in having static std::vector arrays?

artiomn commented 4 months ago

thanks for the valgrind W/A.

Definitely not, because I see double free without Valgrind.

Unfortunately I still don't see why colorTable destructor needs to be called twice by __run_exit_handlers. What is wrong in having static std::vector arrays?

See above. Probably vector destructor trying to remove constexpr string by the pointer (delete ASCII_GREEN;). There are no errors with std::array.

rdementi commented 4 months ago

I thought ASCII_GREEN itself is not allocated on the heap. It is a constexpr already

constexpr const char* ASCII_GREEN = "\033[0;32m";

And no delete should be called for it. Only for the std vector array that holds the pointers to those constexpr char *. Assuming delete is called when they are in std::vector why it is not called when these pointers are stored in a different container "std::array" instead of "std::vector"?

Does it also report a leak if you put a vector with a different type (int instead of char *):

std::vector intTable = {1,2,3,4,5,6,7,8,9,10};

artiomn commented 4 months ago

I thought ASCII_GREEN itself is not allocated on the heap. It is a constexpr already

Yes.

And no delete should be called for it. Only for the std vector array that holds the pointers to those constexpr char *.

Yes.

why it is not called when these pointers are stored in a different container "std::array" instead of "std::vector"?

Because this container is constexpr, but the vector is not and frees memory (from .rodata?) in runtime.

Does it also report a leak if you put a vector with a different type (int instead of char *):

This is not a leak, this is a double free: deallocating already deallocated memory. And no, in this case problem doesn't exists (I haven't checked, but I think not).

Strictly speaking, a vector does not free memory. This is done by an allocator, the class of which is passed as a template parameter. IMHO, allocator for the char* is different, but I'm not sure: I don't remember and really don't have time to investigate it.

As a fact: with constexpr array the double free problem was gone.

rdementi commented 4 months ago

thanks.. I meant double-free, not a leak. As a developer one can expect that if one writes something like that: std::vector<const char *> arrayOfStrings = {"AA","BB"}; this won't result in double-free :-) Thanks for spending time in debugging this.

Anyway, if you have more time a pull request would be welcome.

artiomn commented 4 months ago

With vector probably my explanation is not correct. But problem is definitely exists.

My idea was that the line is deleted in runtime, and then automatically when the application is unloaded.

This did not happen with std::array, not because it is a different container, but because it is a constexpr container (if you can make constexpr vector in C++17, I think it will be good too).

artiomn commented 4 months ago

Yes, probably, you're right. Strings will not be deleted by the pointer. It's my failure.

artiomn commented 4 months ago

But double-free exists somewhere here.