keroro824 / HashingDeepLearning

Codebase for "SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems"
MIT License

Segmentation fault #18

Closed klm122 closed 4 years ago

klm122 commented 4 years ago

When trying to run the runme program I get a segmentation fault. I would be grateful if you could provide some hints toward a solution.

This is the valgrind output:

valgrind -s --leak-check=full ./runme Config_amz.csv
==26728== Memcheck, a memory error detector
==26728== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==26728== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==26728== Command: ./runme Config_amz.csv
==26728==
new Network
==26728== Invalid write of size 4
==26728==    at 0x40F9EB: Network::Network(int, NodeType, int, int, float, int, int, int, int, float, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, cnpy::NpyArray> > >) (Network.cpp:15)
==26728==    by 0x403936: main (main.cpp:472)
==26728==  Address 0xb is not stack'd, malloc'd or (recently) free'd
==26728==
==26728==
==26728== Process terminating with default action of signal 11 (SIGSEGV)
==26728==  Access not within mapped region at address 0xB
==26728==    at 0x40F9EB: Network::Network(int, NodeType, int, int, float, int, int, int, int, float, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, cnpy::NpyArray> > >) (Network.cpp:15)
==26728==    by 0x403936: main (main.cpp:472)
==26728==  If you believe this happened as a result of a stack
==26728==  overflow in your program's main thread (unlikely but
==26728==  possible), you can try to increase the size of the
==26728==  main thread stack using the --main-stacksize= flag.
==26728==  The main thread stack size used in this run was 8388608.
==26728==
==26728== HEAP SUMMARY:
==26728==     in use at exit: 246 bytes in 12 blocks
==26728==   total heap usage: 37 allocs, 25 frees, 115,114 bytes allocated
==26728==
==26728== LEAK SUMMARY:
==26728==    definitely lost: 0 bytes in 0 blocks
==26728==    indirectly lost: 0 bytes in 0 blocks
==26728==      possibly lost: 0 bytes in 0 blocks
==26728==    still reachable: 246 bytes in 12 blocks
==26728==         suppressed: 0 bytes in 0 blocks
==26728== Reachable blocks (those to which a pointer was found) are not shown.
==26728== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26728==
==26728== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==26728==
==26728== 1 errors in context 1 of 1:
==26728== Invalid write of size 4
==26728==    at 0x40F9EB: Network::Network(int, NodeType, int, int, float, int, int, int, int, float, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, cnpy::NpyArray> > >) (Network.cpp:15)
==26728==    by 0x403936: main (main.cpp:472)
==26728==  Address 0xb is not stack'd, malloc'd or (recently) free'd
==26728==
==26728== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

min-xu-ai commented 4 years ago

You likely don't have huge pages enabled. Maybe the code should detect that, dynamically switch to not using huge pages when the allocation fails, print a warning about the performance drop, and move on.
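For illustration only, a fallback along those lines could try an mmap with MAP_HUGETLB first and retry with regular pages if that fails. This is a minimal sketch, not the repository's actual allocation path; alloc_buffer is a hypothetical helper name.

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <sys/mman.h>

// Hypothetical helper: try a huge-page backed anonymous mapping first; if the
// kernel has no huge pages reserved, warn and fall back to regular pages.
static void* alloc_buffer(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        std::fprintf(stderr,
            "warning: huge pages unavailable, falling back to regular pages (slower)\n");
        p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
    if (p == MAP_FAILED) {
        std::perror("mmap");
        std::exit(EXIT_FAILURE);
    }
    return p;
}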

rahulunair commented 4 years ago

Check whether huge pages are supported by the hardware:

grep pse /proc/cpuinfo | uniq
grep pdpe1gb /proc/cpuinfo | uniq

If both commands return a non-empty string, both 2 MB (pse) and 1 GB (pdpe1gb) pages are supported.

To temporarily allocate huge pages, you can use sysctl:

sysctl -w vm.nr_hugepages=4096

To enable huge pages persistently, add these kernel parameters to your GRUB configuration:

 transparent_hugepage=always hugepagesz=1GB hugepages=10 hugepagesz=2MB hugepages=900

Update GRUB and reboot; check this page for more info: https://wiki.debian.org/Hugepages#x86_64
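To confirm the pages were actually reserved (after the sysctl call or after rebooting with the boot parameters above), you can read the HugePages_* counters the kernel exposes in /proc/meminfo. A minimal example in C++, just as a sketch:

#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Print the huge-page counters from /proc/meminfo; a non-zero
    // HugePages_Total means huge pages have been reserved.
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    while (std::getline(meminfo, line)) {
        if (line.rfind("HugePages_Total", 0) == 0 ||
            line.rfind("HugePages_Free", 0) == 0 ||
            line.rfind("Hugepagesize", 0) == 0) {
            std::cout << line << '\n';
        }
    }
    return 0;
}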

klm122 commented 4 years ago

Thanks for your responsiveness.