keroro824 / HashingDeepLearning

Codebase for "SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems"
MIT License

Segfault #6

Open wrathematics opened 4 years ago

wrathematics commented 4 years ago

This looks very interesting, but I'm unable to successfully run the example. I modified the trainData, testData, and logFile paths of the config file appropriately. Valgrind reports this:

==12259== Memcheck, a memory error detector
==12259== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12259== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12259== Command: ./runme Config_amz.csv
==12259== 
new Network
==12259== Invalid write of size 4
==12259==    at 0x10F124: Network::Network(int*, NodeType*, int, int, float, int, int*, int*, int*, float*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cnpy::NpyArray> > >) (Network.cpp:13)
==12259==    by 0x10D01E: main (main.cpp:472)
==12259==  Address 0xb is not stack'd, malloc'd or (recently) free'd
keroro824 commented 4 years ago

Apologies for the late reply! The team just got back from the conference. We'd be happy to help. As mentioned in the README: "Additionally, Transparent Huge Pages must be enabled. SLIDE requires approximately 900 2MB pages, and 10 1GB pages. (https://wiki.debian.org/Hugepages)"

May I ask whether you have that enabled?

wrathematics commented 4 years ago

No worries. I wouldn't call your reply late at all!

I saw the THP requirement in the README, but I'm not familiar with it, so I'm not sure whether it's enabled. I see this:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

Reading man 2 madvise, it's not clear to me exactly how this works. But I also get the segfault if I remove usage of MAP_HUGETLB and MAP_HUGE_SHIFT from Layer.h, Network.h, and Node.h.

keroro824 commented 4 years ago

To rule out the possibility that THP caused the problem, can you revert to this commit: https://github.com/keroro824/HashingDeepLearning/commit/2d10d46b5f6f1eda5d19f27038a596446fc17cee ?

From my experience, three things can cause Segfault:

  1. data path is incorrect
  2. THP has problems
  3. Hashtable/hash function parameters do not match

If you are confident the config file is correct, it must be THP.

jlopezNEU commented 4 years ago

TL;DR: On my Ubuntu 19.10 machine, I added transparent_hugepage=always hugepagesz=1GB hugepages=10 hugepagesz=2MB hugepages=900 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then ran update-grub and rebooted. You can verify the result with cat /sys/kernel/mm/transparent_hugepage/enabled and hugeadm --pool-list.

yvdriess commented 4 years ago

Note: mmap returns MAP_FAILED ((void *) -1) on error, but the current code in Node, Layer, etc. checks for NULL:

https://github.com/keroro824/HashingDeepLearning/blob/b7b3bba9bab1fb0e70de7644daa9e2312f21de82/SLIDE/Layer.h#L58

xman commented 4 years ago

@yvdriess This should be resolved in pull request #22.

jlopezNEU commented 4 years ago

(Replying by email to One-punch24, who on Monday, November 2, 2020 quoted the TL;DR above and asked: "So sorry to interrupt, but I could not find the grub file. Should I create one at /etc/default/grub?")

Hi, I'm sorry, but I don't know why you are missing this file if you have a recent Ubuntu. Perhaps running the update-grub command would restore it. Best regards, Jose

One-punch24 commented 4 years ago

Thank you so much for replying. I am a little busy these days with exams. Can I consult you later? I was so happy and amazed that you answered me so

(Email reply to the GitHub notification sent Tuesday, November 10, 2020, 12:12 AM, subject "Re: [keroro824/HashingDeepLearning] Segfault (#6)", which quoted Jose's reply above.)

One-punch24 commented 4 years ago

I seem to have solved this problem, but I still get a segmentation fault. I would really like to consult you later about some strange problems.

Thank you again for your help! It is very kind of you to take my problem so seriously.


Eslam2011 commented 3 years ago

Does the code run? My process keeps getting killed.

Eslam2011 commented 2 years ago

@jlopezNEU Does the code work now or not?