outpaddling opened this issue 1 year ago (status: Open)
Does this happen when you compile a fresh build with the cmake option -DENABLE_GC=OFF? If the crash goes away, it might be caused by the GC library. It has problems with non-standard systems.
Disabling GC does get around the issue, thanks. What are the consequences of running without garbage collection, if any? Beyond that, getting a successful build was only a matter of two files missing sys/wait.h.
--- frontends/common/parser_options.cpp.orig 2023-04-25 13:01:16 UTC
+++ frontends/common/parser_options.cpp
@@ -21,6 +21,7 @@ limitations under the License.
#include <sys/stat.h>
#include <sys/types.h>
+#include <sys/wait.h>
#include <regex>
#include <unordered_set>
--- test/gtest/helpers.cpp.orig 2023-04-25 13:28:10 UTC
+++ test/gtest/helpers.cpp
@@ -19,6 +19,8 @@ limitations under the License.
#include <sstream>
#include <stdexcept>
+#include <sys/wait.h>
+
#include "helpers.h"
#include "frontends/common/applyOptionsPragmas.h"
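For context on why the added include matters: POSIX declares `waitpid` and the status-decoding macros (`WIFEXITED`, `WEXITSTATUS`) in `<sys/wait.h>`. On Linux with glibc these macros can also become visible through other headers, which may be why the omission goes unnoticed there; that explanation is an assumption, not something confirmed in this thread. A minimal sketch follows, where the helper name `run_and_get_exit_code` is hypothetical and not taken from p4c:

```cpp
#include <sys/types.h>
#include <sys/wait.h>  // waitpid, WIFEXITED, WEXITSTATUS (POSIX)
#include <unistd.h>    // fork, execl, _exit

// Hypothetical helper (not part of p4c): run `command` through /bin/sh and
// return its exit code, or -1 if it could not be run or did not exit normally.
int run_and_get_exit_code(const char *command) {
    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed
    if (pid == 0) {
        // Child: replace this process with the shell.
        execl("/bin/sh", "sh", "-c", command, static_cast<char *>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    // Without <sys/wait.h>, these macros are not guaranteed to be declared.
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Including the header explicitly keeps code like this portable to systems whose other headers do not pull it in.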
My work-in-progress FreeBSD port is here in case someone is interested:
https://github.com/outpaddling/freebsd-ports-wip/tree/master/p4c
> Disabling GC does get around the issue, thanks. What are the consequences of running without garbage collection, if any?
Well, for larger programs you may run out of memory, because all heap allocation is managed by the garbage collector. This compiler does not use any smart pointers. I believe the crash is fixable, but the garbage collection library (https://github.com/ivmai/bdwgc) is a bit finicky.
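To make the memory point concrete, here is a toy model rather than p4c's actual code. It assumes a compiler whose passes build new nodes instead of mutating old ones and never free anything explicitly, so without a collector every node stays allocated until the process exits. `Node`, `Heap`, `rewrite`, and `bytes_after_passes` are all hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for an IR node; NOT p4c's real class.
struct Node {
    int value;
    const Node *child;
};

// Models a heap with no collector: nothing is released before exit.
// The vector exists only so the sketch can count allocations and exit cleanly.
struct Heap {
    std::vector<std::unique_ptr<Node>> owned;

    const Node *make(int value, const Node *child) {
        owned.push_back(std::make_unique<Node>(Node{value, child}));
        return owned.back().get();
    }
    std::size_t live_bytes() const { return owned.size() * sizeof(Node); }
};

// A "pass": builds a fresh chain and leaves the old one behind as garbage.
const Node *rewrite(Heap &heap, const Node *n) {
    if (n == nullptr) return nullptr;
    return heap.make(n->value + 1, rewrite(heap, n->child));
}

// Build a chain of `length` nodes, run `passes` rewrites over it, and report
// how many bytes are still allocated when the last pass finishes.
std::size_t bytes_after_passes(int length, int passes) {
    Heap heap;
    const Node *root = nullptr;
    for (int i = 0; i < length; ++i) root = heap.make(i, root);
    for (int p = 0; p < passes; ++p) root = rewrite(heap, root);
    return heap.live_bytes();
}
```

In this model the footprint grows with the number of passes even though only the final chain is reachable, which is the kind of growth a collector would normally reclaim.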
> Beyond that, getting a successful build was only a matter of two files missing sys/wait.h.
Thanks for doing this! Could you file a PR for the includes? I can approve/merge it then.
We could add a weekly/nightly FreeBSD CI workflow to ensure that the compiler does compile and can compile programs. However, we have limited development resources and I am not sure if there is enough incentive to continuously fix issues on FreeBSD. We barely maintain our MacOS/Fedora builds.
Master branch? Which version of Clang?
Clang 12 should work (see the closely related https://github.com/p4lang/p4c/pull/3586)
Check also https://github.com/p4lang/p4c/pull/3586#issuecomment-1284594818
Thanks, but as I mentioned in the beginning, I get the same error with gcc, though boehm-gc is built separately using clang. The port uses the latest release of p4c. Clang version 13.0.0 FTR.
Sure, I can do a PR.
No need for CI workflows on my account. As I mentioned, I'm only satisfying my curiosity since a colleague is learning P4 and came to me with a question. (She was having trouble using vagrant to install the tutorial VM on her Mac.) I just decided to see what it would take to run it on bare metal on my FreeBSD server. Installing the VM works fine using vagrant and virtualbox 6.x on my FreeBSD machine, BTW.
I'll sleep on this issue for a while and maybe ping the FreeBSD developers. A LOT of them work in the networking industry, so I suspect someone will take an interest.
Does -DCMAKE_BUILD_TYPE=Debug work?
Rebuilding boehm-gc with -g to get more out of the backtrace:
(lldb) bt
* thread #1, name = 'irgenerator', stop reason = signal SIGSEGV
* frame #0: 0x0000000827753294 libthr.so.3`___lldb_unnamed_symbol549 + 4
frame #1: 0x000000082775fcbb libthr.so.3`___lldb_unnamed_symbol710 + 139
frame #2: 0x000000082822eabd libc.so.7`open + 205
frame #3: 0x0000000823e7924d libgc.so.1`GC_unix_mmap_get_mem(bytes=65536) at os_dep.c:2228:21
frame #4: 0x0000000823e79215 libgc.so.1`GC_unix_get_mem(bytes=65536) at os_dep.c:2277:12
frame #5: 0x0000000823e6ead5 libgc.so.1`GC_scratch_alloc(bytes=8224) at headers.c:138:25
frame #6: 0x0000000823e6ec32 libgc.so.1`GC_init_headers at headers.c:195:35
frame #7: 0x0000000823e76c4c libgc.so.1`GC_init at misc.c:1252:5
frame #8: 0x00000000002dc128 irgenerator`::realloc(ptr=0x0000000000000000, size=1664) at gc.cpp:142:13
frame #9: 0x00000000002dc367 irgenerator`::malloc(size=1664) at gc.cpp:161:28
frame #10: 0x00000000002dc3f6 irgenerator`::calloc(size=1664, elsize=1664) at gc.cpp:170:16
frame #11: 0x00000008277572d9 libthr.so.3`___lldb_unnamed_symbol584 + 409
frame #12: 0x000000082775639c libthr.so.3`___lldb_unnamed_symbol574 + 780
frame #13: 0x00003d28461bc0ad ld-elf.so.1
frame #14: 0x00003d28461ba6ab ld-elf.so.1
os_dep.c:2228 is this:
zero_fd = open("/dev/zero", O_RDONLY);
zero_fd is a global defined as
static int zero_fd = -1;
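For readers unfamiliar with the idiom, the quoted line is the first half of a classic way to obtain zero-filled pages on Unix: open `/dev/zero` once, then `mmap` it privately. The sketch below illustrates that general technique only. It is not a copy of bdwgc's `GC_unix_mmap_get_mem`, and the helper name `map_zeroed` is hypothetical.

```cpp
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, MAP_FAILED
#include <cstddef>

static int zero_fd = -1;  // opened lazily, then kept open for later calls

// Hypothetical helper: return `bytes` of zero-filled, writable memory,
// or nullptr on failure.
void *map_zeroed(std::size_t bytes) {
    if (zero_fd == -1) {
        zero_fd = open("/dev/zero", O_RDONLY);
        if (zero_fd == -1) return nullptr;
    }
    // MAP_PRIVATE gives copy-on-write pages, so the mapping may be writable
    // even though the descriptor was opened read-only.
    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   zero_fd, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

Nothing in this path is exotic, so a crash at the `open` call points at the state the process is in when the call is made.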
Very peculiar to get a seg fault on a function call with two constant arguments and an assignment to a simple int variable. This makes me think the core is being corrupted somehow before the program arrives here.
This is a stable boehm-gc port (8.2.2) that is used by dozens of other ports, so I doubt it's a boehm-gc bug.
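One other possible reading of the backtrace, offered as an untested assumption rather than a diagnosis: frames #11 to #14 show libthr calling `calloc` while the runtime linker (ld-elf.so.1) is still on the stack, the `calloc` override in gc.cpp then runs `GC_init`, and `GC_init` calls `open`, which goes back into libthr before it may be ready. If that is what happens, one general mitigation is to serve very early allocations from a static arena and defer the heavyweight initialization until the process is known to be up. The sketch below illustrates that pattern with hypothetical names (`guarded_alloc`, `mark_runtime_ready`, and so on). It is not a patch for p4c or bdwgc, and it is not thread-safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace sketch {

// Static bump arena for allocations that arrive before the runtime is ready.
alignas(std::max_align_t) static unsigned char arena[4096];
static std::size_t arena_used = 0;
static bool runtime_ready = false;  // flipped once startup has finished
static int init_count = 0;          // counts runs of the heavyweight init

bool from_arena(const void *p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(arena);
    return addr >= base && addr < base + sizeof(arena);
}

static void *arena_alloc(std::size_t size) {
    constexpr std::size_t align = alignof(std::max_align_t);
    if (size > sizeof(arena)) return nullptr;  // also avoids overflow below
    std::size_t rounded = (size + align - 1) / align * align;
    if (rounded > sizeof(arena) - arena_used) return nullptr;
    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}

// Stand-in for a heavyweight initializer such as GC_init in the backtrace.
static void init_real_allocator() { ++init_count; }

// Assumed to be called once startup is complete, e.g. at the top of main().
void mark_runtime_ready() { runtime_ready = true; }

void *guarded_alloc(std::size_t size) {
    if (!runtime_ready) return arena_alloc(size);  // too early: no init yet
    if (init_count == 0) init_real_allocator();    // lazy, but only when safe
    return std::malloc(size);  // stand-in for the collector-backed allocator
}

void guarded_free(void *p) {
    if (p == nullptr || from_arena(p)) return;  // arena memory is never freed
    std::free(p);
}

}  // namespace sketch
```

The design choice is that early callers get memory without triggering any call back into libraries that may still be initializing; whether that matches the actual failure here would need testing on FreeBSD.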
New boehm GC versions are somehow problematic for p4c, @fruffy reported some issues recently. Maybe related? Could you downgrade it?
It's likely an issue with the threading implementation in libgc. It needs to be appropriately configured for the distribution. We have problems with newer versions of z3, but libgc is stable.
I need to push https://github.com/p4lang/p4c/pull/3930 forward. It could help with some of these issues, but there are still some blockers.
Downgrading boehm-gc isn't really an option. What's in the FreeBSD ports tree will only go up from here, and I would not be the one to add and maintain a legacy port, given that I'm not even a P4 user. Legacy ports are frowned on in favor of fixing the root problem anyway.
This is not a high priority, as I'm just tinkering with the idea of creating a FreeBSD port out of sheer curiosity. A colleague is working with P4 so I decided to check out the build process. Reporting this in the hopes that it might help you improve the code. If you have any ideas what might be causing the seg fault, I'd be happy to test fixes. I tried building with gcc12 instead of clang, and tweaked optimization levels as a shot in the dark. No difference.
Tail of the build output.
Backtrace info: