google / differential-privacy

Google's differential privacy libraries.
Apache License 2.0

Building execute_query with bazel and "linkopts = ["-static"]" produces a binary that does not work as expected #240

Closed zjingwang closed 1 year ago

zjingwang commented 1 year ago

Sorry to bother you, team, but I ran into this problem when trying to build execute_query.cc with static linking.

I am doing this because my local machine does not support gcc-11, while the execute_query binary relies on gcc-11, so I am trying to get an executable binary that does not rely on gcc.

Problem

My BUILD file:

cc_binary(
    name = "execute_query",
    srcs = [
        "execute_query.cc",
    ],
    linkopts = ["-static"],
    deps = [
        "@com_google_absl//absl/flags:flag",
        "@com_google_absl//absl/flags:parse",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/status",
        "@com_google_absl//absl/status:statusor",
        "@com_google_absl//absl/strings",
        "@com_google_cc_differential_privacy//base:logging",
        "@com_google_cc_differential_privacy//base:status",
        "@com_google_protobuf//:protobuf",
        "@com_google_zetasql//zetasql/public:analyzer_options",
        "@com_google_zetasql//zetasql/public:catalog",
        "@com_google_zetasql//zetasql/public:language_options",
        "@com_google_zetasql//zetasql/public:options_cc_proto",
        "@com_google_zetasql//zetasql/public:simple_catalog",
        "@com_google_zetasql//zetasql/public:type_cc_proto",
        "@com_google_zetasql//zetasql/public:value",
        "@com_google_zetasql//zetasql/resolved_ast",
        "@com_google_zetasql//zetasql/resolved_ast:resolved_node_kind_cc_proto",
        "@com_google_zetasql//zetasql/tools/execute_query:execute_query_tool",
    ],
)

Error:

root@0f80a05cbe23:/zetasql/bazel-bin# ./execute_query --data_set=/zetasql/data/day_data.csv --userid_c`Time entered\`), HOUR) AS \`Hour entered\`, COUNT(*) AS \`Total Visitors (Raw)\` FROM day_data GROUP 

Segmentation fault (core dumped)

Environment:

ubuntu:20.04 Docker container

I am not familiar with C++. Could you tell me whether this approach is possible, or whether I am doing something incorrectly? Regards.

zjingwang commented 1 year ago

I followed the gdb debugging steps and got the following error, but I do not know what it means:

(gdb) set args --data_set=/zetasql/data/day_data.csv --userid_col=VisitorId "SELECT TIME_TRUNC(PARSE_TIME('%I:%M%p', \`Time entered\`), HOUR) AS \`Hour entered\`, COUNT(*) AS \`Total Visitors (Raw)\` FROM day_data GROUP BY \`Hour entered\`"
(gdb) run
Starting program: zetasql/execute_query --data_set=/zetasql/data/day_data.csv --userid_col=VisitorId "SELECT TIME_TRUNC(PARSE_TIME('%I:%M%p', \`Time entered\`), HOUR) AS \`Hour entered\`, COUNT(*) AS \`Total Visitors (Raw)\` FROM day_data GROUP BY \`Hour entered\`"

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000001ac5fe3 in icu_65::umtx_initImplPostInit(icu_65::UInitOnce&) ()
#2  0x0000000001a952b8 in icu_65::umtx_initOnce(icu_65::UInitOnce&, void (*)(UErrorCode&), UErrorCode&) ()
#3  0x0000000001b0e293 in icu_65::Norm2AllModes::getNFCInstance(UErrorCode&) ()
#4  0x0000000001b0e2b8 in icu_65::Normalizer2::getNFCInstance(UErrorCode&) ()
#5  0x0000000001125bf4 in zetasql::functions::(anonymous namespace)::GetNormalizerByMode(zetasql::functions::NormalizeMode, absl::Status*) ()
#6  0x000000000112a8bf in zetasql::functions::Normalize(std::basic_string_view<char, std::char_traits<char> >, zetasql::functions::NormalizeMode, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, absl::Status*) ()
#7  0x000000000048b2c1 in zetasql::(anonymous namespace)::EstimateGlyphWidth(std::basic_string_view<char, std::char_traits<char> >) ()
#8  0x000000000048dc76 in zetasql::ToPrettyOutputStyle(zetasql::Value const&, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
#9  0x00000000004889a7 in zetasql::PrintResults(std::unique_ptr<zetasql::EvaluatorTableIterator, std::default_delete<zetasql::EvaluatorTableIterator> >, std::ostream&) ()
#10 0x0000000000488d61 in zetasql::ExecuteQueryStreamWriter::executed(zetasql::ResolvedNode const&, std::unique_ptr<zetasql::EvaluatorTableIterator, std::default_delete<zetasql::EvaluatorTableIterator> >) ()
#11 0x0000000000463260 in zetasql::ExecuteQuery(std::basic_string_view<char, std::char_traits<char> >, zetasql::ExecuteQueryConfig&, zetasql::ExecuteQueryWriter&) ()
#12 0x00000000004164d4 in main ()
(gdb) x/10i $pc
=> 0x0: Cannot access memory at address 0x0
dibakch commented 1 year ago

Thanks for reaching out!

Is the same thing also happening when using dynamic linking?

zjingwang commented 1 year ago

No, dynamic linking works fine on the same machine.

zjingwang commented 1 year ago

With dynamic linking: [screenshot]

With static linking: [screenshot]

dibakch commented 1 year ago

For the cc_binary target, there is a linkstatic option, which is True by default. It links user libraries statically where possible, while system libraries such as libc are still linked dynamically. Have you tried using the binary created with the default bazel setup? For me, the binary only requires a couple of shared libs:

> ldd bazel-bin/execute_query
        linux-vdso.so.1 (0x00007fff68ab1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2389b0a000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2389800000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2389aea000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f238961f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f238e0a4000)

Can you check whether those exist on your target system?
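A minimal C++17 sketch of that check, assuming the paths from the ldd output above are the ones that matter on the target machine. `MissingSharedLibs` and `kRequiredLibs` are hypothetical names, not part of this repo, and the helper only tests whether each file exists, not whether its version is compatible with the binary.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: returns the subset of `paths` that do not exist on this
// machine. It checks presence only, not ABI or version compatibility.
std::vector<std::string> MissingSharedLibs(const std::vector<std::string>& paths) {
  std::vector<std::string> missing;
  for (const auto& path : paths) {
    std::error_code ec;  // non-throwing overload; any error counts as "missing"
    if (!std::filesystem::exists(path, ec)) {
      missing.push_back(path);
    }
  }
  return missing;
}

// Paths copied from the `ldd bazel-bin/execute_query` output above.
// linux-vdso.so.1 is provided by the kernel and has no file on disk, so it is
// not listed.
const std::vector<std::string> kRequiredLibs = {
    "/lib/x86_64-linux-gnu/libm.so.6",
    "/lib/x86_64-linux-gnu/libstdc++.so.6",
    "/lib/x86_64-linux-gnu/libgcc_s.so.1",
    "/lib/x86_64-linux-gnu/libc.so.6",
    "/lib64/ld-linux-x86-64.so.2",
};
```

On the target system, calling `MissingSharedLibs(kRequiredLibs)` and printing the result would list any library the dynamically linked binary cannot find.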

dibakch commented 1 year ago

I was able to produce a fully statically linked binary with the following change:

> git diff
diff --git a/examples/zetasql/BUILD b/examples/zetasql/BUILD
index 77e632a..ebb27b9 100644
--- a/examples/zetasql/BUILD
+++ b/examples/zetasql/BUILD
@@ -21,6 +21,8 @@ cc_binary(
     srcs = [
         "execute_query.cc",
     ],
+    features = ["fully_static_link"],
+    linkstatic = True,
     visibility = ["//visibility:public"],
     deps = [
         "@com_google_absl//absl/flags:flag",

The result is fully statically linked:

> ldd ./bazel-bin/execute_query
        not a dynamic executable

It also works on the example data:

> ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
+-----+
|     |
+-----+
| 452 |
+-----+

So setting features = ["fully_static_link"] in the cc_binary should solve your problem. Let me know if it doesn't.
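For the fully static build, ldd reports "not a dynamic executable". As a rough cross-check that does not rely on ldd, here is a minimal C++17 sketch that reads an ELF64 file and looks for a PT_INTERP program header, the entry that names the dynamic loader (for example /lib64/ld-linux-x86-64.so.2). A fully static, non-PIE executable normally has no such header. `HasInterpreter` is a hypothetical helper; it assumes a little-endian ELF64 file and the Linux `<elf.h>` definitions.

```cpp
#include <cassert>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: returns true if the ELF64 file at `path` has a
// PT_INTERP program header (it asks for a dynamic loader), false if it has
// none, and std::nullopt if the file cannot be read or is not little-endian ELF64.
std::optional<bool> HasInterpreter(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return std::nullopt;

  Elf64_Ehdr ehdr{};
  if (!in.read(reinterpret_cast<char*>(&ehdr), sizeof(ehdr))) return std::nullopt;
  if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
      ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
      ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
    return std::nullopt;  // not a little-endian ELF64 file
  }

  // Walk the program header table looking for PT_INTERP.
  for (Elf64_Half i = 0; i < ehdr.e_phnum; ++i) {
    Elf64_Phdr phdr{};
    const Elf64_Off offset = ehdr.e_phoff + static_cast<Elf64_Off>(i) * ehdr.e_phentsize;
    in.seekg(static_cast<std::streamoff>(offset));
    if (!in.read(reinterpret_cast<char*>(&phdr), sizeof(phdr))) return std::nullopt;
    if (phdr.p_type == PT_INTERP) return true;
  }
  return false;
}
```

For example, `HasInterpreter("bazel-bin/execute_query")` would be expected to return false for the fully_static_link build and true for the default build.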

zjingwang commented 1 year ago

Thank you for your help @dibakch. I followed your instructions and succeeded in building an executable binary, but when I run it I get no output at all: no errors, no logs.

root@92e92b9367b4:/wzj/zetasql# ldd ./bazel-bin/execute_query
        not a dynamic executable

root@92e92b9367b4:/wzj/zetasql# ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'

Compared with the default setup:

root@92e92b9367b4:/wzj/zetasql# ldd /zetasql/bazel-bin/execute_query
        linux-vdso.so.1 (0x00007ffd377a8000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7486636000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f74864e7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7486279000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7486254000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7486062000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f748ac03000)

root@92e92b9367b4:/wzj/zetasql# /zetasql/bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
+-----+
|     |
+-----+
| 451 |
+-----+

And here is my BUILD file:

cc_binary(
    name = "execute_query",
    srcs = [
        "execute_query.cc",
    ],
    features = ["fully_static_link"],
    visibility = ["//visibility:public"],
    linkstatic = True,
    deps = [
        "@com_google_absl//absl/flags:flag",
        "@com_google_absl//absl/flags:parse",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/status",
        "@com_google_absl//absl/status:statusor",
        "@com_google_absl//absl/strings",
        "@com_google_cc_differential_privacy//base:logging",
        "@com_google_cc_differential_privacy//base:status",
        "@com_google_protobuf//:protobuf",
        "@com_google_zetasql//zetasql/public:analyzer_options",
        "@com_google_zetasql//zetasql/public:catalog",
        "@com_google_zetasql//zetasql/public:language_options",
        "@com_google_zetasql//zetasql/public:options_cc_proto",
        "@com_google_zetasql//zetasql/public:simple_catalog",
        "@com_google_zetasql//zetasql/public:type_cc_proto",
        "@com_google_zetasql//zetasql/public:value",
        "@com_google_zetasql//zetasql/resolved_ast",
        "@com_google_zetasql//zetasql/resolved_ast:resolved_node_kind_cc_proto",
        "@com_google_zetasql//zetasql/tools/execute_query:execute_query_tool",
    ],
)
zjingwang commented 1 year ago

I rebuilt execute_query.cc and still got the error below:

INFO: Elapsed time: 1336.477s, Critical Path: 1140.13s
INFO: 1648 processes: 186 internal, 1462 processwrapper-sandbox.
INFO: Build completed successfully, 1648 total actions
root@92e92b9367b4:/wzj/zetasql# ldd /wzj/zetasql/bazel-bin/execute_query
        not a dynamic executable
root@92e92b9367b4:/wzj/zetasql# /wzj/zetasql/bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
Aborted (core dumped)

Looking forward to your reply. Have a good day!

dibakch commented 1 year ago

It might be that your container ran out of memory.

Could you run the following in the ./examples/zetasql folder and paste the last few lines:

strace ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
zjingwang commented 1 year ago

Thanks for your reply @dibakch. I ran the strace command and got the following output:

root@92e92b9367b4:/wzj/zetasql# strace ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
execve("./bazel-bin/execute_query", ["./bazel-bin/execute_query", "--data_set=data/day_data.csv", "--userid_col=VisitorId", "SELECT WITH ANONYMIZATION OPTION"...], 0x7ffd4fcf8ba8 /* 14 vars */) = 0
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffe02da81c0) = -1 EINVAL (Invalid argument)
brk(NULL)                               = 0x597b000
brk(0x597c340)                          = 0x597c340
arch_prctl(ARCH_SET_FS, 0x597ba00)      = 0
uname({sysname="Linux", nodename="92e92b9367b4", ...}) = 0
readlink("/proc/self/exe", "/root/.cache/bazel/_bazel_root/7"..., 4096) = 129
brk(0x599d340)                          = 0x599d340
brk(0x599e000)                          = 0x599e000
mprotect(0x48f2000, 516096, PROT_READ)  = 0
brk(0x59bf000)                          = 0x59bf000
brk(0x59e0000)                          = 0x59e0000
brk(0x5a01000)                          = 0x5a01000
futex(0x498e79c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x498e7a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f48e9aed000
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f48e9add000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f48e9acd000
futex(0x4984108, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/usr/share/zoneinfo/America/Los_Angeles", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2852, ...}) = 0
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 4096) = 2852
lseek(3, -1810, SEEK_CUR)               = 1042
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 4096) = 1810
brk(0x5a2c000)                          = 0x5a2c000
close(3)                                = 0
futex(0x49867d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "data/day_data.csv", O_RDONLY) = 3
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "VisitorId,Time entered,Time spen"..., 65536) = 7722
read(3, "", 57814)                      = 0
close(3)                                = 0
brk(0x5a65000)                          = 0x5a65000
mmap(NULL, 217088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f48e9a98000
brk(0x5a35000)                          = 0x5a35000
futex(0x4986198, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4986218, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x5a56000)                          = 0x5a56000
futex(0x49862a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x5a77000)                          = 0x5a77000
futex(0x49862c0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x5a98000)                          = 0x5a98000
futex(0x4986970, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4986180, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4981a20, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4986510, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x5ab9000)                          = 0x5ab9000
brk(0x5ada000)                          = 0x5ada000
futex(0x4986290, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x5afb000)                          = 0x5afb000
futex(0x4983280, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x5ab8950, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4983f08, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=48221340}) = 0
openat(AT_FDCWD, "/proc/self/maps", O_RDONLY|O_CLOEXEC) = 3
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "00400000-048f1000 r-xp 00000000 "..., 1024) = 1024
close(3)                                = 0
sched_getaffinity(0, 32, [0, 1, 2, 3, 4, 5]) = 8
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=48854637}) = 0
brk(0x5b1d000)                          = 0x5b1d000
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=49222455}) = 0
futex(0x4984170, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=49733406}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=49877574}) = 0
futex(0x4984048, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=52913397}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=52959234}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=53445739}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=53473101}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=53561851}) = 0
futex(0x49841c0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=53990192}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=54282414}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=55789048}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=55871742}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=55887172}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=56325753}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=56379341}) = 0
clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=56480454}) = 0
brk(0x5b3e000)                          = 0x5b3e000
brk(0x5b5f000)                          = 0x5b5f000
brk(0x5b80000)                          = 0x5b80000
brk(0x5ba1000)                          = 0x5ba1000
brk(0x5bc2000)                          = 0x5bc2000
getrandom("\x3c\x63\xff\x1a\x16\x87\x51\xe4\xcc\x24\x30\xbb\xcc\x29\xcc\x3a\x0a\xb8\x72\xf0\xd1\xdc\x0c\xf8\xd3\xff\x12\x7c\x16\xfe\x51\xcc"..., 256, 0) = 256
getrandom("\x14\xac\x6d\x31\x1f\x2c\x12\xaa\x9e\x29\x07\x63\xd8\xec\x78\x66\x04\xa9\x9f\x8d\xca\xb0\xc4\x61\xd7\x25\xae\x3c\xb1\x6a\x85\xf8"..., 256, 0) = 256
getrandom("\x5f\xb8\xd4\xb0\x8c\x3a\xb8\x66\x20\x0e\x8f\xf6\x26\xd7\xd8\x08\x1c\xa9\x07\x4a\x8c\x54\xf2\xd9\x42\x06\x11\x04\xf1\x04\xd1\x59"..., 256, 0) = 256
getrandom("\x91\xad\x34\x0f\xc6\x41\xb6\x42\x1d\xc7\xd8\x16\xe3\x73\x52\x67\x04\x25\x3e\x42\xaf\xfc\x94\x01\x41\x0b\x7c\xef\xee\x39\xd3\x9b"..., 256, 0) = 256
getrandom("\x84\xe9\x0e\x83\x18\xd2\x2d\x6a\x22\xf0\x4a\x67\x4b\xfc\xef\x9f\xfe\x38\xc2\x5f\xcc\x2f\xbc\x72\x82\xb3\x95\xbd\xbb\xab\xe5\xc0"..., 256, 0) = 256
getrandom("\xea\x24\x0f\x96\xfc\xb9\xc5\x2d\x74\xea\x9b\x89\x05\xdf\xe7\xf5\x9e\xbf\xcc\xd8\x2d\x70\xe1\x4b\x39\x93\xcb\xca\x26\xdb\x55\x57"..., 256, 0) = 256
getrandom("\xe3\xbc\x31\xbc\xfe\xe6\x5e\x83\x2f\xa7\xa9\xd9\x81\x57\x47\x39\x1d\x58\x3b\x17\xa8\xa4\x87\x64\xdc\x69\xab\x9b\x25\xf4\xd1\xa0"..., 256, 0) = 256
getrandom("\xed\x5a\x72\x9f\x7e\xb4\x91\xf9\xfe\x18\x04\xa7\x13\x54\x5c\x59\xeb\x25\x57\x68\x00\xec\x3e\xff\xb2\x6d\xcc\x04\x21\x26\x22\x7b"..., 256, 0) = 256
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f48e9a97000
madvise(0x7f48e9a97000, 4096, 0xffffffff /* MADV_??? */) = -1 EINVAL (Invalid argument)
madvise(0x7f48e9a97000, 4096, MADV_WIPEONFORK) = 0
futex(0x4980cdc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid()                                = 46373
gettid()                                = 46373
tgkill(46373, 46373, SIGABRT)           = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=46373, si_uid=0} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

Alternatively, could you upload your execute_query binary so I can test it?

dibakch commented 1 year ago

Unfortunately this is not what I was hoping for and looks almost identical to the successful run on my machine. Let's try building with debugging symbols and see if the output changes. To do this:

  1. We have to recompile using bazelisk build -c dbg :execute_query so that we have debugging symbols available.
  2. Run ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data' and see if there is more information available from the output.

If there is not more information available, we need a backtrace to figure out what is going on. We'd need to run

  1. (In case gdb is not installed on the Ubuntu container) run sudo apt-get install -y gdb
  2. Run gdb --args ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
  3. At the gdb prompt, enter run, and once gdb reports the crash, enter backtrace.

Thanks for your patience.

zjingwang commented 1 year ago

Sorry for taking so long to reply; recompiling took a very long time. Thanks for all your consistent help and your great patience.

Here is my debug log:

**`./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'`**

INFO: Elapsed time: 2506.854s, Critical Path: 2286.39s
INFO: 897 processes: 20 internal, 877 processwrapper-sandbox.
INFO: Build completed successfully, 897 total actions
root@92e92b9367b4:/wzj/zetasql# 
root@92e92b9367b4:/wzj/zetasql# 
root@92e92b9367b4:/wzj/zetasql# ldd ./bazel-bin/execute_query
        not a dynamic executable
root@92e92b9367b4:/wzj/zetasql# ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
Aborted (core dumped)

**`gdb --args ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'`**

Starting program: /root/.cache/bazel/_bazel_root/72c87886824dc050eaa7cc5816887946/execroot/zetasql_example/bazel-out/k8-dbg/bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId SELECT\ WITH\ ANONYMIZATION\ OPTIONS\(epsilon=1,\ delta=1e-10,\ kappa=1\)\ ANON_COUNT\(\*\)\ FROM\ day_data
warning: Error disabling address space randomization: Operation not permitted

Program received signal SIGABRT, Aborted.
0x00000000020070eb in gsignal ()
(gdb) backtrace
#0  0x00000000020070eb in gsignal ()
#1  0x00000000004145c1 in abort ()
#2  0x0000000000a3771b in CRYPTO_STATIC_MUTEX_lock_read (lock=0x4980ce0 <g_fork_detect_lock>) at external/boringssl/src/crypto/thread_pthread.c:70
#3  0x0000000000a2199d in CRYPTO_get_fork_generation () at external/boringssl/src/crypto/fipsmodule/rand/fork_detect.c:104
#4  0x0000000000a21bcd in RAND_bytes_with_additional_data (out=0x68d08b0 "\240\231\203\006", out_len=65536, user_additional_data=0x223f1a0 <kZeroAdditionalData.41> "")
    at external/boringssl/src/crypto/fipsmodule/rand/rand.c:308
#5  0x0000000000a22018 in RAND_bytes (out=0x68d08b0 "\240\231\203\006", out_len=65536) at external/boringssl/src/crypto/fipsmodule/rand/rand.c:451
#6  0x00000000009d0308 in differential_privacy::SecureURBG::RefreshBuffer (this=0x68949f0) at external/com_google_cc_differential_privacy/algorithms/rand.cc:96
#7  0x00000000009d0260 in differential_privacy::SecureURBG::operator() (this=0x68949f0) at external/com_google_cc_differential_privacy/algorithms/rand.cc:86
#8  0x00000000009d00fd in differential_privacy::UniformDouble () at external/com_google_cc_differential_privacy/algorithms/rand.cc:44
#9  0x00000000009c6361 in differential_privacy::internal::GeometricDistribution::GetUniformDouble (this=0x685c7c0)
    at external/com_google_cc_differential_privacy/algorithms/distributions.cc:169
#10 0x00000000009c641d in differential_privacy::internal::GeometricDistribution::Sample (this=0x685c7c0, scale=1)
    at external/com_google_cc_differential_privacy/algorithms/distributions.cc:179
#11 0x00000000009c63a1 in differential_privacy::internal::GeometricDistribution::Sample (this=0x685c7c0) at external/com_google_cc_differential_privacy/algorithms/distributions.cc:171
#12 0x00000000009c729d in differential_privacy::internal::LaplaceDistribution::Sample (this=0x68ac550) at external/com_google_cc_differential_privacy/algorithms/distributions.cc:305
#13 0x00000000009c06d7 in differential_privacy::LaplaceMechanism::AddInt64Noise (this=0x68ac4b0, result=450)
    at external/com_google_cc_differential_privacy/algorithms/numerical-mechanisms.cc:178
#14 0x000000000080596a in differential_privacy::NumericalMechanism::AddNoise<long, (void*)0> (this=0x68ac4b0, result=450)
    at external/com_google_cc_differential_privacy/algorithms/numerical-mechanisms.h:60
#15 0x0000000000807614 in differential_privacy::ApproxBounds<long>::AddNoise (this=0x67eb190, bins=...) at external/com_google_cc_differential_privacy/algorithms/approx-bounds.h:503
#16 0x0000000000801ebd in differential_privacy::ApproxBounds<long>::GenerateResult (this=0x67eb190, noise_interval_level=0.94999999999999996)
    at external/com_google_cc_differential_privacy/algorithms/approx-bounds.h:336
#17 0x00000000007c9afc in differential_privacy::Algorithm<long>::PartialResult (this=0x67eb190, noise_interval_level=0.94999999999999996)
    at external/com_google_cc_differential_privacy/algorithms/algorithm.h:113
#18 0x00000000008008b7 in differential_privacy::BoundedSumWithApproxBounds<long>::GenerateResult (this=0x6835e90, noise_interval_level=0.94999999999999996)
    at external/com_google_cc_differential_privacy/algorithms/bounded-sum.h:339
#19 0x00000000007c9afc in differential_privacy::Algorithm<long>::PartialResult (this=0x6835e90, noise_interval_level=0.94999999999999996)
    at external/com_google_cc_differential_privacy/algorithms/algorithm.h:113
#20 0x00000000007ba8fd in differential_privacy::Algorithm<long>::PartialResult (this=0x6835e90) at external/com_google_cc_differential_privacy/algorithms/algorithm.h:96
#21 0x0000000000773d4f in zetasql::(anonymous namespace)::GetAnonReturnValue<long> (algorithm=...) at external/com_google_zetasql/zetasql/reference_impl/function.cc:6420
#22 0x0000000000740cfe in zetasql::(anonymous namespace)::BuiltinAggregateAccumulator::GetFinalResultInternal (this=0x6875330, inputs_in_defined_order=false)
    at external/com_google_zetasql/zetasql/reference_impl/function.cc:6833
#23 0x000000000073a233 in zetasql::(anonymous namespace)::BuiltinAggregateAccumulator::GetFinalResult (this=0x6875330, inputs_in_defined_order=false)
    at external/com_google_zetasql/zetasql/reference_impl/function.cc:5939
#24 0x00000000005ffc93 in zetasql::(anonymous namespace)::AggregateAccumulatorAdaptor::GetFinalResult (this=0x6872820, inputs_in_defined_order=false)
    at external/com_google_zetasql/zetasql/reference_impl/aggregate_op.cc:186
#25 0x0000000000601770 in zetasql::(anonymous namespace)::IgnoresNullAccumulator::GetFinalResult (this=0x6837410, inputs_in_defined_order=false)
    at external/com_google_zetasql/zetasql/reference_impl/aggregate_op.cc:540
#26 0x0000000000603874 in zetasql::(anonymous namespace)::IntermediateAggregateAccumulatorAdaptor::GetFinalResult (this=0x68756f0, inputs_in_defined_order=false)
    at external/com_google_zetasql/zetasql/reference_impl/aggregate_op.cc:901
#27 0x0000000000609339 in zetasql::AggregateOp::CreateIterator (this=0x66fb4a0, params=..., num_extra_slots=0, context=0x68212d0)
    at external/com_google_zetasql/zetasql/reference_impl/aggregate_op.cc:1521
#28 0x00000000006b1386 in zetasql::FilterOp::CreateIterator (this=0x66cca90, params=..., num_extra_slots=0, context=0x68212d0)
    at external/com_google_zetasql/zetasql/reference_impl/relational_op.cc:1512
#29 0x00000000006c9aa5 in zetasql::RootOp::CreateIterator (this=0x66ea6f0, params=..., num_extra_slots=0, context=0x68212d0)
    at external/com_google_zetasql/zetasql/reference_impl/relational_op.cc:4618
#30 0x00000000006a41b4 in operator() (__closure=0x681b3d0) at external/com_google_zetasql/zetasql/reference_impl/relational_op.cc:101
#31 0x00000000006d3dce in std::__invoke_impl<absl::StatusOr<std::unique_ptr<zetasql::TupleIterator> >, zetasql::RelationalOp::Eval(absl::Span<const zetasql::TupleData* const>, int, zetasql::EvaluationContext*) const::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61                                                      
#32 0x00000000006d0fc0 in std::__invoke_r<absl::StatusOr<std::unique_ptr<zetasql::TupleIterator> >, zetasql::RelationalOp::Eval(absl::Span<const zetasql::TupleData* const>, int, zetasql::EvaluationContext*) const::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:116
#33 0x00000000006cda50 in std::_Function_handler<absl::StatusOr<std::unique_ptr<zetasql::TupleIterator, std::default_delete<zetasql::TupleIterator> > >(), zetasql::RelationalOp::Eval(absl::Span<const zetasql::TupleData* const>, int, zetasql::EvaluationContext*) const::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)                                      
    at /usr/include/c++/11/bits/std_function.h:291
#34 0x00000000006ddf43 in std::function<absl::StatusOr<std::unique_ptr<zetasql::TupleIterator, std::default_delete<zetasql::TupleIterator> > > ()>::operator()() const (this=0x6808ec8)
    at /usr/include/c++/11/bits/std_function.h:590
#35 0x00000000006dc25b in zetasql::PassThroughTupleIterator::Next (this=0x6808ec0) at external/com_google_zetasql/zetasql/reference_impl/tuple.h:1134
#36 0x00000000004c6447 in zetasql::internal::(anonymous namespace)::TupleIteratorAdaptor::NextRow (this=0x66f0bb0) at external/com_google_zetasql/zetasql/public/evaluator_base.cc:926
#37 0x0000000000488593 in zetasql::PrintResults (iter=..., out=...) at external/com_google_zetasql/zetasql/tools/execute_query/execute_query_writer.cc:56
#38 0x0000000000488d61 in zetasql::ExecuteQueryStreamWriter::executed (this=0x67d8730, ast=..., iter=...)
    at external/com_google_zetasql/zetasql/tools/execute_query/execute_query_writer.cc:104
#39 0x0000000000463260 in zetasql::ExecuteQuery (sql=..., config=..., writer=...) at external/com_google_zetasql/zetasql/tools/execute_query/execute_query_tool.cc:530
#40 0x00000000004164d4 in main (argc=4, argv=0x7ffc67197f08) at execute_query.cc:236
dibakch commented 1 year ago

Interesting! This seems to be a bug in our DP Lib. From the documentation of RAND_bytes:

// RAND_bytes writes |len| bytes of random data to |buf| and returns one. In the
// event that sufficient random data can not be obtained, |abort| is called.

This should not happen in a library; we should probably catch the abort using sigaction and retry a couple of times. Alternatively, we can check /proc/sys/kernel/random/entropy_avail, but this would only work on Linux.
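A minimal sketch of the "retry a couple of times" idea, under a swapped-in technique: instead of trapping the abort from RAND_bytes with sigaction, this C++ uses Linux getrandom(2), which reports failure through its return value, so the caller can retry and finally return an error instead of aborting. `FillRandomWithRetry` and its retry count are hypothetical; this is not the DP library's code, and it is Linux-only (glibc 2.25 or newer for `<sys/random.h>`). The backtrace above shows SecureURBG requesting 65536 bytes at a time, so the buffer size in the usage example matches that.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/random.h>
#include <sys/types.h>

// Hypothetical helper: fills buf[0, len) with random bytes from getrandom(2).
// Short reads and EINTR are simply continued; any other failure is retried up
// to `max_attempts` times. Returns false instead of aborting if the bytes
// could not be obtained, so the caller can report an error.
bool FillRandomWithRetry(unsigned char* buf, std::size_t len, int max_attempts = 3) {
  std::size_t filled = 0;
  int failures = 0;
  while (filled < len) {
    ssize_t n = getrandom(buf + filled, len - filled, 0);
    if (n > 0) {
      filled += static_cast<std::size_t>(n);
    } else if (n < 0 && errno == EINTR) {
      continue;  // interrupted by a signal; try again
    } else if (++failures >= max_attempts) {
      return false;  // give up and let the caller decide what to do
    }
  }
  return true;
}
```

For example, `unsigned char buf[65536]; if (!FillRandomWithRetry(buf, sizeof(buf))) { /* return an error status */ }` keeps the failure decision in the caller rather than in the random source.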

I'll try and find some time this week to fix this.

Can you check the available entropy before and after running execute_query? This can be done by running:

cat /proc/sys/kernel/random/entropy_avail \
  && ./bazel-bin/execute_query \
    --data_set=data/day_data.csv \
    --userid_col=VisitorId \
    'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data' \
  && cat /proc/sys/kernel/random/entropy_avail
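The same reading can be taken from inside a program. A minimal C++17 sketch, Linux-only, that parses the entropy_avail file used in the command above; `ReadEntropyAvail` is a hypothetical helper, not part of this repo.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: returns the kernel's current entropy estimate (in bits)
// read from `path`, or std::nullopt if the file is missing or does not start
// with an integer (for example on non-Linux systems).
std::optional<int> ReadEntropyAvail(
    const std::string& path = "/proc/sys/kernel/random/entropy_avail") {
  std::ifstream in(path);
  int bits = 0;
  if (!(in >> bits)) return std::nullopt;
  return bits;
}
```

Calling `ReadEntropyAvail()` before and after the query gives the same before/after comparison as the shell command.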
zjingwang commented 1 year ago

Here is the result:

root@92e92b9367b4:/wzj/zetasql# cat /proc/sys/kernel/random/entropy_avail
3760
root@92e92b9367b4:/wzj/zetasql# ./bazel-bin/execute_query --data_set=data/day_data.csv --userid_col=VisitorId 'SELECT WITH ANONYMIZATION OPTIONS(epsilon=1, delta=1e-10, kappa=1) ANON_COUNT(*) FROM day_data'
Aborted (core dumped)
root@92e92b9367b4:/wzj/zetasql# cat /proc/sys/kernel/random/entropy_avail
3764
dibakch commented 1 year ago

There seems to be plenty of entropy, though, so it is unlikely that this is the documented case in which the code calls abort.

The current version of boringssl follows a different code path. I'll take a look at the exact code version that we were using there. Maybe updating the dependencies will already help with this issue.

zjingwang commented 1 year ago

Hi @dibakch, I was wondering whether there is a tool that transforms ordinary SQL into ZetaSQL with a differential privacy clause. If so, please let me know. Thank you for all the help, and have a good day!

dibakch commented 1 year ago

I'm not aware of any publicly available tool that does this automatically right now.

zjingwang commented 1 year ago

Hi @dibakch, is it possible for ZetaSQL to connect to a database such as Postgres and query it for differentially private results? If so, please let me know. Much appreciated!

dibakch commented 1 year ago

We have an experimental postgres extension that provides a slightly different syntax and other privacy guarantees. This extension is in the cc/postgres/ folder of this repo.

In case you need something that is more production ready, you can use BigQuery, which provides the same syntax, but is scalable and very convenient to use.

zjingwang commented 1 year ago

Thank you, I'll give it a try!

dibakch commented 1 year ago

Did you find a solution for your problem?

As for your original problem of statically linking the ZetaSQL binary: I'm currently updating the BoringSSL dependency, which might already help here. Note that there is a circular dependency: the ZetaSQL binary still depends on the DP lib version defined in the ZetaSQL repo. So it might take a while until this can be tested (we first have to release the DP lib, then pick up the new version in ZetaSQL, and finally update the ZetaSQL dependency here).

dibakch commented 1 year ago

Closing this for now. Feel free to reopen if this is still relevant.