Closed ignatenkobrain closed 1 year ago
I see that it first broke (on version 3.4) after this dependency update: https://koschei.fedoraproject.org/build/6588175
@grobian any idea?
Yeah, I think I once meant to probe for a free port to use, but this is likely a TODO. So it uses a port that's probably in use by something else on your env.
You could try simply replacing 3020 with something different in
test/run-test.sh: local start_server_lastport=3020 # TODO
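For reference, the "probe for a free port" idea mentioned above could be sketched roughly as follows. This is only an illustration in Python, not what test/run-test.sh does; the helper name find_free_port is made up for this example. It binds to port 0 so the kernel picks an unused port, then reports which one it got.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Return a TCP port that was free at the time of the call.

    Hypothetical helper: binding to port 0 asks the kernel to choose
    any unused port; getsockname() then tells us which one it picked.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


port = find_free_port()
print(port)
```

Note that the port is only known to be free at the moment of the probe. Another process could grab it before the server under test starts, so this narrows the race rather than removing it.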
As you can see in the log, I explicitly run ss beforehand to check which ports are in use, and 3020 is not among them.
My working theory at the moment is that for some reason the port increment isn't working, so the second server start uses the same port and thus fails.
Update: scratch that, it reports that relay 1 fails.
Interestingly, it works fine if I run rpmbuild on my laptop, but not in mock (which sets up a chroot and runs rpmbuild inside it).
I'm wondering, is there anything specific about the network stack in this env? e.g. disabled ipv6 or something?
There is no networking set up. That said, I have not seen before that this prevents binding to a port.
Does that mean that any bind() is supposed to fail in the buildenv? The tests try to ensure data is being transferred across unix and TCP sockets.
No, you should be able to bind() there without any problems. For example, running python3 -m http.server works just fine in that environment. What is also interesting is that the tests pass just fine on CentOS7 in the very same environment. And as the link above shows, it also worked in Fedora until a dependency update.
Hmmmm, I'm a bit at a loss here. Since I just use bind(), this seems to suggest it is glibc related (glibc being the only package I can imagine influencing this, apart from the kernel itself). I'm already using SO_REUSEADDR, fwiw.
I see that glibc was updated... 2.29.9000-27.fc31 → 2.29.9000-29.fc31
Curious if that could be related.
@fweimer any ideas here?
If I read it correctly, the full list of changes is:
❯ git log --oneline 51ea67d54882318c4fa5394c386f4816ddc22408..21cc130b78a4db9113fb6695e2b951e697662440
21cc130b78 libio: do not attempt to free wide buffers of legacy streams [BZ #24228]
49bc41b642 [powerpc] add 'volatile' to asm
335c1007bf powerpc: Fix static-linked version of __ppc_get_timebase_freq [BZ #24640]
f59a54ab0c nl_AW locale: Correct the negative monetary format (bug 24614).
f0c5a803bd Fix gcc 9 build errors for make xcheck. [BZ #24556]
fabf5e49dd dlfcn: Avoid one-element flexible array in Dl_serinfo [BZ #24166]
2c75b545de elf: Refuse to dlopen PIE objects [BZ #24323]
02d8b5ab1c nl_NL locale: Correct the negative monetary format (bug 24614).
112a0ae18b m68k: Remove vDSO support
dee07df1a4 powerpc: Refactor powerpc64 lround/lroundf/llround/llroundf
2166283fcc powerpc: Refactor powerpc32 lrint/lrintf/llrint/llrintf
78049de0a9 powerpc: refactor powerpc64 lrint/lrintf/llrint/llrintf
48c3c12389 Linux: Fix __glibc_has_include use for <sys/stat.h> and statx
8d141877e0 <sys/cdefs.h>: Inhibit macro expansion for __glibc_has_include
cf27468602 Add IPV6_ROUTER_ALERT_ISOLATE from Linux 5.1 to bits/in.h.
a26e2e9fea Allow memset local PLT reference for powerpc soft-float.
82bc69c012 aarch64: handle STO_AARCH64_VARIANT_PCS
55f82d328d aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS
1192696069 powerpc: Remove optimized finite
a72186761b math: Use wordsize-64 version for finite
6427a6ac8c powerpc: Remove optimized isinf
a8c590f789 math: Use wordsize-64 version for isinf
2666f96390 powerpc: Remove optimized isnan
197dbda1a1 math: Use wordsize-64 version for isnan
2731a326b1 benchtests: Add isnan/isinf/isfinite benchmark
e41d66e41a powerpc: copysign cleanup
21bd039bb4 powerpc: consolidate rint
cfa611447b libio: freopen of default streams crashes in old programs [BZ #24632]
744e829637 Linux: Deprecate <sys/sysctl.h> and sysctl
5dad6ffbb2 <sys/stat.h>: Use Linux UAPI header for statx if available and useful
4e75c2a43b <sys/cdefs.h>: Add __glibc_has_include macro
680942b016 Improve performance of memmem
5e0a7ecb66 Improve performance of strstr
80b2bfb535 Benchmark strstr hard needles
e6e2424390 Fix malloc tests build with GCC 10.
@grobian looking at the strace output, it was definitely able to bind to port 3020.
[pid 1559508] bind(1, {sa_family=AF_INET, sin_port=htons(3020), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
But I don't see where it binds to it again....
Sorry, I don't know what is going on.
[pid 1559539] socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
[pid 1559539] setsockopt(1, SOL_SOCKET, SO_RCVTIMEO_OLD, "\0\0\0\0\0\0\0\0 \241\7\0\0\0\0\0", 16) = 0
[pid 1559539] setsockopt(1, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 1559539] bind(1, {sa_family=AF_INET, sin_port=htons(3020), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRINUSE (Address already in use)
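The sequence in the strace excerpt above (socket, SO_REUSEADDR, bind failing with EADDRINUSE) can be reproduced in isolation. The following is a minimal sketch in Python, assuming Linux semantics, where SO_REUSEADDR does not allow binding to an address that another socket is actively listening on. The helper name bind_reuseaddr is made up for this example, and the port is chosen by the kernel rather than fixed at 3020.

```python
import errno
import socket


def bind_reuseaddr(port, host="127.0.0.1"):
    """Create a TCP socket with SO_REUSEADDR set and bind it (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind((host, port))
    except OSError:
        s.close()
        raise
    return s


# First socket: let the kernel pick a port (0), then start listening on it.
first = bind_reuseaddr(0)
first.listen(1)
port = first.getsockname()[1]

# Second socket: same address and port while the first is still listening.
try:
    second = bind_reuseaddr(port)
    second.close()
    result = None
except OSError as exc:
    result = exc.errno
finally:
    first.close()

print(result == errno.EADDRINUSE)
```

If this is what happens in the test run, it would point at an earlier listener on the same port still being alive when the next one starts, rather than at bind() itself misbehaving.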
I think what could be happening is that the tests execute too quickly one after another and, for some reason, the connection is still lingering around. Oddly enough, this shouldn't be an issue when the connection is closed properly. It could be some tuning setting that explains why this works on some envs and not on others. Purely hypothetical. I'm thinking the best approach may be to randomise the starting port (in the absence of a method to test whether a port is free), so that this timing issue (theory above) doesn't happen.
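A randomised starting port, as suggested above, might look something like this. It is a rough sketch in Python rather than the shell used by the test script; the function name bind_random_port, the port range and the retry count are all assumptions made for illustration. It keeps the socket open on success, so the caller holds the port instead of merely knowing it was free a moment ago.

```python
import errno
import random
import socket


def bind_random_port(low=20000, high=30000, attempts=20, host="127.0.0.1"):
    """Bind a TCP socket to a random port in [low, high], retrying on EADDRINUSE.

    Hypothetical helper: returns (socket, port) on success and raises
    RuntimeError if every attempt hit a port that was already in use.
    """
    for _ in range(attempts):
        port = random.randint(low, high)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
        except OSError as exc:
            s.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # unrelated failure, do not hide it
            continue
        return s, port
    raise RuntimeError("no free port found in range")


sock, port = bind_random_port()
print(port)
sock.close()
```

This only helps if the failure really is a port collision; it does nothing for a test that fails because output arrives late.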
I'm seeing the same issue in Debian, on our buildds and also in the CI. So I've added some code to use a random port, but this didn't help. It doesn't fail in docker...
--- basic.payloadout
+++ basic.payloadout
@@ -7,5 +7,4 @@
bar 1 3
bar.foo 1 4
rewrite.foo 1 4
-aggregate.foo.bar 1.000000 349830000
relay 1: aggregator: dropping incorrect metric: foo.bar 1
That starts to look like a timing issue to me. Basically the aggregate output is missing.
yes, but the amount of time it sleeps ... it should produce the aggregate on a window end
You could try and increase the sleep, I wonder if that helps
I know this is old but I ran into the same issue trying to build 3.7.4 on Rocky 8.6 and I can confirm that increasing the sleep in the test script from 2 to 4 seconds does indeed allow the aggregation test to pass.
Ok, let's just up that sleep then.
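As an alternative to picking a longer fixed sleep, a test could poll for the expected output up to a deadline. The sketch below is in Python and is not what the test script does; the name wait_for and its default timeout and interval are assumptions for illustration. The predicate would be whatever check the test needs, for example "the aggregate line has appeared in the output file".

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns true or timeout seconds elapse.

    Hypothetical helper: returns True as soon as the predicate holds,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a condition that becomes true on the third check.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_for(ready, timeout=2.0, interval=0.05))
```

On a fast machine this returns as soon as the output is there, and on a slow build host it waits longer, up to the timeout, instead of failing at a fixed 2 or 4 seconds.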
Hello,
I am building the latest version of carbon-c-relay in Fedora, but some tests are failing.
I have added some netstat + strace, but could not find anything suspicious: https://kojipkgs.fedoraproject.org//work/tasks/309/40920309/build.log