Closed jasha64 closed 1 month ago
Hello!
I don't think we tested anything with IPv6, which seems to be the problem in the logs. Can you try IPv4 addresses?
Can you please also specify the following:
Hello, the problem remains when using IPv4:
Defaulted container "user-container" out of: user-container, queue-proxy
Picked up JAVA_TOOL_OPTIONS: -Djava.net.preferIPv4Stack=true
2024-09-26 13:32:13,257 [io.pixelsdb.pixels.worker.vhive.WorkerServer]-[INFO] rpc server run successfully
2024-09-26 13:35:17,870 [io.pixelsdb.pixels.worker.vhive.BasePartitionedJoinStreamWorker]-[DEBUG] register worker, local address: 192.168.137.190
2024-09-26 13:35:17,908 [io.pixelsdb.pixels.worker.vhive.BasePartitionedJoinStreamWorker]-[ERROR] error during join
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271) ~[pixels-worker-vhive.jar:?]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252) ~[pixels-worker-vhive.jar:?]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165) ~[pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.turbo.WorkerCoordinateServiceGrpc$WorkerCoordinateServiceBlockingStub.registerWorker(WorkerCoordinateServiceGrpc.java:473) ~[pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.planner.coordinate.WorkerCoordinateService.registerWorker(WorkerCoordinateService.java:72) ~[pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.worker.vhive.BasePartitionedJoinStreamWorker.process(BasePartitionedJoinStreamWorker.java:159) [pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.worker.vhive.PartitionedJoinStreamWorker.handleRequest(PartitionedJoinStreamWorker.java:39) [pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.worker.vhive.PartitionedJoinStreamWorker.handleRequest(PartitionedJoinStreamWorker.java:29) [pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.worker.vhive.utils.ServiceImpl.execute(ServiceImpl.java:72) [pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.worker.vhive.WorkerServiceImpl.process(WorkerServiceImpl.java:82) [pixels-worker-vhive.jar:?]
at io.pixelsdb.pixels.turbo.vHiveWorkerServiceGrpc$MethodHandlers.invoke(vHiveWorkerServiceGrpc.java:289) [pixels-worker-vhive.jar:?]
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) [pixels-worker-vhive.jar:?]
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:354) [pixels-worker-vhive.jar:?]
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) [pixels-worker-vhive.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) [pixels-worker-vhive.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) [pixels-worker-vhive.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: connect(..) failed: Address family not supported by protocol: /128.110.218.225:18894
Caused by: java.net.ConnectException: connect(..) failed: Address family not supported by protocol
at io.grpc.netty.shaded.io.netty.channel.unix.Errors.newConnectException0(Errors.java:155) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:128) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.unix.Socket.connect(Socket.java:313) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel.doConnect0(AbstractEpollChannel.java:773) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollSocketChannel.doConnect0(EpollSocketChannel.java:144) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel.doConnect(AbstractEpollChannel.java:758) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.connect(AbstractEpollChannel.java:600) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1342) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:548) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:533) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:548) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:538) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:391) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[pixels-worker-vhive.jar:?]
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[pixels-worker-vhive.jar:?]
... 1 more
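As a quick cross-check on the trace above, here is a minimal sketch that reports the address family of the target literal and the JVM's stack-preference properties. It is not part of Pixels or vHive; the class name AddressFamilyCheck and the helper familyOf are hypothetical. The JDK documents java.net.preferIPv4Stack and java.net.preferIPv6Addresses as the relevant properties. The sketch only uses JDK calls, so it says nothing about how the shaded Netty epoll transport in the trace creates its socket.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressFamilyCheck {
    /**
     * Returns "IPv4" or "IPv6" for an IP literal (hypothetical helper).
     * No DNS lookup is performed when the argument is a literal address.
     */
    public static String familyOf(String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            if (addr instanceof Inet4Address) {
                return "IPv4";
            }
            if (addr instanceof Inet6Address) {
                return "IPv6";
            }
            return "unknown";
        } catch (UnknownHostException e) {
            return "unresolvable";
        }
    }

    public static void main(String[] args) {
        // Default target is the server address taken from the log above.
        String target = args.length > 0 ? args[0] : "128.110.218.225";
        System.out.println("target family: " + familyOf(target));
        // Both properties print "null" when they were not set on the command line.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("java.net.preferIPv6Addresses = "
                + System.getProperty("java.net.preferIPv6Addresses"));
    }
}
```

Running it inside the pod with the same JAVA_TOOL_OPTIONS shows what the JVM itself sees. If it reports an IPv4 target with preferIPv4Stack=true, the flags are at least consistent with the address being used.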
I did not deploy the GRPC server inside a service or a container; it's a database backend running as a daemon process on the same node as the vHive master. (Therefore its placement is irrelevant to vHive.) The client is a query worker running as a serverless service, where the GRPC client side code resides. I called GRPC when I ran a SQL query inside the trino CLI, which accesses the client's serverless service via the Knative URL http://pixels.default.192.168.1.240.sslip.io; then the query worker will try to access the GRPC server using IP address 128.110.218.225.
If you would like to reproduce, I installed vHive and pixels on the master node, deployed docker.io/jasha64/pixels-worker-vhive-stream:202409251834 as a serverless cloud function, set up minio on the other node inside the Cloudlab cluster, and then ran queries via trino. I can try to add you to my Cloudlab cluster or send you more deployment documents.
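To separate a gRPC/Netty problem from a pod-network one in a setup like this, one option is a plain JDK TCP connect from inside the worker pod to the same host and port. The sketch below is an illustration only: the class name TcpProbe and the method probe are hypothetical, and the default host and port are the ones that appear in the stack trace (128.110.218.225:18894).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpProbe {
    /**
     * Tries a plain JDK TCP connect (no gRPC, no Netty).
     * Returns null on success, or a short description of the failure.
     */
    public static String probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the log in this thread; override via arguments.
        String host = args.length > 0 ? args[0] : "128.110.218.225";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 18894;
        String failure = probe(host, port, 3000);
        if (failure == null) {
            System.out.println("connected to " + host + ":" + port);
        } else {
            System.out.println("connect failed: " + failure);
        }
    }
}
```

If this probe connects from inside the pod while the gRPC call still fails with "Address family not supported by protocol", that would point at the gRPC/Netty transport rather than at vHive's network layer. If the probe fails too, the network path is the first thing to look at.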
It turned out that this is a bug with GRPC. See https://github.com/pixelsdb/pixels/commit/9ed6776b1116822cb2c7abbf86ee65580601d2ce
Describe the question
I am attempting to connect to the vHive master node from within a serverless pod via Netty's GRPC. Strangely, it keeps reporting "java.net.ConnectException: Address family not supported by protocol", even if I specify -Djava.net.preferIPv6Stack=true or vice versa on both the server and the client sides. I wonder if this has anything to do with vHive's network layer.

To Reproduce
I ran the following on Cloudlab's Utah cluster xl170 nodes:
vHive (stock-only mode)
pixels (pixelsdb/pixels)
serverless function image (docker.io/jasha64/pixels-worker-vhive-stream:202409251842)

Logs
Since I used stock-only, no logs from vhive or firecracker-containerd are available.
Serverless pod logs by kubectl logs pixels-00001-deployment-bc9b5bd6-xxxxx:
containerd output: