Closed PredatorVI closed 2 years ago
For some reason the untar command on the other side isn't completing, and it's hard to know why. I don't think this is a WebSockets issue; I think it is an issue with the process in the container that is unpacking the copied files (or, alternatively, with the input stream not getting closed properly).
How are you supplying the data that is getting copied? Byte array? Local file path? Is it one file or many?
I will try to repro locally.
I've started testing using a simple text file.
The test I'm doing is to first call copyFileFromPod() to grab a config file from the pod (POD:/etc/adduser.conf --> LOCAL:C:/tmp/adduser.conf). I then turn around and copy that same file using copyFileToPod() (LOCAL:C:/tmp/adduser.conf --> POD:/tmp/adduser.conf).
I am using the java.nio.file.Path variant of the method:
public void copyFileToPod(String namespace, String pod, String container, Path srcPath, Path destPath) throws ApiException, IOException
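For reference, the round trip is roughly the following (a minimal sketch, not my exact code: it assumes a default ApiClient via Config.defaultClient(), placeholder namespace/pod/container names, and the copyFileFromPod overload that takes the remote path as a String and returns an InputStream):
ApiClient client = Config.defaultClient();
Configuration.setDefaultApiClient(client);
Copy copy = new Copy(client);

// Grab the config file from the pod onto the local (Windows) machine...
try (InputStream in =
        copy.copyFileFromPod("default", "my-pod", "my-container", "/etc/adduser.conf")) {
  Files.copy(in, Paths.get("C:/tmp/adduser.conf"), StandardCopyOption.REPLACE_EXISTING);
}

// ...then push the same file back into the pod; this is the call that hangs.
copy.copyFileToPod("default", "my-pod", "my-container",
    Paths.get("C:/tmp/adduser.conf"), Paths.get("/tmp/adduser.conf"));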
I'm open to suggestions on how to narrow this down further or debug it.
Thanks!!
Running 'ps -ef | grep tar' in the container, I noticed that the path was not right. I am setting destPath (remote Ubuntu container) to Paths.get("/tmp/adduser.conf"), but it seems to be using the Windows path delimiter.
root 1344130       0  0 16:33 ?  00:00:00 sh -c base64 -d | tar -xmf - -C \tmp
root 1344136 1344130  0 16:33 ?  00:00:00 tar -xmf - -C tmp
I pulled the source and added a parentPath.replace("\\", "/") just to see if that fixed it. It still seems to hang, but the paths now look better:
root  1214170       0  0 16:33 ?  00:00:00 sh -c base64 -d | tar -xmf - -C /tmp
root 12144136 1214170  0 16:33 ?  00:00:00 tar -xmf - -C /tmp
There appear to be no variants of copyFileToPod() that take a String for the remote destPath, the way copyFileFromPod() does for the remote srcPath, so there is a potential issue when using native Path/File references where the source and destination systems use different path delimiters.
I also noticed now that once I kill the client process, it seems to finish and the file does show up. I didn't check the content but will once I get the pod back online.
File content appears to be written correctly when I stop/kill the client-side process. This is where my unfamiliarity with WebSockets isn't helping. I don't know how the EOF/end of input for the remote process (sh -c base64 -d | tar -xmf - -C /tmp) is triggered.
I tried grasping at straws.
Thanks for investigating! Basically the web socket should close when the process on the client side ends. In this case that process will only end when the stdin that it is reading closes (at least that's what I think should happen).
If you're willing/able to investigate further, one thing you might try is removing the base64 encoding. I don't actually think that it is necessary, and it's possible that it's causing the problem. Then instead of sh -c ... you could just run tar -xmf - -C /tmp. It's possible that spawning the extra shell (via sh -c ...) is what is causing stdin to hang open.
Still the same behavior. Here is the process output showing it isn't calling 'sh -c'. Hopefully the format is correct.
root@test-gcp-dev-gke-guse4a-0:/tmp# ps -ef | grep tar
root  141926  0  0 16:05 ?  00:00:00 tar -xmf - -C /tmp
Here is the updated code:
public Future<Integer> copyFileToPodAsync(
    String namespace, String pod, String container, Path srcPath, Path destPath)
    throws ApiException, IOException {
  // Run decoding and extracting processes
  final Process proc = execCopyToPod(namespace, pod, container, destPath);
  // Send encoded archive output stream
  File srcFile = new File(srcPath.toUri());
  try (ArchiveOutputStream archiveOutputStream =
          new TarArchiveOutputStream(proc.getOutputStream());
      FileInputStream input = new FileInputStream(srcFile)) {
    ArchiveEntry tarEntry = new TarArchiveEntry(srcFile, destPath.getFileName().toString());
    archiveOutputStream.putArchiveEntry(tarEntry);
    Streams.copy(input, archiveOutputStream);
    archiveOutputStream.closeArchiveEntry();
    archiveOutputStream.finish();
    return new ProcessFuture(proc);
  }
}

private Process execCopyToPod(String namespace, String pod, String container, Path destPath)
    throws ApiException, IOException {
  String parentPath = destPath.getParent() != null ? destPath.getParent().toString() : ".";
  parentPath = parentPath.replace("\\", "/");
  return this.exec(
      namespace,
      pod,
      new String[] {"tar", "-xmf", "-", "-C " + parentPath},
      container,
      true,
      false);
}
When I kill my client process, however, the file does not get created as it did before. I reverted to using 'sh -c' but left off the base64 encoding/decoding steps; the behavior goes back to hanging, but the file does get created when I kill my client process.
I wrote this hack method to test it in a more 'synchronous' way, hoping to be able to debug it better.
My first attempt was to explicitly read proc.getInputStream() and proc.getErrorStream(), thinking they needed to be consumed/flushed before the process could complete, but that didn't change the behavior, so I pulled that code.
I then split the tar creation out to write a local temporary archive first, thinking maybe the TarArchiveOutputStream was causing the hang-up for some reason. That didn't help either.
The code below basically works by calling proc.destroy() after the try-with-resources streams are closed, and it always returns '0', since closing the WebSocket appears to allow the file creation to complete.
I have not yet found the right combination of flush()/close() calls that allows the process to exit normally. I think I've exhausted the limits of my understanding of the Copy.copyFileToPod() methods. Maybe there is an issue in the WebSocket handling? I'm just starting to look down that road.
private int copyFileToPodBruteForce(
    String namespace, String pod, String container, Path srcPath, Path destPath)
    throws ApiException, IOException {
  // Run decoding and extracting processes
  final Process proc = execCopyToPod(namespace, pod, container, destPath);
  // Send encoded archive output stream
  File srcFile = new File(srcPath.toUri());
  try (ArchiveOutputStream archiveOutputStream =
          new TarArchiveOutputStream(proc.getOutputStream());
      FileInputStream input = new FileInputStream(srcFile)) {
    ArchiveEntry tarEntry = new TarArchiveEntry(srcFile, destPath.getFileName().toString());
    archiveOutputStream.putArchiveEntry(tarEntry);
    Streams.copy(input, archiveOutputStream);
    archiveOutputStream.closeArchiveEntry();
    archiveOutputStream.flush();
    archiveOutputStream.finish();
  }
  proc.destroy();
  return 0;
}
The changes merged from Pull Request #1835 will allow me to use my work-around copyFileToPodBruteForce() method (see previous comment) to successfully copy files to the pod. However, the current copyFileToPod() methods still don't work (for me) as currently implemented.
I'm curious if others have issues using these methods? I've tried the copyFileToPod() against both an Ubuntu 20.04 and Alpine Linux 3.12 image running in our Google GKE 1.19.12-gke.2100 cluster.
Currently, the method copyFileToPod() has the line
int exit = copyFileToPodAsync(namespace, pod, container, srcPath, destPath).get();
which is effectively
int exit = ProcessFuture<Integer>.get();
In the get() method, it calls proc.waitFor() and is ultimately stuck on this.latch.await() in the ExecProcess class, waiting for the latch counter to decrement.
My working theory is that the only way it will exit is if this.latch.countDown() is called, and the only places it is called are:
- Exec.ExecProcess.streamHandler.handleMessage() // called for stream id=3 (remote exec completes/returns status)
- Exec.ExecProcess.streamHandler.failure() // via low-level exception handling?
- Exec.ExecProcess.streamHandler.close() // closes the stream handler
Without an explicit proc.destroy() (ignoring the failure() case), the remote process will never exit and send a message via stream=3. There does not appear to be any other way to cause latch.countDown() to be called.
So barring some other mechanism to signal to the remote exec process that the input stream is done/closed (equivalent of CTRL-D?), calling destroy() seems to be the only way for this to work.
I did try sending the character equivalent of CTRL-D and closing the OutputStream without success. The only thing that seems to work ultimately is to close the actual socket that only happens if destroy() is called.
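For completeness, that attempt was roughly the following (a sketch; 0x04 is the EOT/Ctrl-D byte, and proc is the Process returned by execCopyToPod()):
// Write an EOT (Ctrl-D, 0x04) and then close stdin, hoping the remote tar
// would treat it as end of input. Neither step unblocked proc.waitFor().
OutputStream stdin = proc.getOutputStream();
stdin.write(0x04);
stdin.flush();
stdin.close();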
FWIW, I am having this same issue with copyFileToPod(String, String, String, Path, Path) using the Java API, version 13.0.1-SNAPSHOT.
As soon as the client code hits the copyFileToPod line, it hangs there apparently indefinitely. Only when I kill the client side process does the file get written in the container. The container in question is the only container in the pod and is running the fedora:latest image. The Kubernetes cluster in question is an Azure AKS cluster.
OTOH, copyFileToPodAsync() has no problem. I just took the Future it returned, wrapped it in a while(true) loop with a check on isDone() and everything worked as expected.
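A minimal sketch of that polling workaround (copy is a Copy instance; the 100 ms interval is arbitrary and a real caller would add a timeout):
Future<Integer> result =
    copy.copyFileToPodAsync(namespace, pod, container, srcPath, destPath);

// Poll the Future instead of blocking in get()/proc.waitFor().
while (!result.isDone()) {
  Thread.sleep(100);
}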
I have encountered the same issues in my application too. I was trying to build a simple tool which would watch a local directory and copy changed content into a specific pod. Unfortunately, all attempts to use the Copy mechanism failed because waitFor() was stuck indefinitely.
I also tried a self-written exec analog using the same tar mechanism, but the result was the same.
My last attempt was to simply pull the file from the server using wget (nastier, but at least no messing with streaming content through sockets). The result was also quite disappointing:
- the wget processes were stuck, even though I got stdout and stderr back and the process reported successful completion;
- kubectl works perfectly;
- java.net.SocketException: Connection or outbound has been closed
at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:267)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1224)
at okio.OutputStreamSink.write(JvmOkio.kt:53)
at okio.AsyncTimeout$sink$1.write(AsyncTimeout.kt:103)
at okio.RealBufferedSink.flush(RealBufferedSink.kt:267)
at okhttp3.internal.ws.WebSocketWriter.writeControlFrame(WebSocketWriter.kt:142)
at okhttp3.internal.ws.WebSocketWriter.writeClose(WebSocketWriter.kt:102)
at okhttp3.internal.ws.RealWebSocket.writeOneFrame$okhttp(RealWebSocket.kt:533)
at okhttp3.internal.ws.RealWebSocket$WriterTask.runOnce(RealWebSocket.kt:620)
at okhttp3.internal.concurrent.TaskRunner.runTask(TaskRunner.kt:116)
at okhttp3.internal.concurrent.TaskRunner.access$runTask(TaskRunner.kt:42)
at okhttp3.internal.concurrent.TaskRunner$runnable$1.run(TaskRunner.kt:65)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
This is a list of the processes which were not able to terminate, left over after a few attempts to run a simple wget command. Sending kill -9 didn't kill them, and they remain in a zombie state.
/usr/local/tomcat/webapps # ps -ef
PID USER TIME COMMAND
1 root 23:44 /usr/lib/jvm/default-jvm/bin/java
709 root 0:00 sh
1323 root 0:00 sh
1993 root 0:00 [ssl_client]
1995 root 0:00 [ssl_client]
2239 root 0:00 [wget]
2240 root 0:00 [ssl_client]
2253 root 0:00 [wget]
2254 root 0:00 [ssl_client]
2270 root 0:00 [ssl_client]
2348 root 0:00 [ssl_client]
2354 root 0:00 [ssl_client]
2377 root 0:00 [ssl_client]
2390 root 0:00 [ssl_client]
2476 root 0:00 [ssl_client]
2486 root 0:00 [ssl_client]
2495 root 0:00 [ssl_client]
5853 root 0:00 [ssl_client]
6032 root 0:00 sh
6102 root 0:00 [ssl_client]
8112 root 0:00 sh
8186 root 0:00 [ssl_client]
8253 root 0:00 [ssl_client]
8261 root 0:00 [ssl_client]
8610 root 0:00 ps -ef
My cluster is AWS EKS with Kubernetes version 1.19, and I tested with the latest version of the client at the moment, 13.0.1.
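For reference, the wget attempt above was essentially an exec call of this shape (a sketch; the URL and destination path are placeholders):
Exec exec = new Exec();
Process proc = exec.exec(
    namespace, pod,
    new String[] {"wget", "-O", "/tmp/some.file", "http://file-server/some.file"},
    container,
    false,  // no stdin
    false); // no tty
int rc = proc.waitFor();
// stdout/stderr came back and wget reported success, but the remote
// wget/ssl_client processes were left behind, as shown in the listing above.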
The tar command works just fine locally:
Path destPath = Paths.get("/tmp/fromFile");
final String[] tarCommand = {"sh", "-c", "tar xmf - -C " + destPath.getParent().toString()};
final Process tarProcess = new ProcessBuilder(tarCommand).start();
File srcFile = new File(Paths.get("/tmp/toFile").toUri());
try (OutputStream tarOutputStream = tarProcess.getOutputStream();
ArchiveOutputStream archiveOutputStream = new TarArchiveOutputStream(tarOutputStream);
FileInputStream inputStream = new FileInputStream(srcFile)) {
ArchiveEntry tarEntry = new TarArchiveEntry(srcFile, destPath.getFileName().toString());
archiveOutputStream.putArchiveEntry(tarEntry);
IOUtils.copy(inputStream, archiveOutputStream);
archiveOutputStream.closeArchiveEntry();
archiveOutputStream.finish();
}
I think the issue is https://github.com/kubernetes/kubernetes/issues/89899. Basically, the remote exec command does not detect the end of STDIN.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
I can confirm that this issue still exists in the newest version, 16.0.0, of client-java. I am using a local installation of K8s 1.22.
Yes, this issue still exists in 2023 :)
The reason is that the tar process is waiting for the end of its input stream (EOF), and it waits forever because it never receives EOF.
When we start an (un)tar process manually, we do something like this:
tar -xmf - -C . < archive.tar
which is effectively the same as:
cat archive.tar | tar -xmf - -C .
As you can see, we have a pipeline here. One process (the OS itself or cat) provides data, another (tar) consumes it. As soon as the provider completes, the pipe closes, and the consumer gets EOF and terminates.
In our case the copyFileToPod() method never finishes because it never closes the pipe (i.e. never closes the WS connection?!). Thus we are stuck infinitely in proc.waitFor()...
Destroying the process after copying doesn't work either - the tar process lives on remotely until the client app terminates.
In 2024, has this problem still not been solved? Does anyone have a good solution?
Excuse me, is there a good solution now?
This issue is still present in version 20.0.1. Anyone got a workaround?
As a workaround, I've copied the method from 1.0.1 into my class:
private void copyFileToPod(String namespace, String pod, String container, Path srcPath, Path destPath)
    throws ApiException, IOException
{
  // Run decoding and extracting processes
  final Process proc = execCopyToPod(namespace, pod, container, destPath);
  // Send encoded archive output stream
  File srcFile = new File(srcPath.toUri());
  try (ArchiveOutputStream archiveOutputStream = new TarArchiveOutputStream(
          new Base64OutputStream(proc.getOutputStream(), true, 0, null));
      FileInputStream input = new FileInputStream(srcFile))
  {
    ArchiveEntry tarEntry = new TarArchiveEntry(srcFile, destPath.getFileName().toString());
    archiveOutputStream.putArchiveEntry(tarEntry);
    ByteStreams.copy(input, archiveOutputStream);
    archiveOutputStream.closeArchiveEntry();
  }
  finally
  {
    proc.destroy();
  }
}
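For what it's worth, the finally { proc.destroy(); } is what closes the WebSocket and, per the discussion above, is what lets the file actually land in the pod. A call site looks like this (names and paths are placeholders):
copyFileToPod("default", "my-pod", "my-container",
    Paths.get("C:/tmp/adduser.conf"), Paths.get("/tmp/adduser.conf"));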
Client Version: 13.0.0
Kubernetes Version: 1.19.12-gke.2100
Java Version: Java 1.8.0_291
I have copyFileFromPod() working, but copyFileToPod() hangs at Copy.java line 459, at proc.waitFor(). I've pulled the source, but I don't understand the WebSockets well enough to know what is going on. A copy command from the command line works fine.
I've verified that the 'tar' and 'base64' executables exist in the container.
I've not changed the timeouts as I'd expect a simple text file copy would work within the defaults.
It never seems to timeout or throw an error so I don't have any stack traces.
Also, I tried API version 12.0.1 (throws SocketTimeout almost immediately) and 11.0.2 (hangs just like 13.0.0).