warp-tech / warpgate

Smart SSH, HTTPS and MySQL bastion that requires no additional client-side software
Apache License 2.0

Broken Pipe Issue with `scp` and other tools #479

Open Abraxos opened 2 years ago

Abraxos commented 2 years ago

Hello,

I originally imagined that I had found a speed issue, but it turned out that speed does not appear to have been the core problem. I am running the most recent release version 0.6.4. I attempted to provide example data by comparing direct SSH connections for file transfers with a proxyjump configuration, and a warpgate configuration. I first tried it with rsync and everything worked perfectly, the speed was basically the same across all three.

I then tried to transfer files with mc and while I got speed readings for direct and jumphost, the warpgate connection would always error out, pretty quickly I might add:

[Screenshot: 2022-11-13 21_38_13-BURN-E]

(yes, my name is Eugene, just like the author of this software =D)

I then figured that maybe it's something wrong with the way that mc connects, and attempted to use scp to transfer the file instead, except I got the same kind of behavior:

eugene in 🌐 r2-d2 in ~/Downloads/test 
❯ scp warpgate.brainiac:/home/eugene/large_random_file ./large_random_file.warpgate
large_random_file                                                                                          3%  333MB  27.9MB/s   05:55 ETA
client_loop: send disconnect: Broken pipe
lost connection

eugene in 🌐 r2-d2 in ~/Downloads/test took 14s 
❯ scp direct.brainiac:/home/eugene/large_random_file ./large_random_file.direct
large_random_file                                                                                        100%   10GB  41.6MB/s   04:06    

eugene in 🌐 r2-d2 in ~/Downloads/test took 4m6s 
❯ scp jumphost.brainiac:/home/eugene/large_random_file ./large_random_file.jumphost
large_random_file                                                                                        100%   10GB  24.3MB/s   07:02    

and this happens pretty reliably. I was not able to transfer a 10GB random file across a Warpgate host. Please advise. Thank you.
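The rates scp printed for the two successful transfers can be cross-checked from the file size and elapsed time. This is a small arithmetic sketch, assuming the "10GB" shown by scp means exactly 10 GiB (10240 MiB); the helper name `mib_per_second` is made up for illustration.

```python
def mib_per_second(size_mib, minutes, seconds):
    """Average throughput in MiB/s for a transfer that took minutes:seconds."""
    return size_mib / (minutes * 60 + seconds)


# Assumption: the file is exactly 10 GiB; scp only prints "10GB".
FILE_MIB = 10 * 1024

direct = mib_per_second(FILE_MIB, 4, 6)    # direct.brainiac finished in 04:06
jumphost = mib_per_second(FILE_MIB, 7, 2)  # jumphost.brainiac finished in 07:02

print(f"direct:   {direct:.1f} MiB/s")
print(f"jumphost: {jumphost:.1f} MiB/s")
# The Warpgate transfer never finished: it stopped after 333 MiB.
print(f"warpgate: stopped at {333 / FILE_MIB:.1%} of the file")
```

Under that assumption the computed averages agree with the 41.6MB/s and 24.3MB/s that scp reported, so the Warpgate run stands out because the connection drops, not because of its transfer rate.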

P.S. Recording is disabled on my Warpgate instance.

P.P.S. As always, all my bug reports are done exclusively to improve this software. I really like it, and I appreciate all the devs' work. Nothing I say should be construed as ungrateful, insulting, or demeaning. Thank you for writing Warpgate, I really appreciate its existence and continued development.

stappersg commented 2 years ago

I was also able to reproduce the broken pipe issue with scp. I also see a few broken pipe messages in the ansible logs when running ansible with -vvvv. I have uploaded the ansible logs here: https://github.com/ntimo/warpgate-issue-459/tree/master/results/ansible_log_broken_pipe

Originally posted by @ntimo in https://github.com/warp-tech/warpgate/issues/459#issuecomment-1313936825

Eugeny commented 2 years ago

I think I might have solved it! At least, scp is not freezing up on my side anymore - wondering if this is going to help with #459 too!

ntimo commented 2 years ago

@Eugeny seems like the build https://github.com/warp-tech/warpgate/actions/runs/3465220847/jobs/5787693867 failed :( so I can't test whether the fix works. Could you maybe check the build? Thanks a lot, and also thanks for the quick response / fix.

ntimo commented 2 years ago

I can confirm that the fix works and the scp copy of a 10GB file now works flawlessly. Thank you @Eugeny

Abraxos commented 2 years ago

Awesome! Thank you so much.

Not trying to hurry, but just for my own time estimates, when do we believe that version 0.6.5 will be available?

Eugeny commented 2 years ago

@Abraxos I've just pushed 0.6.5: https://github.com/warp-tech/warpgate/releases/tag/v0.6.5 :v:

Abraxos commented 2 years ago

I am sorry to report that I tested the new version 0.6.5 and I am still getting the same issue. SCP still crashes (though it lasts maybe a few seconds longer), and for some reason the web UI still lists the version as 0.6.4.

Perhaps there was some kind of issue in version creation?

(and yes, I made doubly sure that I am downloading the right binary, it's this one: https://github.com/warp-tech/warpgate/releases/download/v0.6.5/warpgate-v0.6.5-x86_64-linux)

stappersg commented 2 years ago

... still getting the same issue. SCP still crashes

Eugeny reopened this issue

@Abraxos is the same issue in combination with mc? [Yes/No]

Yes: State explicitly that mc is in play.

No: Make a fresh issue which documents that plain scp through warpgate fails.

Regards Geert Stappers

P.S. My github profile has documented how to contact me outside of github.

Abraxos commented 2 years ago

Yes, the same issue still happens with mc. My apologies, I neglected to clarify.

Abraxos commented 2 years ago

I re-installed the newest binary, which appears different from the one I got yesterday, and the issue persists with mc

[Screenshot: Screen Shot 2022-11-16 at 12 55 42 PM]

I happened to run mc this time over an SSH connection managed by the same warpgate instance that the transfer was going through. At the moment when the mc transfer failed, I also lost the SSH connection. It looked like something happened on the warpgate end causing it to disconnect all sessions, but I don't see anything in the log:

[Screenshot: Screen Shot 2022-11-16 at 1 00 20 PM]

The same thing (with all sessions apparently getting dropped) is happening with SCP as well. To replicate this issue (at least for me) it's sufficient to SSH into a machine using the warpgate host, then start an scp/mc transfer of a large file from another machine also through the warpgate host. After 10-15s you will get disconnected. If you use tmux to keep the shell alive, you will discover that the mc/scp transfer also failed at the same time.
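The reproduction steps above could be scripted roughly as follows. This is only a sketch: the host aliases `warpgate.shellbox` and `warpgate.brainiac` are placeholders for `~/.ssh/config` entries that both route through the same Warpgate instance, and an idle `ssh ... sleep` stands in for the interactive shell.

```python
import subprocess

# Hypothetical host aliases; both must go through the same Warpgate instance.
SHELL_HOST = "warpgate.shellbox"
FILE_HOST = "warpgate.brainiac"


def keepalive_command(host, seconds):
    """An idle SSH session that stays open unless the connection is dropped."""
    return ["ssh", host, "sleep", str(seconds)]


def scp_command(host, remote_path, local_path):
    """Pull one file from `host` with scp."""
    return ["scp", f"{host}:{remote_path}", local_path]


def reproduce(remote_path, local_path, idle_seconds=3600):
    """Return (scp_exit_code, idle_session_dropped).

    idle_seconds should exceed the expected transfer time, otherwise the
    idle session ends on its own and is wrongly reported as dropped.
    """
    idle = subprocess.Popen(keepalive_command(SHELL_HOST, idle_seconds))
    try:
        scp = subprocess.run(scp_command(FILE_HOST, remote_path, local_path))
        try:
            # Give the idle session a few seconds to notice a disconnect.
            idle.wait(timeout=5)
            dropped = True
        except subprocess.TimeoutExpired:
            dropped = False
        return scp.returncode, dropped
    finally:
        if idle.poll() is None:
            idle.terminate()
        idle.wait()
```

Calling `reproduce("/home/eugene/large_random_file", "./large_random_file.warpgate")` would be expected to return a non-zero scp exit code together with `True` if all sessions are dropped at once, as described above.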

The good news though is that the version appears correct in the WebUI.

Eugeny commented 2 years ago

Thanks for checking - that binary update was just for the version number. No news on a fix yet.

Abraxos commented 2 years ago

That's OK, thank you for keeping me informed

Abraxos commented 2 years ago

I tried version 0.7.0 and same issue unfortunately with both scp and mc so far. No pressure, just updating for consistency.

ntimo commented 2 years ago

I also just noticed that when you copy large files using scp, warpgate creates huge recordings for this; not sure if this is related.

Abraxos commented 2 years ago

I also just noticed that when you copy large files using scp, warpgate creates huge recordings for this; not sure if this is related.

No, that's not a factor in this situation. The original message explicitly states that recording is disabled.

Abraxos commented 1 year ago

Just checking in, has there been any progress on this issue? This is basically the only thing keeping me from using warpgate for ALL my SSH

strazto commented 11 months ago

I think I have this issue when using warpgate for port forwarding, e.g. forwarding a port for a proxy. Usually the connection becomes more and more laboured, until the terminal is unresponsive and the forwarded port no longer functions.

At that point, I have to exit the session using the SSH escape sequence (enter, ~, . ), as ctrl+D interrupts no longer work.

Once I reconnect, I get a more responsive connection for a while, and then the slowness re-emerges.

strazto commented 11 months ago

Marginally related - I wish it were possible to disable recordings on some role-basis, so I could disable them for more "service" accounts, rather than shell accounts.

cmsmith1977 commented 11 months ago

I have this same issue and logged my debug results in this discussion: https://github.com/warp-tech/warpgate/discussions/415

I still have this problem daily and have to reconnect to make the port forwards work often.

If I don't use port forwards, all is fine...

Eugeny commented 11 months ago

I've fixed one particularly egregious padding calculation bug in russh and bumped it here - could you give the latest main branch a try?
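For context, the thread does not say what exactly was wrong in russh. The snippet below only illustrates the standard padding rule from RFC 4253 section 6 for the plain (non-AEAD, non-EtM) case; it is not russh's code, and the function name is made up.

```python
def padding_length(payload_len, block_size=8):
    """Padding length for an SSH binary packet per RFC 4253 section 6.

    packet_length (4 bytes) + padding_length (1 byte) + payload + padding
    must be a multiple of max(block_size, 8), with at least 4 bytes of padding.
    """
    block = max(block_size, 8)
    unpadded = 4 + 1 + payload_len
    pad = block - (unpadded % block)
    if pad < 4:
        # Too little padding: add one more full block.
        pad += block
    return pad


# Example: an empty payload with an 8-byte block needs 11 bytes of padding,
# giving the 16-byte minimum packet size.
print(padding_length(0))
```

An off-by-a-few error in this kind of calculation typically shows up as the peer rejecting packets or closing the connection.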

Abraxos commented 11 months ago

Oh man that sounds like exactly the kind of thing that causes the symptoms I originally experienced. Lemme see if I can test it tonight P:

Is it by any chance in the nightly build yet?

Abraxos commented 11 months ago

I tried the nightly version and the effect is still the same. It cannot transfer files (or potentially SSH session contents) greater than about 1.4GiB

❯ scp sample.txt syncserver.external:/home/.../Sync/
sample.txt                                                                                               6% 1365MB  10.3MB/s   30:48 ETA
client_loop: send disconnect: Broken pipe
lost connection

❯ scp sample.txt syncserver.external:/home/.../Sync/
sample.txt                                                                                               7% 1437MB   6.4MB/s   49:45 ETA
client_loop: send disconnect: Broken pipe
lost connection

Looking through the logs of my instrumentation system, it says that the process gets killed with a SIGKILL suggesting that the OS killed the process after running out of memory. So I spun up htop while running the transfer one more time and watched as warpgate consumed all the available memory on the machine running it (1GB) and then all the available swap (512M) and then promptly got killed. The RAM consumption perfectly mirrors the amount of data that has been transferred. So that's where the 1.4GB thing above is coming from.
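The numbers line up if RAM and swap are added together. Below is a rough sketch of that arithmetic, plus a helper that reads a process's resident and swapped memory from `/proc/<pid>/status` on Linux instead of watching htop. It assumes "1GB" and "512M" mean 1024 MiB and 512 MiB; the function names are made up.

```python
def parse_kib(status_text, field):
    """Return the KiB value of a field such as 'VmRSS' from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None


def memory_footprint_mib(pid):
    """Resident plus swapped memory of a process in MiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        text = f.read()
    rss = parse_kib(text, "VmRSS") or 0
    swap = parse_kib(text, "VmSwap") or 0
    return (rss + swap) / 1024


# Assumption: binary units for the RAM and swap sizes quoted above.
RAM_MIB, SWAP_MIB = 1024, 512
budget = RAM_MIB + SWAP_MIB

for transferred in (1365, 1437):  # MB shown by scp when each transfer died
    print(f"{transferred} MiB transferred = {transferred / budget:.0%} of RAM + swap")
```

With those assumptions the two failed transfers stopped at about 89% and 94% of the 1536 MiB available, which fits a process whose memory grows with the transferred data while the rest of the system uses the remainder.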

It seems like there is still something that is saving the contents of the SSH file transfer into memory even though recording is disabled. If we fix that, we will fix this issue.
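To make that hypothesis concrete, here is a toy illustration of how a relay's memory can track the amount of data transferred. It is not Warpgate's or russh's code, and the thread does not confirm this as the cause. A fast producer feeds a slow consumer through a queue: with an unbounded queue everything piles up in memory, while a bounded queue makes the producer wait.

```python
import asyncio

CHUNK = 32 * 1024  # bytes per chunk, arbitrary for this illustration


async def relay(total_chunks, maxsize):
    """Relay chunks from a fast producer to a slow consumer.

    Returns the peak number of chunks held in the queue.
    maxsize=0 means the queue is unbounded.
    """
    queue = asyncio.Queue(maxsize=maxsize)
    peak = 0

    async def producer():
        nonlocal peak
        for _ in range(total_chunks):
            # On a bounded queue this waits while the queue is full.
            await queue.put(b"\0" * CHUNK)
            peak = max(peak, queue.qsize())
        await queue.put(None)  # end-of-stream marker

    async def consumer():
        while await queue.get() is not None:
            await asyncio.sleep(0)  # stand-in for a slow write to the client

    await asyncio.gather(producer(), consumer())
    return peak


print("unbounded peak:", asyncio.run(relay(200, 0)))
print("bounded peak:  ", asyncio.run(relay(200, 8)))
```

In the unbounded case the peak equals the total number of chunks, so memory use mirrors the transfer size; in the bounded case it never exceeds the queue limit, regardless of how large the transfer is.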

Abraxos commented 6 months ago

So uh... weird note, I recently re-installed the OS on the server that was running warpgate and upgraded the OS to Ubuntu server 24.04. I decided to run a test, and the issue just kinda went away. Like I was able to first transfer a 21GiB file without observing RAM consumption going up (the server had 24GiB of RAM) and then a 33GiB file transferred just fine.

I have no clue what could've changed, especially since I use configuration management to back up and copy the configuration database, so the configuration for warpgate is exactly the same.

Either way, so far as I can tell, this issue is not happening on Ubuntu 24.04.