Listen command stops working with i/o timeout

TimonVS commented 4 years ago

Issue

The listen command will sometimes suddenly stop working with the following error:

time="Tue, 25 Feb 2020 13:12:20 UTC" level=error msg="read error: read tcp 192.168.0.2:42458->54.187.159.182:443: i/o timeout"

This happens regardless of running it in Docker or on my Mac. It usually happens after a few hours, but sometimes it happens after 15 minutes as well. It pretty much always happens after I put my Mac to sleep for some time.

Expected Behavior

The listen command should recover from an i/o timeout or exit with a failure code so that we can configure Docker to restart.

Steps to reproduce

Not exactly sure.

Traceback

time="Tue, 25 Feb 2020 13:12:20 UTC" level=error msg="read error: read tcp 192.168.0.2:42458->54.187.159.182:443: i/o timeout"

Environment

macOS and Docker on macOS

KamilKopaczyk commented 4 years ago

I'm having the same problem on macOS + virtualbox

EDIT: stripe listen continues to work after timeout in my case. However, during that period when timeouts occur, events aren't reaching my machine and are not even registered in Stripe dashboard under "Webhook CLI responses" (when i view specific event).

@Stripe staff: feel free to contact me if you like. I can share some example event IDs

boobiq commented 4 years ago

@KamilKopaczyk Same thing is happening for me on macOS, I was having these timeouts since I started using CLI months ago and nothing stopped working, just recently lot of webhooks are not coming to me as you said, not even in dashboard.

jfalconmusik commented 4 years ago

Wow, I'm glad I'm not the only one. I'm using a PC, but I'm also receiving this error message and the webhooks aren't showing up in my dashboard either.

fabswt commented 4 years ago

Same issue with Stripe CLI version 1.3.9 running on Ubuntu inside a Docker container, on macOS. No problem whatsoever when testing with ngrok instead, so I don't think my connection is to blame.

Because of the issue, some notifications go missing.

fabswt commented 4 years ago

P.S.: this is really annoying… breaks the flow from coding, wondering instead "did stripe listen time out again?"

I seem to experience three levels of error:

The timeout notice is there, but the next API call will work anyway.
The timeout notice is there, but the next API call won't work. It just might on the second try though.
The timeout notice is there and listen went FUBAR, gotta restart it to get things working again.

And this is not a rare occurrence:

[Wed, 08 Apr 2020 18:52:10 CEST] ERROR read error: read tcp 172.17.0.2:36982->54.187.119.242:443: i/o timeout
[Wed, 08 Apr 2020 18:54:50 CEST] ERROR read error: read tcp 172.17.0.2:37038->54.187.119.242:443: i/o timeout
[Wed, 08 Apr 2020 19:07:37 CEST] ERROR read error: read tcp 172.17.0.2:44420->54.186.23.98:443: i/o timeout
[Wed, 08 Apr 2020 19:14:31 CEST] ERROR read error: read tcp 172.17.0.2:44446->54.186.23.98:443: i/o timeout
[Wed, 08 Apr 2020 19:18:41 CEST] ERROR read error: read tcp 172.17.0.2:44462->54.186.23.98:443: i/o timeout

tomer-stripe commented 4 years ago

Hey all, sorry for not responding to this previously but we're looking into this. If you're experiencing issues could you run the command with --log-level debug and share the output? It'd be helpful to see what if the CLI itself is experiencing connection issues.

tomer-stripe commented 4 years ago

I released a new version (1.3.10) that has some some more verbose debugging output. If you all have a chance to upgrade to the newer version and share the debugging output that'd be amazing!

One thing I want to call out is that there were two possible read errors we were showing errors for in the CLI. One of them is safe in that the CLI automatically reconnects after seeing it, so I changed the loglevel there from ERROR to DEBUG. This should make timeout that's causing reconnection issues louder.

For some additional context: the CLI itself is expected to disconnect and reconnect automatically (periodically and for errors). In this specific case it seems like something's gone wrong with our reconnection logic and I'm trying to isolate the case that's not reconnecting.

fabswt commented 4 years ago

@tomer-stripe Upgraded to 1.3.10, here's a log https://pastebin.com/tr609TWc !

tomer-stripe commented 4 years ago

@fabswt are you expecting to receive events and not? It looks like it's reconnecting okay and is still successfully ping/pong-ing with our servers there.

fabswt commented 4 years ago

@tomer-stripe I actually wasn't, just let it open for a while so you can see the timeouts. I can run it again next time I'm working with Stripe CLI to send events.

fabswt commented 4 years ago

@tomer-stripe Here's another log:

The timeline:

https://pastebin.com/gRNtuf7V
- Performed a refund (via the Dashboard) at 13:17 CET.
- Did not receive any event.
- Used stripe events resend evt_1GaK6lBkWO5ojRjqhTg0y9zv twice… still nothing.
- Waited 5 minutes overall.
https://pastebin.com/XLLu9V34
- Restarted the listen command.
- Used the same resend command. This time it worked.

This is one example. But happened a bunch of times before and is a source of confusion (wanna focus on the code, not on whether Stripe CLI lost it again.) Can't run the command with log level debug all the time because it's spammy/distracting.

I hope this helps.

tomer-stripe commented 4 years ago

Thanks @fabswt! We'll keep looking into what's happening here and I'll see if I can rework some of the listen client logic too.

auchenberg commented 4 years ago

From another user:

While running Webhooks listening there were a few occasional errors along the lines of: [Fri, 01 May 2020 12:20:22 CDT] ERROR read error: read tcp 192.168.1.18:62182->54.186.23.98:443: i/o timeout

Can provide contact details if needed.

KamilKopaczyk commented 4 years ago

I've have had a similar scenario as fabswt did. In my case some reconnect happened minute before the "crash". stripe-cli was running for 10-15 minutes without problems.

https://pastebin.com/ALdSYBDt stripe version 1.4.0

I wasn't aware that stripe events resend would work for events that were created when stripe-cli wasn't listening.

TimofeyBiryukov commented 3 years ago

Also getting this bug. While running stripe-cli in docker with stripe/stripe-cli:v1.5.9 image. Runs well for a while but stops forwarding events after a period of time. Restarting the container fixes the problem, but it repeats over time.

kevincerro commented 3 years ago

Same issue here.

MacOS 11.2.1 Stripe cli version 1.5.9 installed via Homebrew

Sometimes some events are lost and the only way is to resent them manually. Captura de pantalla 2021-03-01 a las 16 39 14

Can be related with https://github.com/stripe/stripe-cli/issues/433 ??

dyanakiev commented 3 years ago

Same Issue

MacOS Big Sur

Events stop coming and its reconnecting every few seconds.

➜ ~ stripe version stripe version 1.5.11 via brew

mpereira commented 3 years ago

Same issue. I have to restart the forwarder every now and then.

macOS 10.15.7 stripe-cli 1.5.14

In this output we see the CLI sending pings and getting pongs, reconnecting twice, and eventually losing the connection at the end, with the last log output being stopReadPump. I had to restart it after that, there was no more activity for hours.

stripe_listen_output.txt

jsakas commented 3 years ago

P.S.: this is really annoying… breaks the flow from coding, wondering instead "did stripe listen time out again?"

Agreed. Is there any update or plan to resolve this? It's been over a year since it was being looked into.

fabswt commented 3 years ago

Different day, same issue...

The only reliable way to get stripe listen to work is to restart the command. If you let it run for, say, 30 min, you never know if it'll work next time you do a payment test.

fabswt commented 1 year ago

Now another year with the same issue.

stripe / stripe-cli