Closed by Tristan971 1 year ago
Again a backtrace similar to the one in https://github.com/haproxy/haproxy/issues/2120, with exactly the same values for buf (0x0) and probe (83) in qc_prep_app_pkts(). Could you provide us with the coredump, please?
Sent it your way! 👍
The matching binaries and symbols are here:
After analyzing the coredump, I think this time the BUG_ON() is right. I have to try to reproduce this issue, but not today.
I have managed to reproduce this issue by acknowledging twice the same ACK frame sent in the same packet. Here is a fix: https://github.com/haproxytech/quic-dev/commit/00f448693a0af73b10f21bfb4f69c70cc96f2129
@haproxyFred thanks for your quick fix. @Tristan971 we plan to release a 2.7 really soon to fix the other issue. Do you have the time to test Fred's fix so that we may integrate it into the new release? If that's not possible, no worries, we will release the 2.7 with only the fix for #2141.
oh wow nice one @haproxyFred!
@a-denoyelle even after merging the patch on my side I won’t be able to confirm that it fixes it, because I have never experienced that crash again :/ it’s many times rarer than the other one. I’ll let you know if it causes a new crash, but that’s probably all I can do
based on that I’d just release 2.7 without it for now, since it’s absurdly rare anyway
Ok thanks for your feedback.
All this does not explain why the stack seems corrupted (buf = 0x0, probe = 83). I guess the next time haproxy will crash building the first frame in such a state.
As I said in my comment I can't confirm that it fixed the issue (since I didn't run into it again in a whole week without the patch), but at least the patch didn't induce any new crash in about 10 hours across our whole fleet, so if it makes sense from a logic standpoint it can probably be safely backported.
Just chiming in after 36 hours; still have no crash with this patch so it is at least not making anything worse 👍
All this does not explain why the stack seems corrupted (buf = 0x0, probe = 83). I guess the next time haproxy will crash building the first frame in such a state.
Wrong comment... In fact this had already been checked during a gdb debugging session.
Have you decided to not backport the patch in the end or was it just forgotten for 2.8-dev11? cc @wtarreau (not that it matters a lot, since things were fine without it too afaict, but just in case it was indeed forgotten)
No, it's just that we found that this code is particularly complex and deserved at least a comment. Since, as you said, it was hard to trigger, we preferred to wait for Fred to be back this week, and since then he's been busy. Maybe we can get it updated and merged today.
Makes sense 👍
In fact Fred already updated his patch but I forgot to merge it. Sorry for this, and thanks for the reminder. @Tristan971 just to know, is it possible for you to test the master branch? It would be useful to ensure we have not introduced a last-minute regression with the release coming soon.
just to know, is it possible for you to test the master branch? It would be useful to ensure we have not introduced a last-minute regression with the release coming soon.
Yeah sure; I'll try and do that sometime today
As always thank you very much :)
is it possible for you to test the master branch? It would be useful to ensure we have not introduced a last-minute regression with the release coming soon.
Fwiw I've been running e279f59 for a week without any issue to report (besides #2147 but that one is not QUIC related and definitely not 2.8-exclusive either)
Either way, I just updated to ffdf6a3 now
More or less leaves #2095 as the only known QUIC "issue" as far as I'm concerned, and I can't imagine that you'd want to block 2.8 final release on it anyway, as it will probably take many small steps over time to get through it.
Well, ignore my comment here https://github.com/haproxy/haproxy/issues/2140#issuecomment-1554082998; turns out I'd been running 9de10ce (+ the patch) since and didn't realize... Must have forgotten to deploy... Now running ffdf6a3 (I double-checked...)
That bug didn't trigger again in 6 days for me. Imo we can close the issue.
never reproduced after months
Detailed Description of the Problem
HAProxy crashed on a BUG_ON (I'm so sorry... 🥲)
Expected Behavior
no crash
Steps to Reproduce the Behavior
No idea besides serving QUIC traffic
Do you have any idea what may have caused this?
No response
Do you have an idea how to solve the issue?
No response
What is your configuration?
Output of haproxy -vv
Last Outputs and Backtraces
Additional Information
I don't have traces alas.
The pkt response buffer contains a portion of another HTTP request though, so it looks a lot like reading out of bounds