Open ansemjo opened 1 year ago
@jesup and @LPardue might both be interested in this one.
Yes I'm interested. Each event should be atomically written. If you're able to pinpoint some issue in the qlog crate I'm happy to take a look.
Yes I'm interested. Each event should be atomically written. If you're able to pinpoint some issue in the qlog crate I'm happy to take a look.
If you're waiting on me @LPardue .. I'm afraid I can't help you with that, sorry. :/
I haven't used the qlog
create directly yet; in fact I haven't used neqo
directly yet either. And to be honest, I only picked this place over Bugzilla because I already have an account here and I figured that since neqo
is the QUIC component in Firefox it might make sense. 🙃
If you have some specific instructions, e.g. on how to capture a more detailed trace with Firefox, or questions about my reproducer, I'm all ears though! 😄
Thanks @ansemjo , I think we need on of the mozilla team to give a pointer how Firefox uses the qlog crate. It seems to me like there could be a multi-thread problem in how the app accesses qlog access that is causing a single even to be truncated. I'm not familiar enough with the neqo code base and lack the time right now to spend on figuring that out, but I am very interested in getting to the root cause and a fix.
This is slightly offtopic to the "corrupted logs" issue but in my original issue I tried capturing some more logs to find the cause. The last capture in https://github.com/quic-go/webtransport-go/issues/84#issuecomment-1635782400 shows that both qlog
files from Firefox and webtransport-go
logged all packets properly, while the JavaScript code was stuck again after just a fraction of that.
To me, that sounds like an issue somewhere at the interface between neqo
and Firefox. As you say, maybe it's a multi-threading issue somewhere ...
On closer inspection, neqo is using an old version of the qlog crate that provided a hack-ish way to do streaming JSON serialization,this is a possible cause of the weird logs. Since 0.4.0 we've moved to the JSON-SEQ format which could possibly fix this problem, see the qlog bump PR https://github.com/mozilla/neqo/pull/1454
We're now on qlog 0.12.0. Closing this under the assumption that that has addressed the issue.
I've checked my reproducer above and I don't have any corrupted qlog files anymore; all of them can be imported fine in qvis. However, the "blocked streams" from my original issue still occur.
Thanks for the update. Reopening this.
Thanks. I'm not sure this here is the right issue for it though? I've held off on reporting it to Mozilla directly because I didn't have the time for it back when I found it and I've personally only tested it with this one quic-go server.
This particular issue was about the broken qlog files - and those seem to be fixed.
Well, this repo is a Mozilla repo :-) I'll leave it up to you if you want to close this and reopen it when you have confirmed it is our issue.
You're right. I was thinking of the Bugzilla issue tracker though. :P
Sorry for staying silent on this for so long. I just updated the dependencies in my reproducer and retried with Firefox 124.0.1 on Linux. The issue still occurs, albeit much less frequently! Here are two fresh qlog files from Firefox and quic-go, where the unidirectional stream is stuck after sending 22 blobs or about 2 MB:
I guess, in order to further narrow it down to an actual Firefox issue:
It might also be entirely unrelated to the QUIC code and instead be an issue with AsyncIterators? At least the simple while (true)
loop in the reproducer surely can't have much room for errors, can it?
const reader = stream.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = value;
received += chunk.length;
console.log(...prefix, "CHUNK", received, chunk);
};
Could you maybe try to reproduce the issue on your end first?
Could you maybe try to reproduce the issue on your end first?
Am trying to. Am following the steps in your reproducer. What is xdg-open https://localhost:7443/
supposed to do, that doesn't seem a command I have available after having followed the steps until then?
Could you maybe try to reproduce the issue on your end first?
Am trying to. Am following the steps in your reproducer. What is
xdg-open https://localhost:7443/
supposed to do, that doesn't seem a command I have available after having followed the steps until then?
Ah, sorry, that is simply opening the URL in a web browser (Firefox in this case). On Linux with a Gnome Desktop that usually opens any compatible program, which can handle https://
links.
Ah, got it. Is there any way to just test this with the neqo-client demo client? Would be much easier to debug.
That's among my TODOs to narrow it down; haven't tried that yet and I haven't worked with neqo directly either yet. You can see a client implementation in Go in the gist but as far as I remember, I couldn't reproduce it with a webtransport-go
client yet. That's why in https://github.com/quic-go/webtransport-go/issues/84 it was decided that it's probably not a webtransport-go
issue.
I seem to able to reproduce this, but I am not entirely sure. @ansemjo , what's the expected output of the browser console? This is what I saw:
I seem to able to reproduce this, but I am not entirely sure. @ansemjo , what's the expected output of the browser console? This is what I saw: [...]
Your screenshot shows a success, actually. Meaning your WebTransport stream did not hang. You can see that because the logged lines contain the total number of bytes received, which should be 32 * 100 KiB = 3276800 at the end. The server window is a better indicator because the output should just stop after some blob without ever printing the done sending
in the end.
In the latest commit (revision 12 in the gist) I added a SUCCESS message to the browser console output. When you don't see that, your stream is stuck. Edit: I realize I should be clearer with my messages. 🙃 So, just to be abundantly clear: when you do see a SUCCESS or 3276800 received bytes, you did not hit this issue. When that's missing and your stream is stuck, that's when you ... successfully reproduced the issue. Sorry, I need to fix that wording when I'm home.
I'm on another computer with Windows right now and reproduced it after a few tries. The binary was compiled with GOOS=windows go build -o reproducer.exe main.go
under WSL2.
Sorry, I have no luck to reproduce this at my side.
@ansemjo , could you try to follow the instructions here to record a log file?
Please set New log modules:
to timestamp,sync,nsHttp:5,WebTransport:5,nsWebTransport:5,neqo_http3::*:5,neqo_transport::*:5
and select Logging to a file
.
Thanks.
Sure! Wow, that log is verbose. I used a Firefox Developer 125.0b6 (64-bit) window with a clean profile because I have many tabs open and they all appeared in the log otherwise.
webtransport-go
logged for this connection: quic_323636_666235393...23563.qlog.txtAnd how it looked in the terminal:
2024/04/11 11:17:32 connection from 127.0.0.1:51676
2024/04/11 11:17:32 logging QUIC connection to "quic_323636_6662353932323963383361386630646530353533613538343261373637323563.qlog"
2024/04/11 11:17:33 write blob #00
2024/04/11 11:17:33 write blob #01
2024/04/11 11:17:33 write blob #02
2024/04/11 11:17:33 write blob #03
2024/04/11 11:17:33 write blob #04
2024/04/11 11:17:33 write blob #05
2024/04/11 11:17:33 write blob #06
2024/04/11 11:17:33 write blob #07
2024/04/11 11:17:33 write blob #08
2024/04/11 11:17:33 write blob #09
2024/04/11 11:17:33 write blob #10
2024/04/11 11:17:33 write blob #11
2024/04/11 11:17:33 write blob #12
2024/04/11 11:17:33 write blob #13
2024/04/11 11:17:33 write blob #14
2024/04/11 11:17:33 write blob #15
2024/04/11 11:17:33 write blob #16
2024/04/11 11:17:33 write blob #17
2024/04/11 11:17:33 write blob #18
2024/04/11 11:17:33 write blob #19
2024/04/11 11:17:33 write blob #20
And finally, the browser console:
I hope that's helpful to you.
Hi! I opened an issue in
quic-go/webtransport-go
two days ago because certain BidirectionalStreams got stuck when I sent too much data in a short interval and I suspected the Go server at first. However, I couldn't reproduce this issue with neither a Go client, nor Chromium. The reproducer I used is in this Gist of mine.@marten-seemann suggested that I record a qlog and visualize it with qvis. I set
network.http.http3.enable_qlog
in Firefox'about:config
and tried to visualize the log. But the recorded JSON is broken / corrupt. I've attached such a log from opening the reproducer above in a fresh Firefox profile here: localhost.qlog.txtOpening the file in VSCode highlights a few places where it looks like maybe writes to logfile were interleaved, so the objects were not properly printed?
I'm not sure how I can debug this further and I haven't worked with
neqo
directly yet but this seemed like the most sensible place to report this issue? For now, it seems I found a workaround for my initial issue, anyway.