hyperium / h3

MIT License

h3_webtransport datagrams packet loss/delay #209

Closed: cybersoulK closed this issue 10 months ago

cybersoulK commented 10 months ago

I am seeing very high packet loss with my implementation.

The following method might be causing the freezing/packet loss I am experiencing.

This is h3_webtransport::server::WebTransportSession::send_datagram:

/// TODO: maybe make async. `quinn` does not require an async send
pub fn send_datagram(&self, data: B) -> Result<(), Error>
seanmonstar commented 10 months ago

@darioalessandro or @ten3roberts may know better?

ten3roberts commented 10 months ago

Hmm, it is odd that you are experiencing data loss.

The send_datagram just uses quinn (or your selected backend) to send the datagram with a VarInt prefix in the payload.
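For reference, here is a minimal sketch of that framing. It assumes the backend follows RFC 9297 (HTTP Datagrams), where each datagram starts with the Quarter Stream ID (the session's CONNECT stream ID divided by 4) encoded as a QUIC variable-length integer (RFC 9000, Section 16). The function names are hypothetical and not part of h3's API. The point is that the prefix costs between 1 and 8 bytes of the datagram size.

```rust
/// Bytes needed to encode `v` as a QUIC varint (RFC 9000, Section 16).
fn varint_len(v: u64) -> usize {
    match v {
        0..=63 => 1,
        64..=16_383 => 2,
        16_384..=1_073_741_823 => 4,
        _ => 8, // values up to 2^62 - 1
    }
}

/// Encode `v` as a big-endian QUIC varint; the top two bits carry the length.
fn encode_varint(v: u64, out: &mut Vec<u8>) {
    assert!(v < (1u64 << 62), "value too large for a QUIC varint");
    match varint_len(v) {
        1 => out.push(v as u8),
        2 => out.extend_from_slice(&((v as u16) | 0x4000).to_be_bytes()),
        4 => out.extend_from_slice(&((v as u32) | 0x8000_0000).to_be_bytes()),
        _ => out.extend_from_slice(&(v | 0xC000_0000_0000_0000).to_be_bytes()),
    }
}

/// Hypothetical helper: Quarter Stream ID prefix followed by the application payload.
fn frame_datagram(session_stream_id: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    encode_varint(session_stream_id / 4, &mut out);
    out.extend_from_slice(payload);
    out
}

fn main() {
    // A session on stream 0 gets a 1-byte prefix, so 10 bytes of payload become 11.
    let framed = frame_datagram(0, &[0xAA; 10]);
    println!("framed len = {}, first byte = {}", framed.len(), framed[0]);
}
```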

What are you sending?

cybersoulK commented 10 months ago

@ten3roberts

What are you sending?

I am mostly sending positional data that fills each datagram up to max_datagram_size.

OK, I think I found the issue. I get a bunch of these errors: Err(h3::Error { kind: TooLarge, code: : None }), with data_len: 1422 and max_packet_size: 1422.

So it drops the packet if bytes_len is equal to max_packet_size? Why is a stream_id needed and added to the payload?

I am also having a separate (maybe related?) issue where server inbound traffic becomes delayed by a few seconds (worse in Firefox) while outbound traffic flows normally, but I am doing more testing to find a plausible cause.

cybersoulK commented 10 months ago

After extensive testing, I found that manually subtracting 8 bytes from the maximum size works as it should. But I have not been able to narrow down the client->server datagram delay, which sometimes reaches 5 seconds.
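The 8 matches the largest possible varint prefix. A sketch comparing the exact budget with the conservative "subtract 8" one, assuming `max_datagram_size` is the limit the backend reports for the whole datagram (as the TooLarge error with data_len equal to max_packet_size suggests; for quinn this would be `Connection::max_datagram_size()`) and that the prefix is the RFC 9297 Quarter Stream ID. The function names are hypothetical.

```rust
/// Worst-case size of a QUIC variable-length integer (RFC 9000, Section 16).
const MAX_VARINT_LEN: usize = 8;

/// Bytes needed to encode `v` as a QUIC varint.
fn varint_len(v: u64) -> usize {
    match v {
        0..=63 => 1,
        64..=16_383 => 2,
        16_384..=1_073_741_823 => 4,
        _ => 8,
    }
}

/// Exact application payload that fits once the Quarter Stream ID prefix is counted.
fn exact_payload_budget(max_datagram_size: usize, session_stream_id: u64) -> usize {
    max_datagram_size.saturating_sub(varint_len(session_stream_id / 4))
}

/// Conservative budget that always reserves the worst-case prefix.
fn conservative_payload_budget(max_datagram_size: usize) -> usize {
    max_datagram_size.saturating_sub(MAX_VARINT_LEN)
}

fn main() {
    let max = 1422; // the max_packet_size reported above
    println!(
        "exact (session on stream 0): {}, conservative: {}",
        exact_payload_budget(max, 0),
        conservative_payload_budget(max)
    );
}
```

For a session on stream 0 the prefix is a single byte, so at most 1421 bytes of payload fit and a 1422-byte payload can never be sent. Subtracting 8 gives away a few bytes but stays valid for any session ID, which is why the workaround holds.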

~Chrome on Mac works perfectly as it should. Everything else is terrible: Firefox on Mac, and Chrome and Firefox on Windows.~ ~Chrome now seems to work well on both Mac and Windows. Firefox has this issue.~ Chrome and Firefox both have the issue (something about the datagram queue and write_with_chunk.await).

Any advice on where to look, or on what it could be, is highly appreciated!

darioalessandro commented 10 months ago

@cybersoulK, in my experience the best way to fix these kinds of issues is to create a minimal setup that allows us to reproduce the problem. Would you be interested in creating one for us?

cybersoulK commented 10 months ago

Hey guys, I used Wireshark and finally identified the problem. It doesn't look like it's the h3 server or the web_sys implementation of WebTransport.

The datagrams are all successfully sent to the write queue using the function below.

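use std::{cell::RefCell, rc::Rc};

use js_sys::Uint8Array;
use web_sys::WebTransport;

/// Writes one datagram to the WebTransport datagram writable stream.
/// The write promise is not awaited, so this returns before the browser
/// has processed the write.
fn send_datagram(connection: Rc<RefCell<WebTransport>>, data: Vec<u8>) {
    // `datagrams()` takes `&self`, so a shared borrow is enough.
    let datagrams = connection.borrow().datagrams();
    let writer = datagrams.writable().get_writer().unwrap();

    let data = Uint8Array::from(data.as_slice());
    // Fire-and-forget: the returned Promise is dropped.
    let _ = writer.write_with_chunk(&data);

    writer.release_lock();
}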

The problem is that my synchronous code cannot await the write operation. So I either did this, or used wasm_bindgen_futures, but neither solved the issue. I think I will be forced to use a web worker to make sure the packets are actually sent out from the queue.
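One possible mitigation, assuming the delay comes from un-awaited writes piling up in the browser's datagram queue (this is an assumption, not something confirmed in the thread), is to keep a small queue on the Rust side that discards the oldest positional update when full, and let a single async task drain it and await each write, for example a task started with `wasm_bindgen_futures::spawn_local` that awaits `JsFuture::from(writer.write_with_chunk(&chunk))`. The sketch below covers only the queue; `LatestQueue` is a hypothetical type.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded queue: when full, the oldest item is dropped so a slow
/// consumer never accumulates seconds of stale positional updates.
struct LatestQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> LatestQueue<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { items: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Called from synchronous code; never blocks and never awaits.
    fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    /// Called by the single sender task before each awaited write.
    fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut queue = LatestQueue::new(2);
    for tick in 0u32..5 {
        queue.push(tick);
    }
    // Only the two newest ticks survive; the older three were discarded.
    println!("len={} next={:?} dropped={}", queue.len(), queue.pop(), queue.dropped);
}
```

Dropping the oldest entry trades completeness for freshness, which only suits data where a newer update supersedes an older one, such as positions.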

cybersoulK commented 10 months ago

I will close this for now, since my evidence indicates that h3 is working correctly. But I still suggest doing something about max_datagram_size, since it's not clear to users that they need to subtract 8 bytes manually.

cybersoulK commented 10 months ago

https://github.com/w3c/webtransport/issues/543