wngr / async-datachannel

Async Wrapper for datachannel-rs
Apache License 2.0
9 stars 4 forks source link

Sometimes loses data when sending data at high rates #5

Open eras opened 1 year ago

eras commented 1 year ago

I noticed that sometimes when I send large amounts of data (e.g. hundreds of megabytes), sometimes messages get lost. Usually it's from the end, but it can be elsewhere.

I have modified the smoke test to reproduce the issue. It is most easy to reproduce it in release mode. The issue appears in one of two ways:

EXPECTED 202, RECEIVED 219 at 78758400

in the stderr of the recipient (the one that has fewer arguments).

Another way is to wait until the end and see that the recipient has not received all the data, while the sender is waiting for acknowledgement from the peer.

I suggest redirecting output to files when running the test. It doesn't also seem very deterministic, so fidgeting with buffer sizes might also impact this..

I don't yet know if the issue is in this library or libdatachannel, but I'll report it here first as I have not written a test case for libdatachannel yet.

edit: updated direct url to smoke.rs

eras commented 1 year ago

The issue is of course here: https://github.com/wngr/async-datachannel/blob/fcd7d2344caf1063658d2d18a74efd3a8cadf920/native/src/lib.rs#L82

    fn on_message(&mut self, msg: &[u8]) {
        let s = String::from_utf8_lossy(msg);
        debug!("on_message {}", s);
        let _ = self.tx_inbound.try_send(Ok(msg.to_vec()));
    }

I understand the problem is that one cannot send to an async channel because it can be busy, thus one must use try_send. It seems some other kind of channel facility is needed in this situation. One such facility would be an unbounded one, but a better would be one that can block in this situation. Or maybe DataChannel has some means to provide pressure to the other peer, other than simply blocking in this handler?

eras commented 1 year ago

Here's a tentative fix using tokio::task::block_on: https://github.com/eras/async-datachannel/tree/issue2-fix2

It's lacking some mechanism to report errors to the client.

There's also https://github.com/eras/async-datachannel/tree/issue2-fix1 but I don't think using unbounded channels is the best way to go. Wouldn't want to just consume tons of memory without providing any pressure at all anywhere.