Currently, stream data is spread over the available paths, according to what the congestion control of each individual path allows, and avoiding the paths that have too many packet losses. It does not directly use a classic measurement of path quality, only an indirect one. We have internal data that could be exposed, namely the pacing rate, the RTT and RTT variance estimates, the one-way delay estimate, and the packet loss counts.
Regarding RTT and RTT VAR, the one-way delay estimate is probably a bit too complex. To do it right, we would need multi-variable modelling, maybe approximated as a Kalman filter. We had lots of discussion about that as part of the standardization of MP QUIC (see for example this discussion on how to send ACK_MP and compute RTT). This is a part of the code that could be simplified and made more robust, and I would like to do that as soon as the multipath draft stabilizes -- probably in a couple of weeks. That would be a good time to add an API exposing path quality.
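For reference, here is a minimal sketch of the standard smoothed RTT / RTT variance update (in the spirit of RFC 9002) that this kind of per-path estimate builds on. The structure and function names below are illustrative only, not picoquic's internal code.

```c
#include <stdint.h>

/* Illustrative per-path RTT estimator, following the usual
 * smoothed_rtt / rttvar update rules (all values in microseconds). */
typedef struct {
    uint64_t smoothed_rtt;
    uint64_t rttvar;
    uint64_t min_rtt;
    int has_sample;
} path_rtt_estimate_t;

static void rtt_update(path_rtt_estimate_t* est, uint64_t latest_rtt)
{
    if (!est->has_sample) {
        est->smoothed_rtt = latest_rtt;
        est->rttvar = latest_rtt / 2;
        est->min_rtt = latest_rtt;
        est->has_sample = 1;
    }
    else {
        uint64_t delta = (latest_rtt > est->smoothed_rtt) ?
            latest_rtt - est->smoothed_rtt : est->smoothed_rtt - latest_rtt;
        if (latest_rtt < est->min_rtt) {
            est->min_rtt = latest_rtt;
        }
        /* rttvar = 3/4 * rttvar + 1/4 * |smoothed_rtt - latest_rtt| */
        est->rttvar = (3 * est->rttvar + delta) / 4;
        /* smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt */
        est->smoothed_rtt = (7 * est->smoothed_rtt + latest_rtt) / 8;
    }
}
```

A sound one-way delay estimate would need more than this, which is why the Kalman-filter style modelling mentioned above comes up.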
Just another question: you are asking to "evaluate quality of the paths before stream transmitting". Is this the same requirement as the "path affinity" API discussed in issue #1412 by @clarkzjw?
If I understand correctly, they are similar. In "path affinity", we essentially also need to evaluate path quality before associating paths with streams, and possibly disassociate them when paths become unavailable (https://github.com/private-octopus/picoquic/issues/1412#issuecomment-1416670798). It might be a good idea to expose those internal metrics as @huitema mentioned, e.g., pacing rate, RTT, loss rate, etc.
Getting back to this issue. I would like to expose the required properties in a "safe" API, without forcing applications to build dependencies on the internals of picoquic. I think that we have several issues to solve.
First, we need to expose a unique path identifier. The internal code manages path contexts using a pointer to the structure picoquic_path_t. The simplest solution would be to create a few callbacks to track path events such as creation, validation, suspension or deletion, and expose the path as an opaque pointer value in the callback -- possibly adding an "application path context" pointer to facilitate handling. The risk of course is that at some point the path will be deleted, leading to a possible "use after free" by the application if it fails to remove every usage of that pointer. The alternative is to create a separate stable identifier for paths, which is safer but requires more code.
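As a rough picture of the first option, here is a minimal sketch of how an application might track paths through the opaque pointer received in such callbacks. The event names and the app_path_ctx_t structure are hypothetical, not the actual picoquic callback API.

```c
#include <stddef.h>

/* Hypothetical path event codes -- not actual picoquic callback events. */
typedef enum {
    APP_PATH_CREATED,
    APP_PATH_VALIDATED,
    APP_PATH_SUSPENDED,
    APP_PATH_DELETED
} app_path_event_t;

/* Application-side record for one path, keyed by the opaque pointer. */
typedef struct {
    void* picoquic_path; /* opaque pointer handed out by the stack */
    int is_validated;
    int is_suspended;
} app_path_ctx_t;

/* Called from the application's connection callback when a path event is
 * signaled. On deletion, every reference to the opaque pointer must be
 * dropped to avoid the use-after-free risk described above. */
static void app_on_path_event(app_path_ctx_t* table, size_t nb_paths,
    void* path_ptr, app_path_event_t event)
{
    for (size_t i = 0; i < nb_paths; i++) {
        if (table[i].picoquic_path == path_ptr) {
            switch (event) {
            case APP_PATH_VALIDATED: table[i].is_validated = 1; break;
            case APP_PATH_SUSPENDED: table[i].is_suspended = 1; break;
            case APP_PATH_DELETED:   table[i].picoquic_path = NULL; break;
            default: break;
            }
            return;
        }
    }
    if (event == APP_PATH_CREATED) {
        /* find a free slot and remember the new opaque pointer */
        for (size_t i = 0; i < nb_paths; i++) {
            if (table[i].picoquic_path == NULL) {
                table[i].picoquic_path = path_ptr;
                table[i].is_validated = 0;
                table[i].is_suspended = 0;
                return;
            }
        }
    }
}
```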
Once we have that, we can expose a "get path quality" API that would take as arguments a connection pointer, a path pointer, and a structure that will receive the requested data. That part is simple. The issue for the application is when to call that API. Maybe add a callback to signal when the path quality changes significantly.
Comments?
That would be great. I'm not very clear about the following two points; could you elaborate a bit more on them?

> The alternative is to create a separate stable identifier for paths

> Maybe add a callback to signal when the path quality changes significantly
The path "quality" is measured via loss rate? or fluctuations in RTT and others?
I think the issue with callbacks is "how often". For example, if the RTT changes from 100ms to 101ms, the overhead of sending a callback is not really justified. But if the data rate increases or decreases by 10 or 20%, that's probably useful. I could see the application setting a parameter, such as "signal a change if RTT or Data Rate changes by X%". But maybe we should keep it simple first: let the stack use some default for creating the signal, get experience with that, then tune?
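To make the threshold idea concrete, here is a minimal sketch of a "signal only if RTT or data rate moves by more than X%" filter. The function name and parameters are illustrative, not part of picoquic.

```c
#include <stdint.h>

/* Illustrative only: return 1 if either metric moved by more than
 * threshold_percent since the last value that was signaled to the
 * application, 0 otherwise. */
static int quality_change_is_significant(
    uint64_t last_rtt, uint64_t new_rtt,
    uint64_t last_rate, uint64_t new_rate,
    uint64_t threshold_percent)
{
    uint64_t rtt_delta = (new_rtt > last_rtt) ? new_rtt - last_rtt : last_rtt - new_rtt;
    uint64_t rate_delta = (new_rate > last_rate) ? new_rate - last_rate : last_rate - new_rate;

    /* With threshold_percent = 20, a move from 100 ms to 101 ms is ignored,
     * while a 20% change in data rate triggers a signal. */
    return (rtt_delta * 100 > last_rtt * threshold_percent) ||
        (rate_delta * 100 > last_rate * threshold_percent);
}
```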
Agreed, that sounds reasonable.
I am not sure how exactly to measure the "packet loss rate". In theory, this is simple -- divide the number of packets lost by the number of packets sent. But in practice, we want to average over enough packets for the division to make sense, yet let the measure evolve quickly enough when conditions change so that applications can make timely decisions. Or, the API could just provide the raw data, such as the number of packets sent and the number lost since the creation of the path, and let the application do its own divisions, averaging, etc. The latter is simpler.
There is also the question about the type of losses. There is a difference between "losses at the tail", detected by timers, and losses in the middle, detected by acknowledgements of packets sent with higher numbers. In the first case, the link may well be broken. In the second case, the path is probably still usable, because some packets are acked.
> Or, the API could just provide the raw data, such as the number of packets sent and the number lost since the creation of the path, and let the application do its own divisions, averaging, etc. The latter is simpler.
I might prefer to provide raw data. In this case, different applications can have their own objectives.
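To illustrate the raw-data option, here is a minimal sketch of an application deriving its own windowed loss rate from cumulative "packets sent" and "packets lost" counters. The counter names and the structure are assumptions, not an actual picoquic interface.

```c
#include <stdint.h>

/* Application-side state: cumulative counters observed at the last sample. */
typedef struct {
    uint64_t last_sent;
    uint64_t last_lost;
} loss_window_t;

/* Compute the loss rate over the packets sent since the previous call,
 * given cumulative "sent" and "lost" counters exposed by the stack.
 * Returns a value in [0, 1], or -1.0 if too few packets were sent for
 * the division to be meaningful. */
static double windowed_loss_rate(loss_window_t* w,
    uint64_t total_sent, uint64_t total_lost, uint64_t min_samples)
{
    uint64_t sent_in_window = total_sent - w->last_sent;
    uint64_t lost_in_window = total_lost - w->last_lost;
    double rate = -1.0;

    if (sent_in_window >= min_samples) {
        rate = (double)lost_in_window / (double)sent_in_window;
        w->last_sent = total_sent;
        w->last_lost = total_lost;
    }
    return rate;
}
```

Averaging windows or smoothing can then be layered on top, which is exactly the kind of policy decision that stays with the application.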
A set of new APIs has been added to picoquic in PR #1482. They include:
```c
int picoquic_get_path_quality(picoquic_cnx_t* cnx, uint64_t unique_path_id, picoquic_path_quality_t* quality);
void picoquic_get_default_path_quality(picoquic_cnx_t* cnx, picoquic_path_quality_t* quality);
int picoquic_subscribe_to_quality_update_per_path(picoquic_cnx_t* cnx, uint64_t unique_path_id,
    uint64_t pacing_rate_delta, uint64_t rtt_delta);
void picoquic_subscribe_to_quality_update(picoquic_cnx_t* cnx, uint64_t pacing_rate_delta, uint64_t rtt_delta);
void picoquic_default_quality_update(picoquic_quic_t* quic, uint64_t pacing_rate_delta, uint64_t rtt_delta);
```
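Below is a minimal usage sketch of these calls. The picoquic_path_quality_t fields read here (rtt, pacing_rate) and the numeric thresholds are assumptions for illustration, since the thread does not list the structure's contents.

```c
#include <stdint.h>
#include "picoquic.h"

/* Sketch only: poll the default path's quality and decide whether it is
 * good enough to start a new stream. The field names rtt and pacing_rate
 * are assumed here, not taken from this thread. */
static int path_is_good_enough(picoquic_cnx_t* cnx,
    uint64_t max_rtt_usec, uint64_t min_pacing_rate)
{
    picoquic_path_quality_t quality;

    picoquic_get_default_path_quality(cnx, &quality);
    return quality.rtt <= max_rtt_usec && quality.pacing_rate >= min_pacing_rate;
}

/* Ask the stack to signal quality updates on all paths of a connection
 * when the pacing rate or the RTT changes by more than the given deltas
 * (the values below are illustrative). */
static void subscribe_quality(picoquic_cnx_t* cnx)
{
    uint64_t pacing_rate_delta = 100000; /* illustrative threshold */
    uint64_t rtt_delta = 10000;          /* illustrative threshold */

    picoquic_subscribe_to_quality_update(cnx, pacing_rate_delta, rtt_delta);
}
```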
I think this solves the issue.
There are several multi-path connections to different servers in one client, and I want to evaluate their quality before transmitting streams. Is it reasonable to use RTT values (like smoothed_rtt in st_picoquic_path_t) to evaluate the quality?
I have tested the RTT values, but they do not seem to be updated after packets are received. In my test case, this was caused by one_way_delay_sample always being negative. Is this a result of clock drift?