sipcapture / homer-app

HOMER 7.x Front-End and API Server
http://sipcapture.io
GNU Affero General Public License v3.0

Wrong MOS score #410

Closed gledim closed 3 years ago

gledim commented 3 years ago

Hi guys,

In the QOS reports, we are noticing some calls getting MOS scores of 1 either from the beginning, or starting at some point, and continuing till the end of the call.

After some investigations, we noticed that at the point where MOS goes to 1, there are some lost packets.

From some old issues raised in Homer-Api, we found out that the MOS is calculated by a function that takes these lost packets into consideration, among other things. Fixes for these issues seem to have been pushed to the respective repositories at the time, but is there any chance they haven't been migrated to the new version?

We also couldn't find any reference to these calculation functions in the homer-app repo. Can you please point them out?

Glad to help with anything :)

lmangani commented 3 years ago

Hi @gledim We know exactly what you're talking about here, and this is more about how RTCP itself works and how that applies to the MOS estimation. Unlike in our RTPAgent reports, and as you know, RTCP packet loss is reported as an incremental counter without "resets" between the reports in the chain, and that is how it gets displayed. We can of course adjust things to be more flexible, but at the same time we need to keep the tool honest towards the protocol it reports on. Our RTP analyzer, in comparison, does this the way you would expect, and we can share some examples.

Could you please attach a sample RTCP dataset affected so we can discuss with some visible numbers and confirm?

gledim commented 3 years ago

Thank you @lmangani

Yes, I was thinking exactly about this while investigating it.

I am attaching a sample dataset that has some lost packets reported. Please let me know what you think: rtcp_set.zip

jacovinus commented 3 years ago

Good morning @gledim, we've analyzed your CSV file but don't see any packet or MOS-related data. Is the file perhaps corrupted? Please help us by providing the complete file so we can help you :)

gledim commented 3 years ago

Hi @jacovinus That is an extract of the DB records for the call, probably not very helpful :) I'm attaching the trace for another call, as captured locally and as exported from Homer. Please let me know if anything else is needed. traces.zip

rogelio-telnyx commented 3 years ago

@lmangani thanks for clarifying what's causing this issue. Can you please tell us how we can modify the MOS score calculation to consider incremental packet loss instead? Thanks!

lmangani commented 3 years ago

Hi @rogelio-telnyx The only way around it would be subtracting the preceding loss value from each RTCP round before the MOS calculation is performed. Will discuss it with the team.
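
A minimal sketch of that subtraction, assuming the cumulative loss counters of one stream are available as a list in report order. The function name `per_interval_loss`, the zero baseline before the first report, and the clamping of negative differences to zero are illustrative assumptions, not Homer's implementation.

```python
from typing import List


def per_interval_loss(cumulative_lost: List[int]) -> List[int]:
    """Turn cumulative RTCP loss counters into per-report deltas."""
    deltas = []
    previous = 0  # assumption: nothing was lost before the first report
    for current in cumulative_lost:
        # The cumulative counter can go down when duplicates arrive,
        # so negative differences are clamped to 0 (assumption).
        deltas.append(max(current - previous, 0))
        previous = current
    return deltas
```

With this, a stream whose counter stays at the same non-zero value contributes no new loss to later intervals, which is the behaviour requested in the thread.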

rogelio-telnyx commented 3 years ago

@lmangani thanks for the confirmation, is there anything we can do here to help speed up the implementation of this fix?

rogelio-telnyx commented 3 years ago

@lmangani according to RFC 3550, RTCP messages contain two values related to packet loss: "cumulative number of packets lost" and "fraction lost". The "fraction lost" metric seems to use the same approach as the packet loss metric from RTPAgent:

fraction lost: 8 bits

The fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent, expressed as a fixed point number with the binary point at the left edge of the field. (That is equivalent to taking the integer part after multiplying the loss fraction by 256.) This fraction is defined to be the number of packets lost divided by the number of packets expected, as defined in the next paragraph. An implementation is shown in Appendix A.3. If the loss is negative due to duplicates, the fraction lost is set to zero. Note that a receiver cannot tell whether any packets were lost after the last one received, and that there will be no reception report block issued for a source if all packets from that source sent during the last reporting interval have been lost.

I confirmed that "fraction lost" is already stored in the Homer data set, so the question is, can we tell Homer to use this value instead of "cumulative number of packets lost" for MOS calculation?
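
As a rough illustration of the calculation in RFC 3550 Appendix A.3 mentioned in the quote, the sketch below derives the 8-bit fraction lost from the packets expected and received during one reporting interval. The function and parameter names are hypothetical; only the arithmetic follows the RFC.

```python
def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Fraction lost for one reporting interval, in units of 1/256."""
    lost_interval = expected_interval - received_interval
    # Per the RFC, the fraction is zero when nothing was expected or
    # when duplicates make the loss zero or negative.
    if expected_interval == 0 or lost_interval <= 0:
        return 0
    # Fixed point with the binary point at the left edge of the field:
    # integer part of (lost / expected) * 256.
    return (lost_interval << 8) // expected_interval
```

Because the inputs cover only one interval, this value resets with every report, unlike the cumulative counter.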

gledim commented 3 years ago

Hey @lmangani @jacovinus By investigating some sample traces, we have noticed that there is an error in calculating packets_lost, but this seems to be an issue happening in the HEP server.

So, whenever the cumulative number of lost packets increases by 1 in the RTCP packets, in Homer it increases by 256. This then affects the MOS calculation, and whenever this happens, MOS drops straight to 1.

I'm attaching 2 RTCP packets where this happens, and the respective ones extracted from a trace from homer-app.

traces1.zip

In case more details are needed, please let me know

adubovikov commented 3 years ago

Hi,

I am not sure that you are looking at the correct field. In the latest version we take the fraction packet loss and not the cumulative loss. And this number should be multiplied by 256 because the field is an unsigned 8-bit integer (RTCP RFC).

Regards, Alexandr

adubovikov commented 3 years ago

Hi,

OK, I see... Who generated that HEP message?

Regards, Alexandr

adubovikov commented 3 years ago

So, it looks good:

LostPercentage=100.0*(float)report_block_get_fraction_lost(rb)/256;

and the code in the UI is also correct:

https://github.com/sipcapture/homer-ui/blob/master/src/app/qos.worker.ts#L416-L418

This means the value should be divided by 256 and multiplied by 100 to get a percentage. That is what we do!
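
The conversion described in this comment can be sketched in Python as follows. The function name is hypothetical and the range check is an added assumption; the scaling only makes sense for the 8-bit fraction lost field, not for the cumulative counter.

```python
def fraction_lost_to_percent(fraction_lost: int) -> float:
    """Convert the RTCP 8-bit fraction lost field to a percentage."""
    if not 0 <= fraction_lost <= 255:
        # Assumption: anything outside 0..255 is not a fraction lost
        # value (for example, a cumulative counter passed by mistake).
        raise ValueError("fraction lost must fit in 8 bits")
    return 100.0 * fraction_lost / 256
```

For example, a fraction lost of 64 corresponds to 25% loss in that reporting interval.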

adubovikov commented 3 years ago

so, anyway please tell us who generated that HEP

rogelio-telnyx commented 3 years ago

@adubovikov we created a custom FreeSWITCH module that converts all RTCP messages that FS sends or receives into HEP messages that are sent to heplify server. We're putting together an example from a call that shows the values at the 3 stages:

This will help us understand where the problem is. Please hold on while we prepare the data.

adubovikov commented 3 years ago

@rogelio-telnyx sure, but it is already almost clear: it looks like you send fraction_loss as packet_loss and cumulative as fraction. Please check these values!
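
For reference, RFC 3550 packs both fields into one 32-bit word of each reception report block: the top 8 bits are the fraction lost and the lower 24 bits are the signed cumulative number of packets lost. A minimal sketch of splitting that word, with a hypothetical function name, might look like this:

```python
import struct
from typing import Tuple


def parse_loss_word(word_bytes: bytes) -> Tuple[int, int]:
    """Split the 4-byte loss word into (fraction_lost, cumulative_lost)."""
    (word,) = struct.unpack("!I", word_bytes)  # network byte order
    fraction = word >> 24          # upper 8 bits, unsigned
    cumulative = word & 0xFFFFFF   # lower 24 bits
    if cumulative & 0x800000:      # sign-extend the 24-bit value
        cumulative -= 1 << 24
    return fraction, cumulative
```

Checking which of the two values ends up in which HEP field is one way to spot a swap like the one suspected here.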

rogelio-telnyx commented 3 years ago

@adubovikov indeed that's what we found! We will make the necessary modifications to the custom FS module and report back once we fully test it. Thanks

adubovikov commented 3 years ago

@rogelio-telnyx :-) good luck!

piotrgregor commented 3 years ago

So, it looks good: LostPercentage=100.0*(float)report_block_get_fraction_lost(rb)/256; and in the code in UI also correct: https://github.com/sipcapture/homer-ui/blob/master/src/app/qos.worker.ts#L416-L418 this means, the value should be divided by 256 and multiplied by 100 to get percentage. This is that we do!

Hi Alexandr. It looks like you are dividing by 256 and multiplying by 100 the number of packets lost (cumulative) and not the fraction here:

https://github.com/sipcapture/homer-ui/blob/4a027e1843e5a0740846db7519f6783c3007230a/src/app/qos.worker.ts#L411

Packets lost are not expressed in ratio*256 units (the units of fraction lost), so this is not right: https://github.com/sipcapture/homer-ui/blob/4a027e1843e5a0740846db7519f6783c3007230a/src/app/qos.worker.ts#L417

adubovikov commented 3 years ago

https://github.com/sipcapture/homer-ui/commit/0d3b297e66c8948bed46e15e0173ccac818903ca

fixed

lmangani commented 3 years ago

@piotrgregor could you confirm if the patch was successful from your side of the integration?

lmangani commented 3 years ago

Closing as resolved. Feel free to reopen if needed ;)

piotrgregor commented 3 years ago

Yes. Thanks @lmangani