xmppo / node-xmpp-bosh

An XMPP BOSH & WebSocket server (connection manager) written on node.js using Javascript
https://github.com/xmppo/node-xmpp-bosh
263 stars 85 forks source link

Reports not being sent. #2

Closed satyamshekhar closed 11 years ago

satyamshekhar commented 12 years ago

I found a bug yesterday, https://github.com/dhruvbird/node-xmpp-bosh/blob/master/src/session.js#L975 always evaluates to false. No, reports are ever sent -- since node.attrs.ack is not an integer.

dhruvbird commented 12 years ago

dang! We should sanitize all user sent attributes before we even start processing them.

satyamshekhar commented 12 years ago

:).. totally.. however when I fixed this a lot reports were being sent continuously.. since two response are sent simultaneously (initially) and the client is never able to recover. We should do this with a timeout.

dhruvbird commented 12 years ago

didn't get this bit...

satyamshekhar commented 12 years ago

actually there was another bug.. https://github.com/dhruvbird/node-xmpp-bosh/blob/master/src/session.js#L975 -- I changed line from if (node.attrs.ack < this.max_rid_sent && this.unacked_responses[node.attrs.ack]) to if (node.attrs.ack < this.max_rid_sent && this.unacked_responses[node.attrs.ack + 1]), which I know realise is not correct either. response corresponding to rid=node.attrs.ack will always be deleted in handle acks. So this will never evaluate to true. We need to decide when should we send a report and without how much timeout.

dhruvbird commented 12 years ago

I don't see what the problem is??

dhruvbird commented 12 years ago

hey, so I am not able to understand what the problem is! could you please elaborate for the benefit of the challenged!!

On Thu, Feb 9, 2012 at 10:51 AM, satyamshekhar reply@reply.github.com wrote:

typo: to if (node.attrs.ack < this.max_rid_sent && this.unacked_responses[node.attrs.ack + 1])


Reply to this email directly or view it on GitHub: https://github.com/dhruvbird/node-xmpp-bosh/issues/2#issuecomment-3889678

   -Dhruv Matani. http://dhruvbird.com/

"What's the simplest thing that could possibly work?" -- Ward Cunningham

satyamshekhar commented 12 years ago

I am really sorry for not being clear. I have this tendency.

Check out the change I made here https://github.com/dhruvbird/node-xmpp-bosh/commit/9f0b359c4349d301a15e67d5fefc58a371b120e2#L0L977 (though this is a mistake). Lets consider the code as it was before this change. We have three issues -

This line is wrong since we are checking for this.unacked_responses[node.attrs.ack]. This will always be false since while handling acks we delete from unacked responses the response for which we have received an ack (all response for rid <= node.attrs.ack).

-- Also, we should see if there is a certain amount of time has pass before we enqueue a report. For example -

  1. server is holding r1.
  2. server sends something on r1.
  3. before receiving r1 the client sends something on r2.
  4. server sends a report for r1 (r1 wont be acked).

Probably its an extreme edge case - and also the cost is only one extra response being sent from the server in case it has nothing to send.

dhruvbird commented 12 years ago

Let's try to reason this out.

  1. When we send a response rid=12 (suppose), then the next message that the client sends might ACK only up to rid=11 (which is okay) since the message we sent might still be in transit. However, the XEP mentions that if the client ACKs rid=11, then we may want to send a report. Let's suppose that our condition to trigger a report is: user's_rid < max_rid_sent
  2. When we receive a packet with ack=11, we acknowledge all packets up to (including) rid=11 (if rid <= the max rid we have sent to this client). i.e. we delete all responses we have sent from unacked_responses since they have now been ACKed.
  3. I believe that the check for this.unacked_responses[node.attrs.ack] is correct - see http://xmpp.org/extensions/xep-0124.html - it's just that we should delete all unacked responses AFTER processing the reports. Additionally, the bug seems to have been there since eternity, so it's super that you were able to catch it!

Does that make sense?

On Thu, Feb 9, 2012 at 3:16 PM, satyamshekhar reply@reply.github.com wrote:

I am really sorry for not being clear. I have this tendency.

Check out the change I made here https://github.com/dhruvbird/node-xmpp-bosh/commit/9f0b359c4349d301a15e67d5fefc58a371b120e2#L0L977 (though this is a mistake). Lets consider the code as it was before this change. We have three issues -

This line is wrong since we are checking for this.unacked_responses[node.attrs.ack]. This will always be false since while handling acks we delete from unacked responses the response for which we have received an ack (all response for rid <= node.attrs.ack).

  • Even if we correct the above two issues, it still wont work properly for clients that dont have acks, since for such clients we assume the ack to rid - window. Due to this assumption we will end up sending a report everytime (since ack < max_rid_sent).

-- Also, we should see if there is a certain amount of time has pass before we enqueue a report. For example -   1. server is holding r1.   2. server sends something on r1.   3. before receiving r1 the client sends something on r2.   4. server sends a report for r1 (r1 wont be acked).

Probably its an extreme edge case - and also the cost is only one extra response being sent from the server in case it has nothing to send.


Reply to this email directly or view it on GitHub: https://github.com/dhruvbird/node-xmpp-bosh/issues/2#issuecomment-3894767

   -Dhruv Matani. http://dhruvbird.com/

"What's the simplest thing that could possibly work?" -- Ward Cunningham

satyamshekhar commented 12 years ago

Right, so what I had done was a mistake - will correct.

That means for a client with no acks we will end up sending a lot of reports, since we assume that for such a client ack = rid - window. Maybe we should disable acks for such clients.

Its just that at times there will an extra RTT incurred since we sent a report when it was not required. Shouldn't we wait for some time out (~5 secs: > RTT/2) before sending a report?

dhruvbird commented 12 years ago

hmm... yes, for non-ack clients, we'll land up sending a report every time.

It makes sense to just have a flag which check if acks are enabled for this client and not do anything if acks aren't enabled (session.ack should reflect this).

As far as sending a report is concerned, we can do it immediately. To save traffic, we can wait for 1 rid instead of a difference of 0. i.e. If the client has ACKed 1 less than the current, we can live with it. Introducing a delayed event would probably complicate things since a lot of state would need to be updated if the ACK does come in within that window. (trying to keep it simple) - what do you think?

satyamshekhar commented 12 years ago

I was thinking about it differently - Cant we send a report only if the time we sent the prev response is greater than a threshold (~5secs)?

dhruvbird commented 12 years ago

Dunno if that's a good idea since potentially many packets can be exchanged within that time frame. Plus, the point of a report is to let the client know sooner than later about a potentially missed response... Basically, I'm trying to stay away from anything that involves "guessing" a good constant/time value since that is where things tend to go - esp. when there is a decent deterministic solution.

satyamshekhar commented 12 years ago

I know what you mean.. Cool..

Also, the timestamp we send in the report is the timestamp for the acked response. Shouldn't it be timestamp for the least rid unacked response?

dhruvbird commented 12 years ago

"In this case it SHOULD include a 'report' attribute set to one greater than the 'ack' attribute it received from the client, and a 'time' attribute set to the number of milliseconds since it sent the response associated with the 'report' attribute."

I am guessing that "the response associated with the 'report' attribute" is the same as ack from client + 1.

satyamshekhar commented 12 years ago

Ya.. so I think we were sending timestamp for node.attrs.ack - the change currently in the master is the change I made thinking we should check for node.attrs.ack + 1 instead of node.attrs.ack -- which you corrected.

The situation is like this: i) We check if we have node.attrs.ack in our unacked_responses - so that we dont keep sending the same report again and again. ii) We check if we have node.attrs.ack + 1 in our unacked_responses - so that we can to send time stamp for that.

So, now this brings us to the case where we dont have node.attrs.ack + 1 in our unacked responses(the request got lost) but max_rid_sent > node.attrs.ack. How do we handle this?

dhruvbird commented 12 years ago

Something doesn't seem right here.

Suppose we just responded on rid=11. The previous rid=10. Before our response (rid=11) reached the client, it ACKed rid=10 on (request) rid=12.

The check node.attrs.rid(10) < max_rid_sent(11) will succeed and a report will be sent out. This is because we are processing acknowledgments after the reports are sent out. Doesn't this seem wasteful?

Also, do you think it's right to process ACKs ''after'' handling the 'report sending?

dhruvbird commented 12 years ago

After giving this some more thoughts, these are my observations:

  1. We should process RID/unacked_responses updates before sending out the report.
  2. To avoid too many reports being sent, we should check if (rid < max_rid_sent - 1 && unacked_responses[rid + 1]) -- @satyamshekhar you were right about the latter conditional all along. Additionally, we need only worry about the object in unacked_responses[rid + 1] since THAT is the earliest unacknowledged response (according to our logic).
  3. There is an invariant that we MUST maintain on unacked_responses. Namely: if unacked_responses[k] is truthy, then unacked_responses[k .. max_rid_sent] MUST also be truthy. This allows us to freely index unacked_responses and avoid a situation where unacked_responses[rid+1] is NOT present, but unacked_responses[rid+2] is, etc...

Do you think this makes more sense?

satyamshekhar commented 12 years ago

Consider the following situation:

If we have a check for rid < max_rid_sent - 1

  1. The client sent request 10 and 11.
  2. Server replied on 10 and held 11.
  3. Before wait period the server sent something on the response for 11 which got lost.
  4. Now the client sends request 12, he ll ack request 10.
  5. max_rid_sent is 11, ack is 10. -- we should send a report for this.

I think the check (acked) rid < max_rid_sent was correct. The only problem with this is that in an ongoing chat, we might end up enqueing a lot of reports - and thus degrading our performance -- since a lot requests will be exchanged,

The check unacked_responses[rid + 1] or unacked_response[rid] seemed similar to me initially but they are not. Ideally, we should have these methods(enqueue_report and handle_acks) independent of each other -- their order should not matter. We achieve this by using node.attrs.ack + 1. Another, consequence of using rid + 1 is that we might send multiple reports for the same rid, if rid + 1 is not acked. However, if we use the check unacked_responses[rid] it forces us to send only one report.

The ideal solution to the problem would be to immediately send a report if we receive a request, which doesn't ack a response that we had sent after even after RTT secs (its median) -- this can be many times. Otherwise, we dont enqueue any report. We need the check for unacked_responses for misbehaving clients(in case it has already acked it before).

dhruvbird commented 12 years ago

max_rid_sent is 11, ack is 10. -- we should send a report for this.

According to the spec, we should. Do we have any numbers on how often this happens for well-behaved clients with a good network? (false negatives?)

I agree with you about the fact that their order should not matter. However, this is true if we use the node.attrs.ack+1 logic instead of keying on node.attrs.ack since as you mentioned if we use the latter, and we delete ACKed responses before handling 'report', we shall never find the unacked response.

I guess it is "okay" to send out multiple reports (barring performance). I mean it's not a correctness issue. In fact, it is "correct" to send multiple reports. The reason being that the first report might be lost.

satyamshekhar commented 12 years ago

According to the spec, we should. Do we have any numbers on how often this happens for well-behaved clients with a good network? (false negatives?)

No numbers yet, but are reports meant for good networks. Shouldn't they optimize bad networks without hindering performance for good networks?

I am still biased towards the timeout solution :). Though, I also get why you think its not at all clean.

dhruvbird commented 12 years ago

Maybe we need to create another "report queue" to queue up all report-type messages. Reports can be clubbed with normal responses. Send a report along with a normal response if one is to be sent?

satyamshekhar commented 12 years ago

We en-queue reports with normal responses right now as well. Why do you think a separate queue is required?

dhruvbird commented 12 years ago

We can (potentially) club:

  1. A normal xmpp response
  2. A bosh response (not report type)
  3. A 'report' type bosh response

Currently, we don't distinguish between [2] & [3].

dhruvbird commented 11 years ago

@satyamshekhar This is almost fossilized. Is there anything actionable that came out of this? Closing for now. Re-open if needed.