tpeczek / Lib.Net.Http.WebPush

Lib.Net.Http.WebPush is a library which provides a Web Push Protocol based client for Push Service.
MIT License
76 stars 13 forks source link

RequestPushMessageDeliveryAsync - task hangs forever. #15

Closed lafar6502 closed 4 years ago

lafar6502 commented 4 years ago

Hi, i'm trying to send notifications in the code below and have problem with the whole thing hanging. The code is able to send a message (because a popup shows in Chrome), but the async task for distributing the message never completes, and so the line Task.WaitAll(tss.ToArray()); hangs forever.

I have no idea if this is my mistake or a bug in the library. Can you suggest how to approach this?

public void SendMessageTo(int userId, NotificationMessage msg, int? secondsTTL = null)
        {
            var subs = this.GetUserPushSubscriptions(userId);
            if (!subs.Any()) return;
            var urg = msg.Level == MessageLevel.Error || msg.Level == MessageLevel.Warning ? PushMessageUrgency.High : PushMessageUrgency.Normal;
            var pm = new PushMessage(JsonConvert.SerializeObject(msg))
            {
                TimeToLive = secondsTTL,
                Topic = msg.EventName,
                Urgency = urg
            };
            try
            {
                var c = GetClient();
                List<Task> tss = new List<Task>();
                foreach (var s in subs)
                {

                    var ps = new PushSubscription
                    {
                        Endpoint = s.SubscriptionData.Endpoint,
                        Keys = s.SubscriptionData.Keys
                    };

                    var tsk = c.RequestPushMessageDeliveryAsync(ps, pm);
                    tss.Add(tsk);
                }

                Task.WaitAll(tss.ToArray());
                var fails = tss.Where(x => x.Exception != null);
                if (fails.Any())
                {
                    log.Error("Sent {0} notifications, fails: {1}: {2}", tss.Count, fails.Count(), fails.FirstOrDefault().Exception.ToString());
                }
                else
                {
                    log.Info("Sent {0} notifications", tss.Count);
                }
            }
            catch(Exception ex)
            {
                log.Error("Failed to send notifications to {0}: {1}", userId, ex.ToString());
                throw;
            }

        }
tpeczek commented 4 years ago

Hi @lafar6502

My apologies for not respondign sooner. I'm currently traveling and I will take a look at this next weekend.

techfg commented 4 years ago

@tpeczek - Firstly, thank you for all your work on these libraries and the articles, greatly appreciated!

I too am experiencing what appears to be this same issue. I'm running in an Azure function using the PushService binding and any time I await a RequestPushMessageDeliveryAsync it just hangs so this doesn't seem to be an issue with multiple tasks since only a single task encounters the behavior. The end result is a TaskCancelled after about 1-2 minutes (it varies).

The problem only seems to occur when running in Azure. Running in local dev and everything works as expected with no issues.

My workaround for now is to use WebPush directly but I plan on investigating this further and revert back to the PushService binding once resolved.

In short, the only change I need to make to get this to work is change:

Notification notification = new Notification("Some message")
{
    Title = "Some Title"
};
PushMessage msg = new PushMessage(JsonConvert.SerializeObject(notification));
await pushServiceClient.RequestPushMessageDeliveryAsync(sub, msg);

to

await webPushClient.SendNotificationAsync(new WebPush.PushSubscription(sub.Endpoint, sub.GetKey(PushEncryptionKeyName.P256DH), sub.GetKey(PushEncryptionKeyName.Auth)), JsonConvert.SerializeObject(notification), vapidDetails)

I'll keep digging in further as I have time but any thoughts/assistance you could provide would be appreciated.

Thank you again!

Edit: Spotted this loop that might be the culprit. Doesn't explain why using this lib encounters 429 while WebPush wouldn't but something to look at further possibly. I don't see a way to set AutoRetryAfter through the function binding (I'll add a feature request issue) so no easy way to test the theory but I'll look in to using PushServiceClient directly outside the binding and see if I can isolate.

vpetrusevici commented 4 years ago

I have the same issue. I spent 3 days but could not fix

vpetrusevici commented 4 years ago

The first request is always sent. All subsequent ones hang. But also reproduced locally.

tpeczek commented 4 years ago

@vpetrusevici As I'm on my way back home right now, I'll start looking at this tomorrow. In order to quickly check if it's not an regression from recent version you can try using v2.0.0

@techfg For Azure Functions that would be v1.0.0

vpetrusevici commented 4 years ago

I tried. It reproduced for all versions

techfg commented 4 years ago

@tpeczek - Hope you had a good trip home. Update from my side:

1) I was able to reproduce this locally and similar to @vpetrusevici it occurs on the 2nd message attempted within the function invocation.

2) I encountered a different result than @vpetrusevici when testing older versions. For me, I was only able to reproduce when running v2.2.0. Running v2.0.0 w/ v1.0.0 and v2.1.0 w/ v.1.10 worked without issue. I encounter the issue only with v2.2.0 and either v1.1.0 or v.1.2.0

In reviewing the commits, I'm just not seeing anything that's substantially new between v2.1.0 and 2.2.0 that could cause this so possibly I'm just getting randomly lucky with the older versions and @vpetrusevici results are more meaningful. That said, I ran quite a few tests and only v2.2.0 reproduced. The only thing that stands out to me currently in the codebase is that potentially infinite while loop on ShouldRetryAfter, and while it likely needs an escape hatch (e.g. maxretries) that same loop existed in 2.1.0 and all is good there for me at least.

Will continue to dig on my end

vpetrusevici commented 4 years ago

I apologize. After re-checking, I was convinced that the bug was not reproduced in previous versions. It is possible that during the first check the package was updated incorrectly. Sorry to be misleading. This is a regression bug

techfg commented 4 years ago

@vpetrusevici - Thanks for the update, glad to see we're experiencing the same behavior which should help narrow this down!

techfg commented 4 years ago

I was able to identify the issue. PR #19 submitted.

techfg commented 4 years ago

I might have spoke too soon. The issue is definitely better with #19 but I'm still seeing intermittent hangs - just not every time like I was before.

Possibly the issue existed prior to 2.1.0 which could be why @vpetrusevici thought it repro'd with the older versions initially since its now intermittent for me.

I'll continue investigating.

vpetrusevici commented 4 years ago

v2.1.0 is working perfect in my project now

tpeczek commented 4 years ago

I've put some time into it and this looks like regression from #11.

It looks that because HttpClient.PostAsync is disposing the HttpContent on majority of target frameworks, when there is an attempt to send the same PushMessage several times it results in unobserved errors.

techfg commented 4 years ago

Hi @tpeczek - Thanks again for looking in to this. I've been investigating as well and my findings align with your analysis. A couple of things for my scenario:

1) The same message is being sent to multiple endpoints 2) The response (via Fiddler) of the one that "hangs" is which would seem to indicate that its getting disposed prior to completing the write of the content

HTTP/1.1 408 Request body incomplete
Date: Thu, 05 Mar 2020 23:43:06 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Cache-Control: no-cache, must-revalidate
Timestamp: 15:43:06.606

The request body did not contain the specified number of bytes. Got 103, expected 311
tpeczek commented 4 years ago

@techfg @vpetrusevici I've commited the fix. I want to close few more issues before releasing new version. In general using v2.1.0 should solve the issue for you until I release the new version.

techfg commented 4 years ago

@tpeczek - Awesome news, thank you! Regarding next release, I'm working on a PR for #18 now, hopefully you'll consider for next release. Unfortunately, can't migrate to using this lib in prod until I can resolve which subscription failed (e.g. Gone) in order to perform clean-up.

vpetrusevici commented 4 years ago

@tpeczek thank you!

techfg commented 4 years ago

@tpeczek - Thanks again for your work on resolving this! Been testing with the latest code in master and have not encountered the hang issue thus far even under scale :)

tpeczek commented 4 years ago

@techfg Very happy to hear that. I feel pretty bad about letting this one get in. Lesson for future is to test with all possible TFMs, but that's sometime not that easy :(.

techfg commented 4 years ago

@tpeczek - I can appreciate feeling bad about it but no need to, your work on this library is awesome! Was incredibly easy to adopt/integrate. I noticed the new test project, glad to see that being added and I think it'll help moving forward :)

vpetrusevici commented 4 years ago

New version works good. Thank you for your great work!)

tpeczek commented 4 years ago

@vpetrusevici I'm happy to hear that.