signalapp / Signal-iOS

A private messenger for iOS.
https://signal.org
GNU Affero General Public License v3.0
10.6k stars 2.95k forks source link

resolve potential data race in replaceNotification on batched edit notifications? #5764

Closed donaldguy closed 1 month ago

donaldguy commented 4 months ago

First time contributor checklist

Contributor checklist

... to the best of my knowledge

I have not tested changes because the situation is a hypothetical and/or difficult to reproduce data race.

That said, whereas the whole effect of the change is re-ordering two occurrences of 2 sequential lines of code to an order more logical, I would expect this to:

UNLESS a more experienced iOS programmer knows a specific reason for the extant order to be preferred

(and if that is the case, I would suggest that a // XXX: comment or similar is in order to explain the less-intuitive order)


Description

It seems presently possible that a notification can fail-to-cancel & end up left [near-]duplicated instead, as a result of precise unfortunate timing wherein:

which appear presently in that order, in error?

I have not been able to come up with any[^1] reasoning for this order to be preferred.

Context suggests that previous touching authors (in de61926ec8e8a0d1792c4a8dea2ba5b16b39879e and af6ff96975964fa483136e3a9625320abb19a710 ) errantly believed[^2] order to be irrelevant (owing to claims of synchronous action that are not met by the underlying iOS API)

[^1]: outside of code style preference for alphabetical order iff its equivalent

[^2]: They might also have known it to be irrelevant due to other features of previous versions of the code since removed. I didn't find those, but I didn't do an exhaustive search and commit messages perhaps hint so

My suspicion there could be a good, unspecified reason for this order obscure to me is further weakened by noticing that the analogous calls to removeAll{Delivered,Pending}Notifications happen in [this PR's / the intuitively-safer] pending-then-delivered order.

That has been the case for the removeAll* calls since introduction to present, to wit:

In particular,

I believe this race may on occasion be actually occurring when e.g. multiple edits to the same message are being (belatedly) batch processed back-to-back

(after re-establishment of interrupted network connection;

or possibly mere sleep and resume of envelope processing per iOS Background App refresh)


(Except for) hoisting this out for emphasis —and also because apparently the callout parsing is suppressed inside a <details> even where other markdown is processed (and I don't think there is even effective html option to get it back?)

[!IMPORTANT] In particular, a friend with whom I frequently converse informed me that this claim [to wit "edits… don't generate push notifications ... at least"] was wrong that they were:

99% sure I've gotten pushes of you editing though -- the same message showing up a few times in a row (same preview text)

... This, with foreknowledge of further investigation suggests that this race (or similar, such as Apple simply not completing requested cancellation for other reasons) is occurring, not just hypothetical?

I went ahead and squashed this into a <details> once I realized how already also long my (originally a ) TL;DR had become.

But it should answer any questions of why I am proposing this (outside of ~mismatch to intuition)

Full Motivation & relevant code path

## Oops I ***was*** push-spamming actually… I am an inveterate serial-editor. Both, as the feature is probably mainly intended, for fixing typos and such. **AND** as a matter of more dramatically rephrasing or expanding yet-unread messages before recipient's first read receipt fires This last practice was one undertaken intentionally under presumption that (as informed by experience with e.g. Discord, Slack, Facebook-comments, reddit comments, etc) by making my expansions thusly I was avoiding spamming recipient with additional push notifications (if they were not looking at Signal, but within earshot of device or had it in pocket, etc) Earlier today, I learned this was (sort of) not true! 😱 `` In particular, a friend with whom I frequently converse informed me that this claim [to wit "edits… don't generate push notifications ... at least"] was wrong that they were: > 99% sure I've gotten pushes of you editing though -- the same message showing up a few times in a row (same preview text) ... This, with foreknowledge of further investigation suggests that this race (or similar, such as Apple simply not completing requested cancellation for other reasons) *is* occurring, not just hypothetical? `` Embarrassed at being (_even_) noisier than I thought I was, I decided to dive down this rabbit hole. I discovered that indeed, the logic is explicitly to generate notifications _only_ for edits of unread but previously delivered messages (in opposition of my intuition) here: https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalServiceKit/src/Messages/MessageReceiver.swift#L1638-L1646 (assuming indeed that multiple dispatch makes this the path for the definition from L1591 always taken into `handleIncomingEnvelope` ; as it stands I do not understand when that from L1535 would instead be run - but whereas it does not clearly generate notifications under any circumstance, it is not at issue?) ## …or ***was*** I? (yes, but maybe not as bad as I briefly thought) Now my embarrassment was somewhat relieved when I looked into the rest of the origin commit (66d7592f59a5ae92fbb57d6a1c7e5706bd1f2b52) for that block / the present `main` pieces of that code path and discovered: ### 1. at least I'm spamming quietly? Stepping directly in from [that](https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalServiceKit/src/Messages/MessageReceiver.swift#L1638-L1646) call to `notifyUser`, then skipping over the first 117 lines that follow (including early descent into relevant `notifyUserInternal`), we come eventually on: https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalMessaging/Notifications/AppNotifications.swift#L685-L698 with line 686's suppression of noise for this notification at least... maybe[^4] comforting [^4]: whereas my friends are mostly phone-always-on-silent types, I hope, and have yet to comfort myself with tests or more code reading, that "silence" extends to vibration/haptics and e.g. popping up on an apple watch, vibrating wrist. and ### 2. the code clearly _intends_ to _avoid_ doing notification-spam Picking up from the next line and continuing to end of function: https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalMessaging/Notifications/AppNotifications.swift#L699-L714 its fairly clear that the intention is explicitly to replace the notification with one reflecting any edit (ironic in my particular case where only rarely has what prefix would fit in the push's preview actually changed) and _not_ to generate an additional redundant notification in particular things seem to prefer to drop (re-)queueing a revised notification entirely over creating duplication (under conditions where `replaceNotification` doesn't, to its knowledge, succeed - by for example failing to locate a previous version of the message (as might occur if e.g. user dismissed it but hasn't opened Signal? - in which case it is _especially_ correct to not seemingly "grow back") #### ... I'm pretty sure, at least > I am inexperienced enough with swift in particular that I'm a little uncertain as to what calling `completion()` inside the guard's else will do in practice > > best as I can tell it will just effect a jump/co-routine-yield to > > https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalMessaging/Notifications/AppNotifications.swift#L1307 > > making that call no more than a ~cleanup / continuation-passing-style-return (before the stack pop return on the next line) > > I'm a bit befuddled by _why_ this is written this way, and my second guess is that the `completion in` proceeding this block makes this some sort of recursive binding making that call a re-entrant retry (but were that the case, I see nothing to stop this from becoming, under-tail-call-optimization, an infinite loop; and it seems contradicted by the call of completion also deeper at [end](https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalMessaging/Notifications/UserNotificationsPresenter.swift#L284) of the happy path ## redundancy "should" be impossible Taking a gander at `replaceNotification`: https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalMessaging/Notifications/UserNotificationsPresenter.swift#L388-L396 we are given something of a false assurance by the name `cancelSync`, a function which
proceeds for a hefty-ish 66 lines, https://github.com/signalapp/Signal-iOS/blob/e6ba17aa451b6fa66ccc492b078a0d5d41e6e8d6/SignalMessaging/Notifications/UserNotificationsPresenter.swift#L454-L520

... but as best as I've noticed is up until the last 2 lines, reordered in this PR, simply staking out early returns for various edge cases, which reifying an edit is not

meanwhile the simpler function, cancelNotificationSync

which is:

  • only these two lines of body
  • after a log ... which is, of note, Logger.warn to cancelSync's Logger.info
  • immediately follows cancelSync in the file
  • is also flipped over in this PR

seems to only be called here to nope out of notify() in case of replacingIdentifier - which in turn happens only in 2 cases:

  • presentIncomingCall
  • presentMissedCall ... I can't say I understand exactly what that path is, but its not clear it needs its own abbreviated entry point directly to these 2 lines ... but it has it

is proceeded with a reassuring comment // This method is thread-safe.

...except for the lies (asynchronous system API)

So, I think it's clear that these functions basically claim and their callers expect that by return from cancelSync the notification has been canceled.

But these are synchronous only so far as Signal has control of the situation.

Take a look at the docs for the functions and you'll find:

It's thus clear these are very much not synchronous removals

...and the blind spot (data race)

While it would not shock me if in some cases Apple simply decides that a requested removal is not actually very important (to the end at least of letting any error stand without retry, and definitely no (facilitation for) notification of the caller; but perhaps also like - battery save state, CPU load etc),

if that is not the case (and apple will be very diligent) this mismatch of cancelSync( named code and async-proceeding reification shouldn't actually ever result in redundant notifications like my friend reported

but with the current order of calls, it does seem plausible that you could slip through the situation where:

  • when you first call removeDeliveredNotifications, that there "The method executes asynchronously, returning immediately and removing the specified notifications on a background thread." still-so-far accurately describes the situation (cause in fact the notification is still pending, never delivered)
  • but as/by-the-time you second call removePendingNotificationRequests, its now the case that "the trigger condition for that request has already been met, [so] this method ignores the identifier" (that you are now talking about a delivered notification, never canceled)

That seems... like a tight timing constraint to come out that way, so I'm kinda still inclined to think maybe sometimes Apple de-prioritizes such a retraction to filth, but ... as mentioned top level, the batch-processing scenarios (e.g. upon network reconnect with multiple pending edits to same message) seem at least plausible

but maybe only if you busywait?

If in fact it's all about the data race, then this (grossly over-explained) swapping of lines ought to suffice to eliminate it. And I don't think it could hurt regardless (though again, a more experienced iOS programmer may disagree for reasons yet obscure)

But if the "lazy Apple" scenario is on the table, or if indeed you just really want to deliver on the Sync promise of the cancel function names (and the past tense of the did in didReplaceNotification), then I think you need to actually check that the notification is gone now

as a busy-wait loop over (ideally 1 but possibly multiple) calls to getDeliveredNotifications(completionHandler: ) and an O(N) traversal of the present set ; ditto maybe getPendingNotificationRequests(completionHandler:) (or you can perhaps wait for it to be first delivered, since I don't think Signal is in the habit of scheduling in the later-than-ASAP future?)

This feels ... wasteful, so I understand why you don't already do it (beyond possibly incorrectly assuming its unneeded)

but maybe there are other options?

falling out of the bottom of the <details>

a better way?

by the time you've reached cancelSync, you already did getNotificationsRequests which did exactly one turn each of those O(N) loops over getDeliveredNotifications(completionHandler: ) and getPendingNotificationRequests(completionHandler:)

you are nominally holding a UNNotificationRequest, which might be for a genuinely pending notification, or may have been .requested off of a bonafide UNNotification briefly held in passing

either way, you proceed to (peek at .content.userInfo to differentiate edge cases, but by the time we reach an attempted cancel), reduce down to the NSString that is .identifier

targeted observation?

you held a reference to the actual notification (request) object.

Idk if in modern iOS this is a pass-by-value clone or literally a pass-by-reference handle/delegate on the notification as it exists presented in Notification Center

broadly my impression of Apple's approach in the past is that it might be the latter? (though conformance to `NSCopying may say no?)

So is there something in that that is observable? Does anything happen to it if you had a handle and the async piece of the remove*(withIdentifiers: ) have completed? Perhaps (but perhaps not):

These sorts of things could at least avoid repeated O(N) fetch for a "make sure its actually gone"; also could potentially duck any busy waiting if there's something there (or indeed up on the UNUserNotificationCenter for NSKeyValueObserving to target more actively?)

differentiated handling?

.trigger

The wording is confusing, but it would appear possibly that the .trigger of the request coming into cancelSync actually should tell you whether you need to call removeDeliveredNotifications OR removePendingNotificationRequests about it, yes?

per that page's

For notifications that the system has delivered, use this property to determine what caused the delivery to occur.

Idk if that means it actually changes or just you know ... is still there

But also regardless the flattening between the two happening in getNotificationsRequests doesn't have to happen per se

.content

As re my specific concern about excess, and indeed potentially unnecessary re-delivery for edits, there is definitely a chance to look into the body of the notification to see whether the edit actually changed it? (presuming either that there is actually any truncation of what's given there, or a windowing heuristic is applicable?)

actual modification?

not for nothing, and I'm sure there are relevant limitations and/or overhead but I do notice that UNMutableNotificationContent is a thing

as well as this whole system(s): https://developer.apple.com/documentation/usernotifications/modifying-content-in-newly-delivered-notifications / https://developer.apple.com/documentation/usernotifications/unnotificationcontent/updating(from:)

so it seems possible that in place modification is not out of the question?

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 month ago

This issue has been closed due to inactivity.