Closed arinwt closed 4 years ago
From Vlad's email
SHA-1 has interesting time-distribution (below). It comes and goes.
There a lot of docs affected, but over last day it’s mostly this one: JYIb4ycQBLFoejodz4rs9JPqKeNvGydrGQrr7%2BlfVb8%3D
This query looks into a session that generated all of the events for this doc:
union kind=outer database("Office Fluid").*Runtime*_*
| where Session_Id == "65c705a6-373f-4de7-a6f0-b32fd569b922"
| project Event_Time, Event_Sequence, Data_eventName, Data_error, Data_reason, Data_clientType
It hits SHA-1 in rapid succession, but then summarizer hits other problems (asserts, nacks from PUSH), so I doubt this doc ever summarized. What looks weird (feels like a bug, even though code should work just fine) is that summarizer reconnects as read-only.
If I had to guess, I’d think it has something to do with us using crypto in (see hashFile()). @arinwt, does safe summary path avoids dedup code in driver? If not, should there be some indication to a driver to invoke safest path if we are choosing recovery path on client side? Feels like no matter what’s the root cause of it, we should be able to recover.
@vladsud - right now there is no indication to the driver that it should use safest path possible.
Maybe there should be one? :) Or maybe there should not be, but the driver should self-adjust, i.e. if it is hitting an exception in blob reuse code path, it should ignore and move forward? Clearly we do not know where the error happens, but I think we can guess with good probability and maybe get us out of it (logging error to know it happened, but I think counts would be low so likely we will not look into details). What do you think?
The cause for this issue is that Edge 18.x versions don't support SHA-1 (Edge Chromium seems to work). I found this blog post referencing it: https://docs.microsoft.com/en-us/archive/blogs/yurikasensei/sha1-users-guide (located by this SO question: https://stackoverflow.com/questions/49018153/subtlecrypto-digest-fails-for-sha-1-in-microsoft-edge)
I ran the following query with the following results:
Office_Fluid_FluidRuntime_Error
| where Data_eventName == "fluid:telemetry:Summarizer:Summarizing_cancel"
| where Data_error == "SHA-1"
| summarize count() by Data_tempBrowserInfo
| sort by Data_tempBrowserInfo;
Data_tempBrowserInfo | count_ |
---|---|
Edge 18.19603 | 639 |
Edge 18.19587 | 46 |
Edge 18.18363 | 22609 |
Edge 18.18362 | 1484 |
Edge 18.17763 | 516 |
If we support these versions of Edge(? I don't know) @heliocliu - maybe you can take over this? At least if we can get the stack on these error messages, that would be helpful in the future.
@curtisman Do you know the timeline for edge classic support? In these cases do we normally do a browser runtime check? Also that sha1 deprecation on windows should have just been for cryptographic usage and should still be ok for regular hashing, surprised they actually removed support for it (ran into this before when working on the Office updater)
Can we do something cheap here? Like maybe catch exception and run through old path as a fallback? If we can avoid (adding) Node's buffer dependency, that would be awesome. The other potential solution - catch it at layer below and not do blob de-dupping SPO driver. Not ideal, but better then not summarizing at all.
@heliocliu , could you please help here and get something safe into release/0.15 ? It would be great to have something on Monday. Arin has some number of other critical issues to follow up.
Yeah I'm looking into it
@vladsud release/0.15 shows it's consuming version 0.14 of common-utils (and release/0.16 shows it's consuming 0.16). Does that mean I need to fix this in 0.14 and 0.16 instead of 0.15?
If we don't actually support Edge classic, then maybe we can just ignore these errors and put the fix in master?
https://microsoft.sharepoint-df.com/:w:/t/OfficeOnlinePrague/EfSFFvhOOC9Lm_HdYOOdEfMBP2Crx5hTfdtrukEyAXzbFw?e=psSS82 says we don't support Edge Classic™ and it should eventually show a browser unsupported screen. If this is the case, I don't think we should make fixes to help support something we don't support
This has some impact on Fluid Framework, but I'm not sure it means we do nothing. I.e. I think it has to be handled in some way. That maybe not supporting (and blocking at some level) Edge in Fluid Framework itself. Or handle this case in some reasonable way (fallback). If the fix is cheap, I'd rather fix it.
@curtisman i don't think this meets the hotpatch criteria, as i don't think the impact or severity is high for this issue.
Occurs during summarization.