pimutils / vdirsyncer

📇 Synchronize calendars and contacts.
https://vdirsyncer.pimutils.org/

Office365 "multiple items with the same UID" gone insane #1140

Open SethRobertson opened 4 days ago

SethRobertson commented 4 days ago

vdirsyncer 0.19.3, python 3.12.4, Fedora 39 syncing to davmail (and Office365 through that).

I started having sync problems and eventually tracked them down to vdirsyncer aborting due to "multiple items with the same UID". I've had this issue before, so I added some debugging (#1093 -- thanks, older me) and started doing what worked previously: finding a few appointments, deleting them, and hoping the problem went away. Well, this time, after I got past 5 invites, I got suspicious, added even more debugging, and discovered I had literally hundreds of duplicates. One invite in particular had over 700 identical duplicates. What is worse, I had actually deleted that calendar entry in Outlook, but I still see those hundreds of duplicates.

When I say duplicate, I mean the HREF is unique (only a few characters change; the prefix is identical), the etag is unique, but the raw iCal entry downloaded from the server is byte-for-byte identical.

Examples of HREFs and etags whose items had identical contents:

AAMkAGYwNDNmYjk1LTQ4ODktNDRmMy1hNGM0LTEzM2FjYjU3ZmYwNgBGAAAAAABWlPtQw0aTQJFI1LryHRvkBwDhc-N4NPAvQ66EVB5W80vRAAAAAAENAADhc-N4NPAvQ66EVB5W80vRAAALCYyWAAA%3D.EML etag='2024-07-15T17:06:26Z'

AAMkAGYwNDNmYjk1LTQ4ODktNDRmMy1hNGM0LTEzM2FjYjU3ZmYwNgBGAAAAAABWlPtQw0aTQJFI1LryHRvkBwDhc-N4NPAvQ66EVB5W80vRAAAAAAENAADhc-N4NPAvQ66EVB5W80vRAAA4LoeYAAA%3D.EML etag='2024-03-27T17:31:24Z'

AAMkAGYwNDNmYjk1LTQ4ODktNDRmMy1hNGM0LTEzM2FjYjU3ZmYwNgBGAAAAAABWlPtQw0aTQJFI1LryHRvkBwDhc-N4NPAvQ66EVB5W80vRAAAAAAENAADhc-N4NPAvQ66EVB5W80vRAAA4LoeZAAA%3D.EML etag='2023-05-18T14:25:12Z'

The actual event was from 2022: "TZID="America/Denver":20220915T140000". The UID for all three (as well as the rest of the >700 dups) was: 040000008200E00074C5B7101A82E0080000000070E7D965D6C6D801000000000000000010000000881AAF884849F149888564DE11934E37

The original UID from the original ics invite was as follows (but I'm not sure what this tells me):

040000008200E00074C5B7101A82E0080000000080EC2F28D6C6D801000000000000000010000000630892359436814C9D679F4B48BB1868

The calendar entry that has hundreds of dups is back from 2022. I very much doubt I got an update for it, so it is probably Office365 being broken unless this is something vdirsyncer did (I know which one I find more likely).

I tried a repair as per the suggestion, but while repair complained about the formatting of some UIDs, it didn't do anything or complain about the duplicate UIDs. I then implemented a hammer and just had the system pick a winner (newest etag seemed as good as any) and delete the loser. I am quite sure I didn't do this in a very clean way.

My request? Automate/support this, maybe in repair, maybe make it safer by checking to ensure that the invites are actually identical, and add a no-execute mode to repair so that you can see what it is going to do before it does it, so that you don't potentially destroy a calendar source's data.
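
To make the request concrete, here is a minimal sketch of the "pick a winner, but check first, and don't execute" logic. It is not vdirsyncer code and not the attached patch: `Item`, `parse_etag`, and `plan_dedup` are hypothetical names, and it assumes the etags are timestamps in the same form as the examples above. It only builds a plan; nothing is deleted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one item as listed from the server."""
    href: str   # server-side path of the item
    etag: str   # assumed to be a timestamp such as "2024-07-15T17:06:26Z"
    uid: str    # UID property taken from the iCalendar body
    raw: bytes  # raw iCalendar body exactly as downloaded


def parse_etag(etag: str) -> datetime:
    # Assumption: the etag is a UTC timestamp, as in the examples above.
    return datetime.strptime(etag, "%Y-%m-%dT%H:%M:%SZ")


def plan_dedup(items):
    """Return (plan, skipped) without touching the server.

    plan maps each duplicated UID to (href to keep, [hrefs to delete]).
    skipped lists UIDs whose copies are NOT byte-identical, so a human
    can look at them instead of having data silently destroyed.
    """
    by_uid = defaultdict(list)
    for item in items:
        by_uid[item.uid].append(item)

    plan = {}
    skipped = []
    for uid, group in by_uid.items():
        if len(group) < 2:
            continue  # UID is unique, nothing to do
        if len({item.raw for item in group}) != 1:
            skipped.append(uid)  # bodies differ: not safe to auto-delete
            continue
        # Newest etag wins; every other copy becomes a deletion candidate.
        winner = max(group, key=lambda item: parse_etag(item.etag))
        losers = [item.href for item in group if item is not winner]
        plan[uid] = (winner.href, losers)
    return plan, skipped
```

A no-execute mode would print `plan` and `skipped` and stop; the real run would issue deletes only for the hrefs in `plan`.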

I attached the diff I used to this issue. It is absolutely not a production patch. It is something I used to fix my problem and will then revert, at least until the next time this pops up. From the timing of my previous ticket on this issue, it looks like maybe 11 months from now.... :-(

nukem.txt

WhyNotHugo commented 4 days ago

I've taken note to keep this in mind when implementing repair for the v2 implementation. It should be doable to find duplicate entries and keep just one.

It sounds to me like you have this specific situation under control. If you ever need to manually delete entries from a CalDAV server, you can also use davcli.