rgladwell / imap-upload

Python script for uploading a local mbox file to IMAP4 server.

Add documentation on how google handles duplicates #22

Closed johncolby closed 3 years ago

johncolby commented 3 years ago

I was stumped for a while, thinking the script was not grabbing the dates for my emails correctly, but it was actually an issue with how Google handles multiple copies of the same email (apparently it keeps the metadata from the first one for all copies).

What happened was:

  1. Upload an mbox (which just happens to have the wrong datetimes in the default from time field).
  2. Notice that the date is wrong in Gmail.
  3. Trash the email in Gmail.
  4. Try the script again with the correct (for my use case) time-fields=date,received parameter.
  5. Notice in Gmail that the message has been re-uploaded, but still has the wrong date!
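To see which timestamps an mbox actually carries before uploading, the separator "From " line and the Date header of each message can be compared side by side. The following is a minimal sketch using only Python's standard `mailbox` module. The function name `compare_dates` is hypothetical and is not part of imap-upload.

```python
import mailbox


def compare_dates(mbox_path):
    """Return (from_line, date_header) pairs for every message in an mbox.

    Hypothetical helper, not part of imap-upload.
    """
    box = mailbox.mbox(mbox_path, create=False)
    rows = []
    try:
        for msg in box:
            # get_from() is the text after "From " on the mbox separator line,
            # which normally ends with a timestamp.
            from_line = msg.get_from()
            # The Date header may be missing, in which case this is None.
            date_header = msg["Date"]
            rows.append((from_line, date_header))
    finally:
        box.close()
    return rows
```

If the two columns disagree, that is a hint about which time field is the right one for a given mailbox.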

The issue was that the first/wrong message was still in my Trash on Google and not actually deleted. So when I retried the upload with the correct date, I think the Google servers saw the duplicate and simply un-trashed it (complete with the original incorrect date).
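One way to check for a leftover copy before re-uploading is to search the Trash folder for the message's Message-ID over IMAP. This is only a sketch under assumptions: `find_in_folder` is a hypothetical helper, `conn` is an already logged-in `imaplib.IMAP4_SSL` connection, and the Trash folder name (often `[Gmail]/Trash` on English-language accounts) varies by account and must be supplied by the caller.

```python
def find_in_folder(conn, folder, message_id):
    """Return sequence numbers of messages in `folder` with this Message-ID.

    Hypothetical helper. `conn` is assumed to be a logged-in imaplib
    connection; the folder is opened read-only so nothing is modified.
    """
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")
    # IMAP SEARCH with a HEADER criterion matches on the header's contents.
    status, data = conn.search(None, "HEADER", "Message-ID", f'"{message_id}"')
    if status != "OK":
        raise RuntimeError("search failed")
    # data[0] is a space-separated byte string of matching sequence numbers.
    return data[0].split() if data and data[0] else []
```

A non-empty result suggests an old copy is still sitting in Trash and may need to be permanently deleted before the upload is retried.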

Anyway, maybe you could add a mention of this behavior to your docs? Maybe it'll help other people who encounter the same situation.

Thanks for the AWESOME work!

rgladwell commented 3 years ago

Thanks for reporting this.

It is an interesting issue. I'm a bit cautious about adding this to the docs without understanding what is really going on.

What sort of text were you thinking?

johncolby commented 3 years ago

From googling around a bit, it seems Gmail only stores a single copy of each message, identified by its Message-ID, and discards any later duplicates on arrival.
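If that is how it works, messages in an mbox that share a Message-ID would presumably collapse into one on the server. A minimal sketch for spotting such groups locally, using only the standard `mailbox` module, is below. The name `group_by_message_id` is hypothetical, and whether Gmail really deduplicates this way is an assumption taken from the observation above, not something the code verifies.

```python
import mailbox
from collections import defaultdict


def group_by_message_id(mbox_path):
    """Map each Message-ID in an mbox to the list of message keys carrying it.

    Hypothetical helper. Messages without a Message-ID header are grouped
    under the empty string.
    """
    box = mailbox.mbox(mbox_path, create=False)
    groups = defaultdict(list)
    try:
        for key, msg in box.iteritems():
            msg_id = (msg["Message-ID"] or "").strip()
            groups[msg_id].append(key)
    finally:
        box.close()
    return dict(groups)
```

Any entry whose list has more than one key marks messages that the server might treat as copies of the same email.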

Maybe just a sentence to that effect? No worries if it sounds like too much of an edge case to warrant documentation...even just having this issue documented will be helpful to future searches. Feel free to close. Thanks again for this super useful tool! 👍

rgladwell commented 3 years ago

Closed by https://github.com/rgladwell/imap-upload/pull/20. I'm a big fan of DRY and I think this is a nice solution to supplying a list of known issues, with all the relevant information and discussion in one place.