thunderbird / import-export-tools-ng

Import Export Tools that supports Thunderbird v68-v128

Change From dates in separator lines on MBOX export #455

Open lizfischer opened 11 months ago

lizfischer commented 11 months ago

This is a feature request. Currently, Thunderbird puts the date it synced the email in the "From - " line at the start of a message. Many MBOX viewers interpret this as the date received, which is technically what the MBOX spec calls for, but it's a bit misleading. Other Thunderbird export tools, like Emailchemy, fix this problem by changing the "From - " line during export to include the date listed in the "Date:" field of the message header. It would be a huge help if ImportExportToolsNG had an option for this when exporting folders.
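A minimal sketch of the requested behavior, written in Python rather than the add-on's own code: it reads the "Date:" header of a message and builds a "From - " separator from it. The function name `separator_from_date_header` is hypothetical, and the ctime-like UTC layout with a dash placeholder is an assumption, not the actual output of IETNG or Emailchemy.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def separator_from_date_header(raw_message: str) -> str:
    """Build a 'From - <date>' separator from the message's own Date: header.

    Hypothetical helper for illustration only.
    """
    msg = message_from_string(raw_message)
    date_header = msg["Date"]
    if date_header is None:
        raise ValueError("message has no Date: header")

    sent = parsedate_to_datetime(date_header)
    # A "-0000" offset yields a naive datetime; treat it as UTC (assumption).
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)
    sent_utc = sent.astimezone(timezone.utc)

    # ctime-like layout with the year last; the dash stands in for the sender.
    return "From - " + sent_utc.strftime("%a %b %d %H:%M:%S %Y")


raw = (
    "Date: Tue, 26 Sep 2023 07:33:16 -0700\n"
    "From: foo@bar.com\n"
    "Subject: example\n"
    "\n"
    "body\n"
)
print(separator_from_date_header(raw))
```

An exporter would call something like this once per message and write the result in place of the separator Thunderbird stored.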

lizfischer commented 11 months ago

I see that this is not an issue in the new version! apologies

cleidigh commented 11 months ago

@lizfischer Actually that was going to be my first question: what version? Thunderbird has had, and still has, some totally inconsistent From behavior. Up to 102, IETNG just copied out mbox files as is. V14.0.0 builds an mbox within IETNG. I now create the generally accepted format UNLESS a From hdr exists; I may try to change that. Right now I have bigger problems trying to handle mbox files over 4GB with an API that cannot work over 4GB! @cleidigh

lizfischer commented 11 months ago

I was working with the older version for 102. Just installed the new version and the dates align with my expectation now. I am having issues with this account-level export not capturing all folders on the new version, unfortunately:

image

Edit: Also, I can't tell if there's an issue for this already & maybe I should open a new one, but it would be awesome if account-level export could add the .mbox extension to the files it creates. I've been doing it manually to make it interpretable to my mbox viewer

Edit 2: I see the structure issue flagged in https://github.com/thundernest/import-export-tools-ng/issues/432#issuecomment-1747365942 ! 😄

cleidigh commented 11 months ago

@lizfischer The account export is partially messed up, I listed that in the release post. It doesn't have the container, but the folders in the sbd should export correctly. What are you seeing? I have to do work to deal with the mbox extension for structured exports and imports. Exports are easy, imports not as easy. This will come later. @cleidigh

lizfischer commented 11 months ago

In the screenshot I posted above, I expected that all the folders shown in Thunderbird (Inbox, Drafts, Sent, Nesting folder child 1, etc.) would have corresponding files in the sbd folder. I made a new TB profile & re-synced and that fixed the issue, though. Must have been some quirk with upgrading from 102 to 115. I'm seeing all of them now (compare below with previous screenshot)

image

Adding the .mbox extension manually used to work to make the files viewable in e.g. PST Viewer Pro, but that doesn't seem to be the case anymore

cleidigh commented 11 months ago

@lizfischer I don't know why the viewer won't work; the files are the same with the exception of the more standard From. @cleidigh

lizfischer commented 11 months ago

Yeah, seems like a parsing issue on their end. Weird that it likes the worse version of the From line 😅 Anyway, thanks for your help!! And thanks for your work on this plugin, it's great. I'm testing it for use in the acquisition of donors' email by the manuscripts division of a library

lizfischer commented 11 months ago

Ah, I figured it out--the viewer isn't expecting it to be "From - foo@bar.com Date Time Timezone". It expects something adhering more closely to the RFC spec for MBOX:

Each message in the mbox database MUST be immediately preceded by a single separator line, which MUST conform to the following syntax:

  • The exact character sequence of “From”;
  • a single Space character (0x20);
  • the email address of the message sender (as obtained from the message envelope or other authoritative source), conformant with the “addr-spec” syntax from RFC 2822;
  • a single Space character;
  • a timestamp indicating the UTC date and time when the message was originally received, conformant with the syntax of the traditional UNIX ‘ctime’ output sans timezone (note that the use of UTC precludes the need for a timezone indicator);
  • an end-of-line marker.

Removing the dash, extra spaces, and timezone from an IETNG-exported file fixes the issue. For example, changing From - foo@bar.com Tue Sep 26 2023 07:33:16 GMT-0700 to From foo@bar.com Tue Sep 26 2023 14:33:16
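The manual fix described above (drop the dash, collapse the extra spaces, convert to UTC, drop the timezone) can be sketched in Python. This is an illustration, not IETNG code: the function name `normalize_separator` and the regular expression are assumptions based only on the two example lines in this comment, and lines in any other layout are returned unchanged.

```python
import re
from datetime import datetime, timezone

# Matches lines shaped like the IETNG example quoted above (assumed layout):
#   From - foo@bar.com Tue Sep 26 2023 07:33:16 GMT-0700
_IETNG_LINE = re.compile(
    r"^From\s+-\s+(?P<addr>\S+)\s+"
    r"(?P<stamp>\w{3} \w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2})"
    r" GMT(?P<offset>[+-]\d{4})$"
)


def normalize_separator(line: str) -> str:
    """Rewrite an IETNG-style separator as 'From <addr> <UTC timestamp>'."""
    match = _IETNG_LINE.match(line.strip())
    if match is None:
        return line  # unknown layout: leave it untouched

    local = datetime.strptime(
        match["stamp"] + " " + match["offset"], "%a %b %d %Y %H:%M:%S %z"
    )
    utc = local.astimezone(timezone.utc)
    # Same field order as the example in this comment, without the timezone.
    return f"From {match['addr']} {utc.strftime('%a %b %d %Y %H:%M:%S')}"


print(normalize_separator("From - foo@bar.com Tue Sep 26 2023 07:33:16 GMT-0700"))
```

Note that this keeps the year before the time, as in the example line, which is not the traditional ctime field order.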

Not sure who is more correct here, IETNG in its output or the viewer in its insistence on format.

Edit: fwiw, .mboxes generated by Google Takeout use this: From 1772327484425937763@xxx Mon Jul 24 18:26:42 +0000 2023 and those generated by Emailchemy use: From - Thu Oct 05 08:52:05 2023, where (I'd guess) the dash is standing in for a missing email address

cleidigh commented 11 months ago

@lizfischer I think that the converter is ridiculous in its requirement for a non-functional separator. I have seen many more formats than you mentioned. I don't have an issue with dropping the dash, as this is an artifact from Thunderbird and it's not required on the import side. I don't like dropping the timezone though. @cleidigh

lizfischer commented 11 months ago

From foo@bar.com Tue Sep 26 2023 07:33:16-0700 would be closer to the spec & works in the viewers I've been testing. I don't think you have to drop the timezone, but I do think reformatting to be closer in line with the MBOX file format specifications is important. So in addition to a slight change to timezone presentation, that means dropping the dash and duplicate spaces.

cleidigh commented 11 months ago

@lizfischer I agree this is the "smart" thing to do especially since I am abandoning the non-conformant TB artifacts. I am coordinating with the Thunderbird mbox developer so hopefully we can get on the same page. So long as TB is tolerant, which it should be, I can do the ctime format. @cleidigh

lizfischer commented 11 months ago

@cleidigh Thanks, I'll check it out

lizfischer commented 11 months ago

Hi @cleidigh—I installed the preview you sent, v14.0.1-b1-fdt1, & am still getting the From lines like From - no-reply@cc.yahoo-inc.com Mon Oct 16 2023 16:18:56 GMT-0700 rather than From no-reply@cc.yahoo-inc.com Mon Oct 16 2023 23:18:56-0700

cleidigh commented 11 months ago

@lizfischer First, you should update to my first beta, v14.0.1-b1, not that anything has changed. Look at the message source of one of the messages. I suspect this is Thunderbird incorrectly returning a From separator for an individual message. This is one of several Thunderbird issues I am discussing with the developers. I will somehow address it in v14.0.1, the upcoming maintenance release. Can you identify the source: IMAP, archive, or Local folder? @cleidigh

lizfischer commented 11 months ago

Re-installed from the link in #466 and it's looking good—not sure why different from the version the other day, but glad it's working! Thanks

cleidigh commented 11 months ago

@lizfischer Depends on msg, source and Thunderbird's mood. I am assuming TB bad and will deal with it in b2 which should push for tomorrow. All betas here:

cleidigh commented 11 months ago

@lizfischer Try b2. Usually it's local folders that had issues. All replaced now. @cleidigh

lizfischer commented 11 months ago

Working well! My team sends their thanks, too

cleidigh commented 11 months ago

@lizfischer Great, one for the team... Are you doing mbox imports? I ask because I am close to releasing b2 with an important mbox import patch for an edge condition and want testers. @cleidigh

lizfischer commented 11 months ago

We're not doing any imports, just using the exports for archiving purposes.

cleidigh commented 11 months ago

Thx @cleidigh

cleidigh commented 7 months ago

@lizfischer Hope all is well... Wanted to reopen this discussion, as I think we ended up with a less-than-standard date format. I have another user who is having trouble with the current format on Linux. I think with all the back and forth I missed the reference to Unix ctime format. This puts the year at the end and appears to be more common. I would assume this change would not affect you, since I believe the Google format has worked for you.

Any thoughts if I change? @cleidigh
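For reference, a minimal Python sketch of a separator in the Unix ctime layout discussed here, with the year at the end and the time in UTC as in the spec text quoted earlier in the thread. The function name `ctime_separator` is hypothetical, and this is not the format IETNG is confirmed to emit.

```python
from datetime import datetime, timezone


def ctime_separator(sender: str, received: datetime) -> str:
    """Return 'From <sender> <ctime in UTC>' for a timezone-aware datetime."""
    if received.tzinfo is None:
        raise ValueError("received must be timezone-aware")
    # datetime.ctime() gives e.g. 'Tue Sep 26 14:33:16 2023' (year last).
    return f"From {sender} {received.astimezone(timezone.utc).ctime()}"


received = datetime(2023, 9, 26, 14, 33, 16, tzinfo=timezone.utc)
print(ctime_separator("foo@bar.com", received))
```

Like C's ctime, `datetime.ctime()` pads single-digit days with a space rather than a zero, so a tolerant importer should accept both.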