thunderbird / import-export-tools-ng

Import Export Tools that supports Thunderbird v68-v128

v14 - Mbox Import/Export Testing and Feedback #432

Open cleidigh opened 1 year ago

cleidigh commented 1 year ago

This thread is focused on mbox issues for the new v14 implementation.

@LamKarThess We can continue here...

The first goal is to get a 50-email folder that exports 50 but re-imports with far fewer. Then we will inspect it. @cleidigh

cleidigh commented 1 year ago

@mbravidor A picture is worth a thousand words! The output of the converter is wrong. In Thunderbird, every folder has an mbox file even if it is empty, but the converter's top directory does not have empty mboxes for ten and 50. Having the .sbd folders without them does not match Thunderbird.

This is surprising as Thunderbird structure has been this way forever. Perhaps the converter has some options for this?

The manual fix is to add the empty files. @cleidigh

CeliaMuriel commented 1 year ago

I have lost emails on one of my Ubuntu laptops when importing an mbox with structure and subfolders, but I can't tell how many, in which folders, or what may have caused the loss (older than a certain date? number of emails in a folder?). I have a very busy Microsoft account on that laptop. I have a local folder with many subfolders, which also have nested subfolders. I didn't expect to lose email, and I didn't keep track. Now it's too late.

cleidigh commented 1 year ago

@CeliaMuriel I don't like to hear about lost email! Don't you still have your mboxes for import? I would like to see what we can look at. Are you able to debug with me? @cleidigh

CeliaMuriel commented 1 year ago

@cleidigh, I'm afraid I don't have the old mbox. I used it 3-4 weeks ago. I keep an mbox for one week. Then I take a new one as a backup and delete the old one.

I just noticed over the weeks, after restoring the mbox, that I needed old emails a couple of times and I couldn't find them. I looked for them hard.

However, I kept the old subfolder structure, and I've continued storing emails. If you want, I can restore my current mbox in another instance, and check if I have everything. With the new functionality that shows the total number of emails per folder, it should be easier to detect if anything is missing.

If the test doesn't show any lost email, I can wait a month to accumulate more mail and try again, in case it's a matter of very large volume. The mbox that I restored when I lost mail had emails going back to May 2022.

cleidigh commented 1 year ago

@CeliaMuriel Well I feel badly, obviously being able to rely on the tool is critical. So did you just have random emails missing, but the structure complete? One thing is that the Thunderbird global search has issues. The quick filter on a single folder is reliable. Please let me know if you want to analyze further. @cleidigh

CeliaMuriel commented 1 year ago

Well, I don't know if they were random emails, or if they followed a pattern (older than a date, or a certain number of emails per folder). I've just needed old emails a couple of times, and I couldn't find them. I know Thunderbird's search function may fail, so I tried several things. I found one email I was looking for because I had replied to it a few hours after writing it, but not the first email. I'm always in a rush, so I didn't make any attempt to find a pattern in the missing emails.

As for the subfolder structure, yes, it was the same. I checked it after the restore.

Let me do the test as I told you, and I'll be back to you with the result.

(Sent by email in reply to Christopher Leidigh's message of 6 Oct 2023, 20:47.)

Pirx485 commented 1 year ago

@cleidigh Unfortunately I still have issues. I did a clean reinstall of the addon through the addon tab and started migrating from Evolution to Thunderbird, but there are still mbox files that do not get imported completely. I have taken the liberty of sending you one of these files directly to the same address as last time.

cleidigh commented 1 year ago

@Pirx485 Thanks, sorry this is tricky. Will do my best to analyze. @cleidigh

Pirx485 commented 1 year ago

No problem, I appreciate your efforts :-)

cleidigh commented 1 year ago

@Pirx485 Alright... This was easy given your file. The mbox actually contains 39 messages (not sure where you got 55). The 22 you (and I) see is correct because 17 are marked with X-Mozilla-Status: 0009. This means Read and expunged (deleted, not compacted). When these are imported they don't show.

@cleidigh
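To make the 39 / 22 / 17 arithmetic above easier to check on your own files, here is a minimal Python sketch (not part of the addon). It assumes the status value is hexadecimal and that bit 0x0001 means Read and bit 0x0008 means expunged, which matches the 0009 reading given above. The helper names `is_expunged` and `count_visible` are made up for illustration, and the scan is simplified: it treats every line starting with "From " as a separator.

```python
# Flag bits assumed for the X-Mozilla-Status header (hex value).
# Only the two bits relevant to this thread are listed.
FLAG_READ = 0x0001
FLAG_EXPUNGED = 0x0008


def is_expunged(status_hex: str) -> bool:
    """Return True if the status value has the expunged bit set."""
    return bool(int(status_hex, 16) & FLAG_EXPUNGED)


def count_visible(mbox_text: str) -> tuple[int, int]:
    """Return (total, visible) message counts for mbox text.

    Simplified: any line starting with 'From ' counts as a separator,
    and only the header block of each message is inspected.
    """
    total = 0
    expunged = 0
    in_headers = False
    for line in mbox_text.splitlines():
        if line.startswith("From "):
            total += 1
            in_headers = True
        elif in_headers and line == "":
            in_headers = False  # blank line ends the header block
        elif in_headers and line.lower().startswith("x-mozilla-status:"):
            if is_expunged(line.split(":", 1)[1].strip()):
                expunged += 1
    return total, total - expunged
```

For a file like the one discussed here, `count_visible` should report 39 total and 22 visible if 17 messages carry the expunged bit.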

Pirx485 commented 1 year ago

This is getting weird. I can see 39 mails in Evolution, they are not deleted. I suspect that they are probably a remnant from my migration Thunderbird>Evolution a couple years back. At least I know what to look for now, thanks.

cleidigh commented 1 year ago

@Pirx485 The original export from Thunderbird would be the origin of the Mozilla-Status. At the time of export those 17 messages had been deleted. When you imported, Evolution obviously ignored the Mozilla headers and so you saw the 39 not realizing they were marked deleted. When you reimported to Thunderbird they became invisible.

@cleidigh

cleidigh commented 1 year ago

All

I have a very important modification for mbox import in beta 3.

Basically it deals with From_ escaping across buffer boundaries, a tricky thing I have been working on for a few weeks and finally feel is solid. Before this fix, roughly several messages per GB could get munged together.

If people can do some import comparisons on their larger mbox files that would be very helpful!

If all looks good I can get v14.0.1 out.

@cleidigh
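For anyone wondering why buffer boundaries are tricky: when a file is read in fixed-size chunks, a "From " line can be split between two chunks, so a check that looks at each chunk in isolation misses it. The Python sketch below is not the addon's code, only an illustration of the usual remedy, which is to carry the unfinished last line over to the next chunk. The function name `count_separators` and the chunk size are assumptions, and the check is simplified to "line starts with From ".

```python
import io


def count_separators(stream, chunk_size=64 * 1024):
    """Count lines starting with b'From ' while reading fixed-size chunks."""
    count = 0
    carry = b""  # unfinished last line of the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (carry + chunk).split(b"\n")
        carry = lines.pop()  # possibly a partial line; finish it next round
        count += sum(1 for line in lines if line.startswith(b"From "))
    # A final line without a trailing newline is still in carry.
    if carry.startswith(b"From "):
        count += 1
    return count
```

Calling it with `io.BytesIO(data)` and deliberately tiny chunk sizes is a quick way to confirm the count does not depend on where the chunk boundaries fall.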

xtedx commented 9 months ago

hello, i noticed today that the mbox export file is not compatible with linux anymore. after reading this thread here, i found out that you have made major changes to the mbox procedure. this could have introduced a new bug.

i'm now using version 14.0.1. it auto-updates, so i don't know which version was working before it broke, but it has been at least 6 months since i last exported to mbox.

i used to export spam and ham folders to mbox and use them to train spam-assassin on a linux server. it worked fine before. i would just export from windows thunderbird, upload to linux and run the sa-learn command, and all went well. but today spamassassin couldn't parse the mbox and only read 0 messages. i tried 'mutt -f ham.mbox' to test if the format is ok and it showed 0 as well.

then i found a script that someone wrote in 2021 at linuxquestions to convert the file, and that solved the problem, although some emails still could not be read due to date formats (3 out of 28 emails). a bit strange because i'm sure in 2022-2023 i did the export and used spam-assassin with no issues.

what i noticed is that the From line in my linux ~/mbox file has the year at the end, like "Wed Dec 6 23:14:04 2023", but yours has it in the dd mmm yyyy format, "Thu, 25 Jan 2024 23:48:51". if you have time, maybe you can try to fix it. i can give you the exported mbox of my junk mail.

thank you for the great addon
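As a stopgap for the date difference described above, a small Python sketch can rewrite the separator lines into the year-at-the-end style. This is not the linuxquestions script and not the addon's code. It assumes the exported separator looks exactly like "From <sender> Thu, 25 Jan 2024 23:48:51" (sender, then the date quoted above, nothing after it), and the name `to_asctime_from_line` is made up for illustration. Lines that do not match are returned unchanged.

```python
import re
from datetime import datetime

# Assumed shape of the exported separator, based on the date quoted above:
#   From <sender> Thu, 25 Jan 2024 23:48:51
_FROM_RE = re.compile(
    r"^From (\S+) \w{3}, (\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s*$"
)


def to_asctime_from_line(line: str) -> str:
    """Rewrite one From_ line to 'From sender Thu Jan 25 23:48:51 2024'."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending
    match = _FROM_RE.match(body)
    if match is None:
        return line
    sender, stamp = match.groups()
    try:
        when = datetime.strptime(stamp, "%d %b %Y %H:%M:%S")
    except ValueError:
        return line  # not a valid date; leave the line alone
    # datetime.ctime() gives the weekday, month, day, time, year layout.
    return f"From {sender} {when.ctime()}{ending}"
```

Applying it to every line of the exported file and writing the result to a new file leaves headers and bodies untouched, since only lines matching the assumed separator shape are changed.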

cleidigh commented 9 months ago

@xtedx Thanks for the post and kudos %-) You are correct, I did a 100% rewrite of the mbox import export code. Part of the impetus was to improve the standardization of things just like this.

I read the post you linked and it's partially correct. In the past ImportExportTools actually did a blind copy out of the Thunderbird mbox files. So any format mentioned there was Thunderbird's, not IETNG's. If you look at Thunderbird mbox files over the years you will see formatting for the From_ separator is all over the place. Some are just From -

So with v14.0.1 I wanted to standardize on RFC4155. Unfortunately the time function I used does not actually match ctime format, despite the implication. There are a lot of variations out there as mbox files are not defined as an IETF standard, just recommendations. That said, I think the format From x@y.com Tue, Feb 20 10:00:00 2023 is the most prevalent, NOT my current format, which is an error.

I went through part of this with @lizfischer here https://github.com/thunderbird/import-export-tools-ng/issues/455

Most readers really don't process the From line other than as a separator so I'm surprised SpamAssassin is picky. Regardless, since Liz had reader issues as well, I reached out to see what she thinks.

I think she won't have an issue, and if not I should tweak to match ctime format. @cleidigh

xtedx commented 9 months ago

oh wow thanks for the explanation. i'm surprised thunderbird changed their format many times, and i didn't realise there is no actual standard for the mbox format. i think we should follow the most prevalent format: since the linux default mail from mailutils, mutt and spam-assassin all use the "From x@y.com Tue, Feb 20 10:00:00 2023" format, it would be safe to follow that. an mbox 'man page' also says the format should have the year at the end. i think this is a unix man page, but another mbox man page for linux also gives the same format: https://www.commandlinux.com/man-page/man5/mbox.5.html. i'm pretty sure @lizfischer won't have a problem either, because google is using the same ctime format.