joeyates / imap-backup

Backup and Migrate IMAP Email Accounts
MIT License

Limit / backup only last recent emails #107

Closed vielhuber closed 2 years ago

vielhuber commented 2 years ago

Hello!

Would it be possible to add a flag so that only the emails from the last N days are backed up?

We have very large mailboxes and suffer from caches that are too small, upload times that are too long, etc.

If imap-backup backed up only the mails from the last N days, for example, we could overcome those issues.

joeyates commented 2 years ago

Hi @vielhuber

Thanks for opening the issue!

I have two doubts about this functionality.

Problems

Usability

Firstly, in terms of the library, adding functionality to do partial backups is a delicate matter, and there may be a million different use cases in this area for different users.

Implementation

Principally, I think that technically, this may be difficult...

Due to the current implementation, the application does not know the dates of emails it hasn't downloaded. So, to decide where the first email of "today - N" is, it would have to scan through all remote emails, until it found the first one. It couldn't rely on checking the emails it's already downloaded, as this would fail if backups are not done for N days.

In order to get over this problem, one would probably have to implement something like a bisection search over metadata, which is a major job.
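To give an idea of what such a bisection might look like, here is a minimal sketch. It is not part of imap-backup: `uids_since` and `date_of` are hypothetical names, the mailbox is faked with a Hash, and it assumes that message dates never decrease as UIDs increase, which is not guaranteed on a real server (messages copied into a folder get new UIDs but keep their old dates).

```ruby
require "date"

# Hypothetical helper, not part of imap-backup.
# Returns the UIDs of messages dated on or after `cutoff`.
# `uids` must be sorted ascending, and dates are assumed to be
# non-decreasing in UID order. `date_of` is any callable returning the
# Date of one message (in practice, one metadata fetch per call).
def uids_since(uids, cutoff, date_of)
  # Array#bsearch_index in find-minimum mode: index of the first UID
  # for which the block is true, or nil if there is none.
  first = uids.bsearch_index { |uid| date_of.call(uid) >= cutoff }
  first ? uids.drop(first) : []
end

# Fake mailbox standing in for the server: UID => message date.
dates = {
  101 => Date.new(2021, 1, 5),
  102 => Date.new(2021, 3, 9),
  103 => Date.new(2021, 5, 20),
  104 => Date.new(2021, 6, 2),
  105 => Date.new(2021, 6, 30)
}

lookups = 0
date_of = lambda do |uid|
  lookups += 1 # count how many messages had to be inspected
  dates.fetch(uid)
end

recent = uids_since(dates.keys.sort, Date.new(2021, 6, 1), date_of)
puts "recent UIDs: #{recent.inspect} (#{lookups} date lookups)"
```

The point of the counter is that only a handful of messages need to be inspected rather than the whole mailbox. The real work would be in fetching dates from the server and coping with folders where the ordering assumption does not hold.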

Possible Solutions

Use Case

If I understand your use case correctly, you have mailboxes that are very big before you start running backups.

So, you want to start by ignoring most (or all) of the previous emails, and just start from the most recent ones.

A Hack

If the above describes your needs, you could get a list of email ids (UIDs) before the first backup and load them into the ".imap" file that imap-backup uses to decide what to back up. That way, only subsequent emails would get backed up.

Here's a possible implementation for a single folder, YMMV!

#!/usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "imap-backup", git: "https://github.com/joeyates/imap-backup"
end

email = ARGV[0] or raise "Please supply an email"
folder_name = ARGV[1] or raise "Please supply a folder"

connections = Imap::Backup::Configuration::List.new
account = connections.accounts.find { |a| a[:username] == email }
raise "#{email} is not a configured account" if !account
connection = Imap::Backup::Account::Connection.new(account)
folder = Imap::Backup::Account::Folder.new(connection, folder_name)
serializer = Imap::Backup::Serializer::Mbox.new(connection.local_path, folder_name)
uids = folder.uids - serializer.uids

serializer.apply_uid_validity(folder.uid_validity)

uids.each do |uid|
  message = <<~MESSAGE
    From: fake@email.com
    Subject: Message #{uid} not backed up
    Skipped #{uid}
  MESSAGE
  serializer.save(uid, message)
end

Usage:

$ ./stuff-uids.rb EMAIL FOLDER

A New Command

An "ignore everything before today" command like the above could be added to imap-backup, but, again, we would need to consider how widely useful this is.

vielhuber commented 2 years ago

Thank you for this detailed and precise answer.

I have successfully tested the solution "stuff-uids.rb", which works as intended.

Unfortunately, what is still missing for production use is a way to determine the folder names automatically (each account has a different folder structure).

Is it possible to add this in the script or give me a hint how to achieve this?

In general, I share your concerns and challenges with this extension, but I would like to name one more use case:

With many backup solutions (even those that are incremental), it makes more sense to have multiple small files than one very large file. So my plan would be:

This way you have a weekly archive with small files that you can backup very well.

This would also solve the problem of accidentally deleted emails, since I could restore them afterwards very easily.

joeyates commented 2 years ago

@vielhuber

In the program, you would have to put the existing code in a loop, for each folder:

#!/usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "imap-backup", git: "https://github.com/joeyates/imap-backup"
end

email = ARGV[0] or raise "Please supply an email"

def fill_folder_with_dummy_messages(connection, folder_name)
  folder = Imap::Backup::Account::Folder.new(connection, folder_name)
  return if !folder.exist?
  Imap::Backup.logger.info "Folder '#{folder_name}'"
  serializer = Imap::Backup::Serializer::Mbox.new(connection.local_path, folder_name)
  uids = folder.uids - serializer.uids
  Imap::Backup.logger.info "#{uids.length} messages"

  serializer.apply_uid_validity(folder.uid_validity)

  uids.each do |uid|
    message = <<~MESSAGE
      From: fake@email.com
      Subject: Message #{uid} not backed up
      Skipped #{uid}
    MESSAGE
    serializer.save(uid, message)
  end
end

connections = Imap::Backup::Configuration::List.new
account = connections.accounts.find { |a| a[:username] == email }
raise "#{email} is not a configured account" if !account
connection = Imap::Backup::Account::Connection.new(account)

Imap::Backup.logger.info "Filling local folders for #{email} with dummy messages"

connection.folders.each do |folder_name|
  fill_folder_with_dummy_messages(connection, folder_name)
end

On the other hand, instead of modifying the code, you could get the list of folders via the command line, then call the existing script for each folder.

You can list all the folders for an account like this:

$ imap-backup folders --accounts EMAIL

It outputs the email, followed by all folders.
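As a rough sketch of that second approach, a small wrapper could run the folders command and call the single-folder script once per folder. Everything in it beyond the two commands shown above is an assumption: `parse_folders` and `stuff_all_folders` are hypothetical names, and the parser assumes the first non-empty line of output names the account and each following non-empty line holds one folder name, possibly indented or quoted. Check the real output before relying on it.

```ruby
#!/usr/bin/env ruby

require "open3"

# Hypothetical parser: assumes the first non-empty line names the account
# and every following non-empty line is one folder name, possibly
# indented and possibly wrapped in double quotes.
def parse_folders(output)
  lines = output.lines.map(&:strip).reject(&:empty?)
  lines.drop(1).map { |line| line.delete_prefix('"').delete_suffix('"') }
end

# Runs the single-folder script for every folder of the account.
def stuff_all_folders(email, script: "./stuff-uids.rb")
  output, status = Open3.capture2("imap-backup", "folders", "--accounts", email)
  raise "imap-backup folders failed" unless status.success?

  parse_folders(output).each do |folder|
    # Passing separate arguments avoids shell quoting problems with
    # folder names that contain spaces.
    system(script, email, folder) or warn "Failed for folder '#{folder}'"
  end
end

stuff_all_folders(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Saved as, say, stuff-all-uids.rb next to stuff-uids.rb, it would be called with just the email address.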

vielhuber commented 2 years ago

Thank you very much, this works very well and the concept I think is quite flexible. Perhaps you could consider adding it into core.

Just one enhancement for the above script (I needed this):

Line 7:

Before: gem "imap-backup", git: "https://github.com/joeyates/imap-backup"

After: gem "imap-backup", git: "https://github.com/joeyates/imap-backup", branch: "main"

vielhuber commented 2 years ago

Since branch v4.0.7 (and even v4.0.6) I get an error running the above script:

stuff-uids.rb:32:in `<main>': uninitialized constant Imap::Backup::Configuration::List (NameError)

joeyates commented 2 years ago

Hi @vielhuber

The code around Configurations and Accounts has changed over the last few versions.

The functionality you need should be available now via the ignore-history command:

$ imap-backup utils ignore-history EMAIL

vielhuber commented 2 years ago

Awesome, seems to work.

One note after the last update to 4.0.7:

I get this notice when running any command:

/var/lib/gems/2.7.0/gems/thor-1.1.0/lib/thor/error.rb:105: warning: constant DidYouMean::SPELL_CHECKERS is deprecated
Calling `DidYouMean::SPELL_CHECKERS.merge!(error_name => spell_checker)' has been deprecated. Please call `DidYouMean.correct_error(error_name, spell_checker)' instead.

Perhaps you could have a look at that.

vielhuber commented 2 years ago

Unfortunately, it does not work: it just creates an empty folder, without any mbox files in it.

imap-backup utils ignore-history my@email.com

I get a warning (but this should have nothing to do with the problem, since I'm getting this warning on every call):

Calling `DidYouMean::SPELL_CHECKERS.merge!(error_name => spell_checker)' has been deprecated. Please call `DidYouMean.correct_error(error_name, spell_checker)' instead.

I also updated to ruby 3.0.0.

Do you have any suggestions?

joeyates commented 2 years ago

Hi @vielhuber

I've pushed a bugfix for your ignore-history problem as version 4.1.1.

From a quick search, I think the DidYouMean::SPELL_CHECKERS.merge!(error_name => spell_checker) warning relates to a problem with Bundler.

vielhuber commented 2 years ago

Thank you very much. 4.1.1 works without any problems and also without the SPELL_CHECKERS warning.