nodemailer / wildduck

Opinionated email server
https://wildduck.email/
European Union Public License 1.2

IMAP SORT extension #229

Closed louis-lau closed 4 years ago

louis-lau commented 4 years ago

Heya, I've been experimenting with an IMAP-based webmail (SOGo), and the experience is quite slow with WildDuck.

Because WildDuck doesn't support SORT, the entire mailbox must be fetched before the webmail can display it. With everything running locally on my PC (with admittedly not enough RAM), this takes about 4.7 seconds for a mailbox with 1800 messages. That is a long time to wait for webmail to load.

When I connect SOGo to WildDuck through a Dovecot imapc proxy, which does support SORT, it only fetches the most recent 100 messages. This takes about 1.7 seconds through the proxy, and it would be much faster than that if it didn't have to go through Dovecot.

I would love to not bother you and just use the Dovecot proxy, but that wouldn't be feasible. Performing a sorted body search makes Dovecot download and index the entire mailbox, which takes about 100 seconds 😞.

Do you think it's possible for WildDuck to support SORT?

andris9 commented 4 years ago

This is possible, as SORT is nothing more than the SEARCH that already exists, but with sorting. I haven't done SORT mostly because sorting mongo results without the correct index is not a great idea: if the mailbox has a lot of messages, the query can run out of (capped) memory very fast, and instead of a sorted result you get an out-of-memory error. So if indexes are figured out then implementing SORT itself should be fairly easy.

louis-lau commented 4 years ago

Looks like the requirements are as follows:

ARRIVAL
CC
DATE
FROM
REVERSE
SIZE
SUBJECT
TO

As far as I understand, all except maybe ARRIVAL should already be indexed for SEARCH.
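
To make the list above concrete, here is a minimal sketch of how those sort keys could be turned into a MongoDB sort document. It is not WildDuck code. The field names `idate` and `size` appear elsewhere in this thread; mapping ARRIVAL to `idate`, and the names `hdate`, `subject`, `from`, `to`, and `cc`, are assumptions made only for illustration. In IMAP SORT, REVERSE applies to the single key that follows it.

```javascript
// Hypothetical mapping from IMAP SORT keys to message document fields.
// Only `idate` and `size` are taken from this thread; the others are placeholders.
const SORT_FIELDS = {
  ARRIVAL: 'idate',
  DATE: 'hdate',
  SIZE: 'size',
  SUBJECT: 'subject',
  FROM: 'from',
  TO: 'to',
  CC: 'cc'
};

// Turn a list such as ['REVERSE', 'SIZE', 'ARRIVAL'] into { size: -1, idate: 1 }.
function buildSortSpec(criteria) {
  const spec = {};
  let direction = 1;
  for (const raw of criteria) {
    const key = String(raw).toUpperCase();
    if (key === 'REVERSE') {
      // REVERSE only flips the next sort key
      direction = -1;
      continue;
    }
    const field = SORT_FIELDS[key];
    if (!field) {
      throw new Error('Unsupported sort key: ' + raw);
    }
    // Keep the first occurrence if a key is repeated
    if (!(field in spec)) {
      spec[field] = direction;
    }
    direction = 1;
  }
  if (direction === -1) {
    throw new Error('REVERSE must be followed by a sort key');
  }
  return spec;
}
```

The resulting object could then be passed to `.sort()` on a query like the ones shown in this thread.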

I've checked, and mongo can simply traverse the index in reverse order for REVERSE:

db.messages.find({mailbox: ObjectId('5e7574f5a2bd6304e096a8bd')}).sort({size: -1}).explain()
...
"winningPlan": {
  "stage": "FETCH",
  "inputStage": {
    "stage": "IXSCAN",
    "keyPattern": {
      "mailbox": 1,
      "size": 1
    },
    "indexName": "by_size",
    "isMultiKey": false,
    "multiKeyPaths": {
      "mailbox": [],
      "size": []
    },
    "isUnique": false,
    "isSparse": false,
    "isPartial": false,
    "indexVersion": 2,
    "direction": "backward",
    "indexBounds": {
      "mailbox": [
        "[ObjectId('5e7574f5a2bd6304e096a8bd'), ObjectId('5e7574f5a2bd6304e096a8bd')]"
      ],
      "size": ["[MaxKey, MinKey]"]
    }
  }
}
...

So the only thing I'm not sure about is ARRIVAL 😁
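
As a rough illustration of why the explain output above shows a backward index scan, here is a simplified model of the rule MongoDB applies: after skipping leading index keys that the query pins with equality, the sort keys must line up with the following index keys, with directions either all matching or all inverted. The function name and parameters are hypothetical, and this sketch ignores many planner details.

```javascript
// Simplified check: can an index with `keyPattern` return documents already
// ordered by `sortSpec`, given the fields the query matches by equality?
function indexCanServeSort(keyPattern, equalityFields, sortSpec) {
  const indexKeys = Object.keys(keyPattern);
  const sortKeys = Object.keys(sortSpec);
  if (sortKeys.length === 0) {
    return false;
  }

  // Skip leading index keys that are fixed by equality (e.g. mailbox)
  let start = 0;
  while (start < indexKeys.length && equalityFields.includes(indexKeys[start])) {
    start++;
  }

  let orientation = 0; // 1 = forward scan, -1 = backward scan
  for (let j = 0; j < sortKeys.length; j++) {
    const indexKey = indexKeys[start + j];
    if (indexKey !== sortKeys[j]) {
      return false; // sort key missing from the index or in the wrong position
    }
    const relative = keyPattern[indexKey] * sortSpec[indexKey];
    if (orientation === 0) {
      orientation = relative;
    } else if (orientation !== relative) {
      return false; // mixed directions cannot be served by one scan
    }
  }
  return true;
}
```

With the `by_size` key pattern `{mailbox: 1, size: 1}` and equality on `mailbox`, a sort of `{size: -1}` passes the check, while a two-field sort such as `{idate: -1, size: -1}` does not.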

louis-lau commented 4 years ago

> As far as I understand

I guess I didn't understand very far 😁, since search against CC, FROM, TO, and SUBJECT is done using full-text search, right? I also overlooked the fact that you can SORT by multiple properties, which won't work with the current compound indexes.

> So if indexes are figured out

I assume adding a bunch of new indexes won't be an option?

louis-lau commented 4 years ago

Coming back to this, I see that CC, FROM, TO, and SUBJECT should be included in the by_headers index.

Some observations:

db.messages.find({mailbox: ObjectId('5e7574f5a2bd6304e096a8bd')}, {uid:1}).sort({idate: -1, size:-1})

☝️ A sorted find on multiple properties uses no indexes, uses a lot of RAM, and takes a while if the documents are not currently in RAM. It needs more RAM than the default limit.

db.messages.aggregate([
  {$match: {mailbox:ObjectId("5e7574f5a2bd6304e096a8bd")}},
  {$sort: {idate: -1, size:-1}},
  {$project: {uid:1}}
])

☝️ The aggregation uses little RAM and returns almost instantly, even when the documents are not currently in RAM. I believe it is using the indexes, though I'm unsure, and explain() didn't really help here.

@andris9 What do you think of sorting through aggregations? Do you have any experience with them?
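
For reference, the pipeline above can be built from a mailbox id and a sort document with a small helper. This is only a sketch: `buildSortPipeline` is a hypothetical name, and it reproduces the three stages shown in this comment without saying anything about whether the server will use an index for them.

```javascript
// Build the $match / $sort / $project pipeline used above.
// `mailboxId` would be an ObjectId in the mongo shell; any value works here.
function buildSortPipeline(mailboxId, sortSpec) {
  if (Object.keys(sortSpec).length === 0) {
    throw new Error('sortSpec must name at least one field');
  }
  return [
    { $match: { mailbox: mailboxId } }, // restrict to one mailbox
    { $sort: sortSpec },                // e.g. { idate: -1, size: -1 }
    { $project: { uid: 1 } }            // only the UID is needed for SORT
  ];
}
```

In the mongo shell this could be run as `db.messages.aggregate(buildSortPipeline(ObjectId('5e7574f5a2bd6304e096a8bd'), {idate: -1, size: -1}))`. Running `db.messages.explain("executionStats").aggregate(...)` with the same pipeline may help show whether the plan contains an index scan or an in-memory sort.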

louis-lau commented 4 years ago

Duplicate of #193