postalsys / emailengine

Headless email client
https://emailengine.app/

List messages in folders returns incorrect paging data for Gmail accounts #425

Closed: brandonaaskov closed this issue 4 months ago.

brandonaaskov commented 4 months ago

Describe the bug

I'll show a few screenshots, all fetching for the same account ID, to illustrate the error.

pageSize=1000: [screenshot] Here the paging data is accurate because there is only one page: total: 495.

pageSize=100 (or omit the param entirely): [screenshot] Three pages, 100 per page, but a total of 201?

Using the cursor: [screenshot] On page 3 of 3 there's still a nextPageCursor value, and total is still incorrect at 201.

If I use the nextPageCursor for the fourth page, I do get messages back and nextPageCursor is null. The page and pages values are very wrong, but total is finally correct: [screenshot]

To Reproduce

Steps to reproduce the behavior: you can do this directly against the EmailEngine API with a tool like Postman. This only appears to affect Gmail accounts.

EmailEngine version

2.43.0

Environment

Just running it locally.

Redis

Additional context

This latest change to EmailEngine broke the ability to use the page value and forces cursor-based paging for Gmail accounts, which is a breaking change and has definitely slowed me down unexpectedly. Thankfully my app is not in production yet. You even list it under "Bug Fixes" in the latest release, but "Adding API support" for something is not a bug fix; it's a feature, and in this case a breaking change, even though it shipped in a minor release.

andris9 commented 4 months ago

The total count comes from Gmail search results, and as with all Google searches, it is an indicational number, not a "true" total results count. As long as there is a page token for the next page, you can continue paging, even if the page number count says otherwise. The Gmail API has no support at all for numeric paging; you can only use cursors. If you need page-number-based paging, use the IMAP+SMTP-based connection instead of the Gmail API.

Cursor-based paging is not a breaking change because previously it was not possible to use the Gmail API as the email backend. But yes, I need to make this clearer in the API documentation.
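A minimal client-side sketch of the cursor-only paging described in the reply above: keep requesting pages while nextPageCursor is set, and never consult page, pages, or total. The page fetcher is injected so the loop stays independent of transport and authentication. The names MessagePage, listMessagesUrl, and fetchAllMessages are hypothetical, and the endpoint path and the `cursor` query parameter name used in listMessagesUrl are assumptions for illustration; check the EmailEngine API reference for your version.

```typescript
// One page of a list-messages response, limited to the fields discussed in
// this thread (an assumption; the real response has more fields).
interface MessagePage<T> {
  total: number // may be only an estimate for Gmail API accounts
  page: number
  pages: number // derived from total, so equally unreliable there
  nextPageCursor: string | null
  messages: T[]
}

// Fetches one page; cursor is undefined for the first request.
type PageFetcher<T> = (cursor?: string) => Promise<MessagePage<T>>

// Hypothetical URL builder: the path and the `cursor` parameter name are
// assumptions, not confirmed EmailEngine API details.
function listMessagesUrl(
  baseUrl: string,
  account: string,
  path: string,
  pageSize: number,
  cursor?: string
): string {
  let url =
    `${baseUrl}/v1/account/${encodeURIComponent(account)}/messages` +
    `?path=${encodeURIComponent(path)}&pageSize=${pageSize}`
  if (cursor) url += `&cursor=${encodeURIComponent(cursor)}`
  return url
}

// Follows nextPageCursor until it is null; page/pages/total are never read.
async function fetchAllMessages<T>(fetchPage: PageFetcher<T>): Promise<T[]> {
  const all: T[] = []
  let cursor: string | undefined = undefined
  do {
    const result = await fetchPage(cursor)
    all.push(...result.messages)
    cursor = result.nextPageCursor ?? undefined
  } while (cursor)
  return all
}
```

In practice, fetchPage would request the URL built by listMessagesUrl with your EmailEngine API token and parse the JSON body; keeping that outside the loop makes the stopping rule (cursor present or not) easy to test on its own.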

brandonaaskov commented 4 months ago

@andris9 I disagree, and maintain it is a breaking change, specifically because it's in the API layer. That's the normalized layer where I shouldn't need to care about the underlying implementation (Outlook, Gmail, IMAP, etc.). Using the Gmail API underneath shouldn't change what total means. As it stands, total behaves differently for just this one implementation, so as a user of the API, I now have to fork my code to handle the different meaning of total for Gmail accounts.

Plus, even if it was coming from search results, there are 100 items returned in the array, so this "indicational number" of 201 literally indicates nothing: I have no idea where it's coming from or why it's 201. There are 495 messages in total, 100 per page. Yet I'm told there are 3 pages even though I can fetch 4, and the total is 201 until the last page is fetched, at which point the number is accurate at 95. This is 100% a bug and a breaking change.

From your own blog post, it's pretty evident that the scopes required for the IMAP-based connection to Gmail will only work for "internal" apps.

The minimum permission set requirement is probably the one that will sink your application to get EmailEngine integrated with Gmail accounts. EmailEngine requires access to the highly restricted "https://mail.google.com/" OAuth2 scope. This is the only scope that allows access to IMAP and SMTP – the protocols that EmailEngine uses.

Unfortunately, Google would probably consider that scope too wide for whatever use case you have and ask you to use more restrictive scopes. These restrictive scopes, in theory, give you access to the required data but not to IMAP and thus are unusable by EmailEngine. If you can convince Google that the features you need are only available via IMAP, then you might pass the review. Obviously, there are no guarantees.

The same post then says "If you can afford it and you are able to weasel yourself through the verification process, go with the public OAuth2." It's clear, and I appreciate the candor, that this option is a non-starter for anything besides an internal-only use case.

You even say in this post "However, IMAP and SMTP are not always the most suitable options. Recognizing this, we have been working on adding additional email backends to EmailEngine", which again, I appreciate. But I still think the API should return the right value for total and pages because right now they're wrong.

FWIW, I am a paying customer, and I'm a little miffed that you dismissed this issue so easily without even reading it. If you had, you would have seen that pages is incorrect and that total is incorrect until the last page is fetched.

andris9 commented 4 months ago

Unfortunately, it is not possible to get the actual message count from the Gmail API. EmailEngine asks the API "how many emails match this query?" and the server responds with "201, maybe, who knows 🤷‍♂️". So the correct option would probably be to remove the pages and total values from the output when using the API backend, because these are almost never correct.
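A rough client-side sketch of that proposal, which also addresses the earlier concern about forking code at every call site: a single wrapper normalizes each page and only exposes counts for backends where they are not estimates. The Backend type, RawPage, NormalizedPage, and normalizePage are hypothetical; how you determine an account's backend is an assumption and EmailEngine's own account fields may differ.

```typescript
// Hypothetical backend label for an account.
type Backend = 'imap' | 'gmail-api'

// Raw page as returned by the list-messages call (fields from this thread).
interface RawPage<T> {
  total: number
  page: number
  pages: number
  nextPageCursor: string | null
  messages: T[]
}

// Counts are optional: omitted for backends where they are only estimates.
interface NormalizedPage<T> {
  messages: T[]
  nextPageCursor: string | null
  total?: number
  pages?: number
}

function normalizePage<T>(raw: RawPage<T>, backend: Backend): NormalizedPage<T> {
  const base: NormalizedPage<T> = {
    messages: raw.messages,
    nextPageCursor: raw.nextPageCursor,
  }
  // For the Gmail API backend, total is Gmail's estimate and pages is
  // derived from it, so both are dropped rather than passed on.
  if (backend === 'gmail-api') return base
  return { ...base, total: raw.total, pages: raw.pages }
}
```

Callers then branch on whether total is present rather than on the backend type, and every caller pages by nextPageCursor regardless of backend.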

brandonaaskov commented 4 months ago

This is only a concern for me during the on-boarding phase of my app, and in my case all I've really lost is the ability to show a progress bar (when I know the total) instead of a spinner. Not the end of the world for my use case.

But I'm curious @andris9, why is the total value always correct on the last page for these Gmail API accounts? The last page's total is always correct, even if pageSize is so large it returns everything in the first page.
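For the progress-bar use case mentioned above, one possible compromise is sketched below, under the assumption that total is only an estimate: treat it as a hint, cap progress below 100% while nextPageCursor is still set, and fall back to a spinner (null) when there is no usable estimate. The function name pagingProgress and the 0.99 cap are arbitrary choices for illustration.

```typescript
// Returns a fraction in [0, 1] for a progress bar, or null to show a spinner.
// `estimate` is the (possibly inaccurate) total; `done` is true once
// nextPageCursor has come back null.
function pagingProgress(
  fetched: number,
  estimate: number | undefined,
  done: boolean
): number | null {
  if (done) return 1
  if (estimate === undefined || estimate <= 0) return null
  // The estimate may be lower than what has already been fetched, so never
  // report completion while another page may still exist.
  return Math.min(fetched / estimate, 0.99)
}
```

With an estimate of 201, the bar would stall at 99% after the third page of 100 and jump to 100% only when the cursor runs out, which is imprecise but never claims completion early.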

andris9 commented 4 months ago

The total value is provided by Gmail. EmailEngine has no way at all to know what that value means or how it was derived. It's an order-of-magnitude approximation. EmailEngine uses the total value to calculate the pages value, so if total is wrong, pages is wrong as well. The only trustworthy value is nextPageCursor: as long as it is set, you can continue paging the results.
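The dependency described above can be illustrated with a small sketch. It assumes pages is the ceiling of total / pageSize, which is consistent with the numbers in the report (a total of 201 with pageSize=100 gives 3 pages) but is not confirmed from EmailEngine's source. It also contrasts the count-based stopping rule with the cursor-based one; all function names are hypothetical.

```typescript
// Assumed derivation of the page count from the (estimated) total.
function pageCount(total: number, pageSize: number): number {
  return Math.ceil(total / pageSize)
}

// Count-based rule: stop once as many pages as `pages` have been fetched.
function countSaysDone(fetchedPages: number, total: number, pageSize: number): boolean {
  return fetchedPages >= pageCount(total, pageSize)
}

// Cursor-based rule: stop only when no next-page cursor is returned.
function cursorSaysDone(nextPageCursor: string | null): boolean {
  return nextPageCursor === null
}

console.log(pageCount(201, 100)) // 3
// After three pages with total still reported as 201 and a cursor present,
// the two rules disagree; only the cursor rule is reliable here.
console.log(countSaysDone(3, 201, 100), cursorSaysDone('next')) // true false
```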

brandonaaskov commented 4 months ago

@andris9 I was previously building my own Gmail and Outlook integrations before finding EmailEngine. Here's a very basic test that can be run by anyone to verify this issue on the Gmail side:

import 'dotenv/config'
import { gmail_v1, google } from 'googleapis'
import { OAuth2Client } from 'googleapis-common'

async function example(accessToken: string, page?: string) {
  const auth: OAuth2Client = new google.auth.OAuth2(
    process.env.GMAIL_CLIENT_ID,
    process.env.GMAIL_CLIENT_SECRET,
    `http://localhost:3000/oauth`
  )

  auth.setCredentials({ access_token: accessToken })
  const gmail: gmail_v1.Gmail = google.gmail({ version: 'v1', auth })

  do {
    const response = await gmail.users.messages.list({
      userId: 'me',
      q: 'in:sent',
      maxResults: 500, // maximum Gmail allows per request (default is 100)
      pageToken: page,
    })

    // ...do stuff with page and/or messages
    console.log(response.data.resultSizeEstimate)

    // nextPageToken is string | null | undefined; normalize null to undefined
    page = response.data.nextPageToken ?? undefined
  } while (page)
}

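// Get a token from /v1/account/{account}/oauth-token; reading it from a
// GMAIL_ACCESS_TOKEN environment variable is only an assumption here
const accessToken = process.env.GMAIL_ACCESS_TOKEN ?? ''
example(accessToken).catch(console.error)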

The only time it gets it right is when the total number of results is less than maxResults. @andris9, you would know better than me: where is the best place for me to file this issue with Google (or at least back you up if you've already raised it)?
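To quantify the discrepancy in a test like the one above, one could record each page's resultSizeEstimate alongside the number of messages actually returned. The sketch below does that over already-fetched pages; the ListPage interface mirrors only the messages and resultSizeEstimate fields of a messages.list response, and summarizeEstimates is a hypothetical helper.

```typescript
// Minimal shape of a Gmail messages.list response page (subset of fields).
interface ListPage {
  messages?: { id?: string | null }[] | null
  resultSizeEstimate?: number | null
}

interface EstimateReport {
  actualTotal: number // messages actually returned across all pages
  estimates: (number | null)[] // resultSizeEstimate reported on each page
}

function summarizeEstimates(pages: ListPage[]): EstimateReport {
  let actualTotal = 0
  const estimates: (number | null)[] = []
  for (const page of pages) {
    actualTotal += page.messages?.length ?? 0
    estimates.push(page.resultSizeEstimate ?? null)
  }
  return { actualTotal, estimates }
}
```

Collecting each response.data from the loop into an array and passing it to summarizeEstimates after the last page shows how far each estimate was from the true count.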

andris9 commented 4 months ago

It is not a bug on Google's side. The value is called an estimate (resultSizeEstimate), so it does not have to be exact. They can probably not return you the actual result size, as calculating it would be too slow or resource intensive, and instead use some kind of estimation algorithm to come up with the number.

brandonaaskov commented 4 months ago

@andris9 that's fair, I can see why they cleverly named it that way. But let's just say there was a guy out there with a loud mouth who wanted to convey to Google that this should be fixed. Where would be the best place for a guy like that to go and maybe be heard?