umgc / fall2022

SWEN 670 Fall 2022 cohort
Creative Commons Zero v1.0 Universal

Develop: Add email cache process for quicker search #385

Closed: mdconatser closed this 1 year ago

mdconatser commented 2 years ago

Depends on #400 for the Email object. Team A will depend on this process to add their own notifications process (#425).

As a user of the USPS ID app, I want mail search to be fast, even for finding text in images.

Acceptance Criteria

The cache process should do the following:

The following data needs to be set on the saved MailPiece objects:

This does not include the OCR cache functionality; that will be added by #390.

umgcjack commented 2 years ago

I'm working on the Notification task that depends on this task. I've done a little brainstorming about how we can architect these two features so that they are independently implementable and testable.

Note: These interfaces are pretty bare, and would likely need to be expanded as we consider other use-cases.

Here are my initial thoughts:

import '../models/MailPiece.dart';

/// The `MailNotifier` class manages OS notifications based on a set
/// of notification criteria generated on the Notification page of the
/// application.
abstract class MailNotifier {
  /// Check whether the piece of mail matches any notifications, and if
  /// so, generate or update the existing OS notification.
  void notify(MailPiece piece);
}

/// The `MailStorage` class saves a piece of mail to the database.
abstract class MailStorage {
  /// The latest timestamp associated with a stored piece of mail.
  /// This should be used to fetch new mail, ensuring mail received
  /// before this date is already stored and does not need to get fetched.
  DateTime get lastTimestamp;

  /// Persist a piece of mail to the database.
  /// The return value is whether or not the mail was saved as a new piece.
  /// Saving an already stored piece of mail should either update the existing
  /// item or noop, returning false.
  bool save(MailPiece piece);
}

/// The `MailFetcher` class requests new mail from a mail server.
abstract class MailFetcher {
  /// Fetch all pieces of mail since the provided timestamp
  /// from `uspsinformeddelivery@email.informeddelivery.usps.com`
  /// with the subject `Your Daily Digest`.
  Stream<MailPiece> fetchMail(DateTime lastTimestamp);
}

/// The `MailProcessor` is the mechanism the application uses to ingest new
/// pieces of mail. This should be configured to run on start-up and scheduled
/// to run periodically.
class MailProcessor {
  final MailNotifier _notifier;
  final MailStorage _storage;
  final MailFetcher _fetcher;

  MailProcessor(this._notifier, this._fetcher, this._storage);

  /// Fetches mail since the last time a piece of mail was received, and then
  /// stores and processes that mail, making it available to the application and
  /// updating notifications.
  void fetchAndProcessLatestMail() {
    _fetcher.fetchMail(_storage.lastTimestamp).listen(_processPiece);
  }

  /// Process an individual piece of mail, storing it and updating any
  /// notifications.
  void _processPiece(MailPiece piece) {
    if (_storage.save(piece)) {
      _notifier.notify(piece);
    }
  }
}

I think you could implement classes that extend MailFetcher and MailStorage, and I can do the same for MailNotifier and flesh out the MailProcessor (Set it up to run on start-up and schedule it in the background).
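To check that these interfaces really can be implemented and tested independently, here is a minimal sketch with in-memory stand-ins. It assumes a stripped-down `MailPiece` with only `id` and `timestamp` (the real model is the subject of #400); `InMemoryMailStorage`, `FakeMailFetcher`, `RecordingNotifier`, and `fetchAndProcess` are hypothetical names. `fetchAndProcess` is a variant of `fetchAndProcessLatestMail` that uses `await for` and returns a `Future`, so a caller or test can wait for processing to finish.

```dart
// Hypothetical, stripped-down MailPiece; the real model comes from #400.
class MailPiece {
  final String id;
  final DateTime timestamp;
  MailPiece(this.id, this.timestamp);
}

// Condensed copies of the proposed interfaces, so this sketch runs on its own.
abstract class MailNotifier {
  void notify(MailPiece piece);
}

abstract class MailStorage {
  DateTime get lastTimestamp;
  bool save(MailPiece piece);
}

abstract class MailFetcher {
  Stream<MailPiece> fetchMail(DateTime lastTimestamp);
}

/// Stores pieces in a map keyed by id, so saving a known piece is a no-op.
class InMemoryMailStorage implements MailStorage {
  final Map<String, MailPiece> _pieces = {};

  /// Newest stored timestamp, or the Unix epoch when nothing is stored yet.
  @override
  DateTime get lastTimestamp => _pieces.values.fold<DateTime>(
      DateTime.fromMillisecondsSinceEpoch(0, isUtc: true),
      (latest, p) => p.timestamp.isAfter(latest) ? p.timestamp : latest);

  @override
  bool save(MailPiece piece) {
    if (_pieces.containsKey(piece.id)) return false;
    _pieces[piece.id] = piece;
    return true;
  }
}

/// Serves a fixed inbox. Returns pieces on or after the timestamp, so pieces
/// sharing the newest timestamp are fetched again and de-duplicated by save().
class FakeMailFetcher implements MailFetcher {
  final List<MailPiece> inbox;
  FakeMailFetcher(this.inbox);

  @override
  Stream<MailPiece> fetchMail(DateTime lastTimestamp) => Stream.fromIterable(
      inbox.where((p) => !p.timestamp.isBefore(lastTimestamp)));
}

/// Records which pieces triggered a notification.
class RecordingNotifier implements MailNotifier {
  final List<String> notified = [];

  @override
  void notify(MailPiece piece) => notified.add(piece.id);
}

/// Awaitable variant of MailProcessor.fetchAndProcessLatestMail.
Future<void> fetchAndProcess(
    MailFetcher fetcher, MailStorage storage, MailNotifier notifier) async {
  await for (final piece in fetcher.fetchMail(storage.lastTimestamp)) {
    if (storage.save(piece)) notifier.notify(piece);
  }
}

Future<void> main() async {
  final fetcher = FakeMailFetcher([
    MailPiece('email1-1', DateTime.utc(2022, 9, 20)),
    MailPiece('email1-2', DateTime.utc(2022, 9, 20)),
    MailPiece('email2-1', DateTime.utc(2022, 9, 21)),
  ]);
  final storage = InMemoryMailStorage();
  final notifier = RecordingNotifier();

  await fetchAndProcess(fetcher, storage, notifier);
  // The second run re-fetches email2-1, but save() returns false,
  // so it does not trigger a second notification.
  await fetchAndProcess(fetcher, storage, notifier);
  print(notifier.notified); // [email1-1, email1-2, email2-1]
}
```

Keying storage by piece ID is what makes `save` safe to call twice. Re-fetching pieces at the newest timestamp costs a little extra work, but it never drops a piece that shares that timestamp if a previous run stopped partway through an email.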

This is just an idea, so lemme know what you think!

SauterErin commented 2 years ago

Alright, I'm just going to repeat what you said, so stop me if I've misunderstood.

  1. MailNotifier - This class would be focused on notifications and whether any of the newly cached emails match a notification.
  2. MailProcessor - I'm actually having difficulty parsing this, as "ingest" makes me think of intake and thus of MailFetcher's responsibilities.

1 + 2 being your responsibility. 3 + 4 being mine.

  3. MailFetcher - Class that pulls mail from the email server.
  4. MailStorage - Stores the latest timestamp to use for fetching new mail. Also responsible for saving new mail to the database.

Have I misunderstood?

SauterErin commented 2 years ago

I'm just going to think aloud here, since I have not worked in an Android environment before, so if I am about to nuke the entire project or am utterly off base, please feel free to say something. It is not rude to tell me that I am trying to pound a square peg into a round hole.

Caching, specifically:

  1. The cache process runs immediately in the background after logging in to the application.
  2. The cache process should only pull emails that arrived since the last run of this process.
  3. The first time this process runs, it should start from 7 days ago to avoid pulling/caching our entire email history, mainly to avoid exceeding the free tier of Google Cloud Vision (this may be temporary).
  4. The timestamp of when this process ran should be saved to the database.
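The first-run rule in item 3 comes down to a single fallback. A minimal sketch, assuming the newest cached timestamp is passed in as a nullable value (null meaning nothing has been cached yet) and that the clock can be injected for testing; `searchStart` is a hypothetical name. Per the later discussion in this thread, that timestamp can be the newest stored email's time rather than a separately saved run time.

```dart
/// Returns the moment to fetch mail from. [latestCached] is the timestamp of
/// the newest cached email, or null when the cache is empty (first run).
DateTime searchStart(DateTime? latestCached, {DateTime? now}) {
  // First run: look back only 7 days, to stay within the Cloud Vision free tier.
  return latestCached ??
      (now ?? DateTime.now()).subtract(const Duration(days: 7));
}

void main() {
  final now = DateTime.utc(2022, 9, 23, 21);
  print(searchStart(null, now: now)); // seven days before `now`
  print(searchStart(DateTime.utc(2022, 9, 22, 8), now: now)); // the cached time
}
```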

// Cache process method, triggered by login
// Variable LastSearch
// Pull last timestamp from database
// If last timestamp is null (or other value set by database)
//     LastSearch = CurrentTime - 7 days
// Else
//     LastSearch = pulled timestamp from database
// At this time, submit current timestamp to database.
// Request all emails after LastSearch from the email server (append to/use the current email-pulling method)
// For every email pulled, search for images (note to self: review the prior project's documentation for preexisting methods for locating them)
//     If found:
//         ID of MailPiece (again, search prior documentation; otherwise we need a way of generating unique IDs)
//         Test Sender
//         Test Description
//         EmailID (note to self: see what information already exists for an email about to be parsed; maybe something like an EDI?)
//     New email: grab its EmailID and set it on all images found in the loop; new email -> new EmailID

shaneknows commented 2 years ago

I think you are on the right track, Erin. I believe we are storing this unique ID that should come back from the "enough_mail" call and then whatever results we get from the image conversion when calling to Google Cloud Vision. We may need an extra ID for MailPieceNumber since the USPS informed delivery messages could have multiple images and we will want to parse and identify each one as a separate, searchable item.
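One way to give each image its own searchable record, as described above, is to keep the email's unique ID on every piece and add a 1-based piece number. A sketch under assumptions: the field names (`emailId`, `pieceNumber`, `imageText`), the `piecesForEmail` helper, and the `<emailId>-<pieceNumber>` ID format are hypothetical, and the image text would eventually come from Cloud Vision (#390).

```dart
/// Hypothetical shape for one image from a Daily Digest email.
class MailPiece {
  final String id; // unique per image: "<emailId>-<pieceNumber>"
  final String emailId; // shared by all images from the same email
  final int pieceNumber; // 1-based position of the image within the email
  final String imageText; // OCR result; a placeholder until #390 lands

  MailPiece(this.emailId, this.pieceNumber, this.imageText)
      : id = '$emailId-$pieceNumber';
}

/// Splits one email's images (already converted to text) into MailPieces.
List<MailPiece> piecesForEmail(String emailId, List<String> imageTexts) => [
      for (var i = 0; i < imageTexts.length; i++)
        MailPiece(emailId, i + 1, imageTexts[i]),
    ];

void main() {
  final pieces = piecesForEmail('abc123', ['ACME BANK', 'CITY WATER DEPT']);
  for (final p in pieces) {
    print('${p.id}: ${p.imageText}');
  }
  // abc123-1: ACME BANK
  // abc123-2: CITY WATER DEPT
}
```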

umgcjack commented 2 years ago

I agree too! You've def got the right idea, Erin.

Some additional notes:

SauterErin commented 2 years ago

A few thoughts as I go over the last team's documentation:

  1. With the database, do we have a known format in which the database will expect the information?
  2. I have been thinking of MailProcessor and MailFetcher as classes/Dart files of the project. My continued inability to find them within Android Studio has me quite concerned that I am wrong.

@umgcjack I do like your thoughts about using the last received time.


As I said, I am not well versed in Android, so I am all for limiting the coding and letting preexisting systems pull the information that we then manipulate.

// I am used to Java-based programming, so I am most likely treating this very much as a class/method being called. If this is incorrect, again, correct me now.
// Cache process method, triggered by login, a timer, or another trigger
// Variable LastReceivedTime
// Pull last timestamp from database
// If last timestamp is null (or other value set by database)
//     LastReceivedTime = CurrentTime - 7 days
// Else
//     LastReceivedTime = pulled timestamp from database
// At this time, submit current timestamp to database.
// Request all emails after LastReceivedTime from the email server (append to/use the current email-pulling method)
// Loop for every email
//     If Sender = "uspsinformeddelivery@email.informeddelivery.usps.com" and Subject = "Your Daily Digest"
//     Note: If MailProcessor is a class/Dart file that I have been missing while arguing with Android Developer, the link or documentation would be helpful here, so we know whether we can simply not pull these emails and skip the step above.
//     CurrentTimestamp = getTime-type method
//     EmailID = String(username + CurrentTimestamp)
//     (I am supposing that EmailIDs are supposed to be unique across the entire database. If they only need to be unique on the current user's machine, an i counter would be much easier.)
//     j = 1
//     Search for images through the email
//         If an image is found:
//             (Image)ID = EmailID + "Image" + (as String) j
//             j = j + 1
//             MetaTag = ???
//             Sender = "Test Sender"
//             ImageText = "Test Description"
//             Create new EmailPiece(ID, CurrentTimestamp, EmailID, MetaTag, Sender, ImageText)
// Storage: either send back to the database or, if "database" means the user's mobile device, use linked lists: one list for every email, and another list per email for its images.
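The sender/subject check in the loop above could be a small predicate. A sketch assuming a hypothetical `Email` record with `from` and `subject` fields (not an enough_mail type). It compares the sender case-insensitively and treats the subject as a case-insensitive substring, which mirrors how the IMAP `SUBJECT` search key matches, so a subject with extra text after "Your Daily Digest" still passes.

```dart
const uspsSender = 'uspsinformeddelivery@email.informeddelivery.usps.com';
const digestSubject = 'your daily digest';

/// Hypothetical, minimal email record (not an enough_mail type).
class Email {
  final String from;
  final String subject;
  Email(this.from, this.subject);
}

/// True when the email looks like a USPS Informed Delivery Daily Digest.
bool isDailyDigest(Email email) =>
    email.from.trim().toLowerCase() == uspsSender &&
    email.subject.toLowerCase().contains(digestSubject);

void main() {
  final inbox = [
    Email(uspsSender, 'Your Daily Digest'),
    Email('friend@example.com', 'Your Daily Digest'),
    Email(uspsSender, 'Package update'),
  ];
  print(inbox.where(isDailyDigest).length); // 1
}
```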

SauterErin commented 2 years ago

Cache Email Notepad.txt

shaneknows commented 2 years ago

Yes, something like that, and you are correct that it should be written in Dart. It's fairly similar to Java in many respects, so hopefully the learning curve is not too steep. I believe the files do not exist yet and need to be added as part of this functionality.

mdconatser commented 2 years ago

Thanks, @umgcjack for the recommendation to just use the latest email timestamp instead of saving another value, I updated the acceptance criteria to include that.

umgcjack commented 2 years ago

@SauterErin Your pseudo-code seems correct. Here are a couple notes about it:

SauterErin commented 2 years ago

Hashing the email message works very well; we could throw in the date/time of receipt as insurance on top of it. Get a string out of it to use, and then we can iterate imageIDs off of it. No problems with the idea for imageIDs?
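A sketch of the hash-based EmailID idea, combining the message text with the time of receipt. Assumption: to stay dependency-free, it uses 32-bit FNV-1a, a simple non-cryptographic hash swapped in for illustration. A real implementation would more likely use a SHA-256 digest from the `crypto` package, since 32 bits leaves a small but real chance of collisions. `fnv1a32` and `emailIdFor` are hypothetical names, and the arithmetic relies on the Dart VM's 64-bit integers (it does not give correct results when compiled to JavaScript).

```dart
import 'dart:convert';

/// 32-bit FNV-1a over the UTF-8 bytes of [input].
int fnv1a32(String input) {
  var hash = 0x811c9dc5; // FNV offset basis
  for (final byte in utf8.encode(input)) {
    hash ^= byte;
    hash = (hash * 0x01000193) & 0xFFFFFFFF; // FNV prime, kept to 32 bits
  }
  return hash;
}

/// EmailID from the raw message plus its receipt time, as 8 hex digits.
String emailIdFor(String rawMessage, DateTime received) {
  final key = '$rawMessage|${received.toUtc().toIso8601String()}';
  return fnv1a32(key).toRadixString(16).padLeft(8, '0');
}

void main() {
  final received = DateTime.utc(2022, 9, 23, 9, 15);
  // The same message and receipt time always produce the same ID.
  print(emailIdFor('raw MIME text', received));
  print(emailIdFor('raw MIME text', received));
}
```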

Honestly, I'll be following your lead on Dart file creation/structuring. I am very much a functionality-focused person. You've been seeing my pseudocode. I treat code as bits of wood to manipulate and transform. As long as we are clear about which bits of functionality go where, I am very happy to be told where to place the box.

If you are willing to set up the database, that would be great. It would definitely settle things on my end: where to store the last search value and retrieve it from (and thus how to handle the "No, there's nothing stored here" response), and where to cache the emails and the MailPieces at the end.

As it is, I think I'll work on the code tomorrow with the storage portion commented out and just focus on extracting from the preexisting code how to pull the emails from the server.

Are we also responsible for the MailPieces class creation or am I missing something again?

I am currently looking at a chunk of code from DigestEmailParser to recycle/repurpose. I would rather not reinvent the wheel of dragging the emails in.

// Future _getDigestEmail() async {
//   final client = ImapClient(isLogEnabled: true);
//   try {
//     // Retrieve LastRetrieval/Search here. If it returns null/other signifier, set it equal to now - 7 days.
//     // Set DateTime.now() to the above value in the line below.
//     DateTime targetDate = _targetDate ?? DateTime.now();
//     // Retrieve the IMAP server config
//     var config = await Discover.discover(_userName, isLogEnabled: false);
//     if (config == null) {
//       return MimeMessage();
//     } else {
//       var imapServerConfig = config.preferredIncomingImapServer;
//       await client.connectToServer(
//           imapServerConfig!.hostname as String, imapServerConfig.port as int,
//           isSecure: imapServerConfig.isSecureSocket);
//       await client.login(_userName, _password);
//       await client.selectInbox();
//       // Search for sequence id of the Email
//       // Would probably need to change _formatTargetDateForSearch to "after" rather than "on", but the rest of the search criteria matches.
//       String searchCriteria =
//           'FROM @.*** ON ${_formatTargetDateForSearch(targetDate)} SUBJECT "Your Daily Digest"';
// I will admit I am struggling with the syntax after here, so there's a good chance of a rewrite after this point so that I can follow what's going on, besides the internal loops for each email and then each image for the EmailPieces.
// But for each one, hash the message to get a string to use as EmailID. And you've seen the previous pseudocode.
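For the "after" version of the date search, IMAP's `SINCE` search key (RFC 3501) matches messages whose internal date is on or after a given day, ignoring the time of day. A sketch of building that criteria string; `imapSearchDate` and `digestSearchCriteria` are hypothetical helpers, not the existing `_formatTargetDateForSearch`, and the sender address is the one named in the `MailFetcher` doc comment. RFC 3501 dates look like `1-Feb-1994`, with an English three-letter month.

```dart
const _months = [
  'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
];

/// Formats [date] as an RFC 3501 search date, e.g. 16-Sep-2022.
String imapSearchDate(DateTime date) =>
    '${date.day}-${_months[date.month - 1]}-${date.year}';

/// Search criteria for every Daily Digest received on or after [since]'s day.
String digestSearchCriteria(DateTime since) =>
    'FROM uspsinformeddelivery@email.informeddelivery.usps.com '
    'SINCE ${imapSearchDate(since)} '
    'SUBJECT "Your Daily Digest"';

void main() {
  print(digestSearchCriteria(DateTime(2022, 9, 16)));
  // FROM uspsinformeddelivery@email.informeddelivery.usps.com SINCE 16-Sep-2022 SUBJECT "Your Daily Digest"
}
```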

   

On Friday, September 23, 2022 at 09:21:49 PM EDT, umgcjack ***@***.***> wrote:  



umgcjack commented 2 years ago

The issue for creating the MailPiece is #400, which it looks like is assigned to you. So you can probably just create that as part of this change and knock out two birds (issues) with one stone (PR)! I'll try to get the database stuff sorted out later today.

Also, I think the _getDigestEmail method you referenced is a great place to start. The general logic will probably stay the same, but you'll need to incorporate the limit on the timestamp (latest email time), and probably handle the filtering of emails by subject/sender there too.

It does look like that method only deals with a single email message, so you'll have to figure out how to grab all of the emails since the provided date. I'm pretty sure the enough_mail package has some way to do that.
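Because a `SINCE` search only compares calendar dates, searching from the latest email time also returns that day's earlier messages, including ones already cached. A sketch of trimming the results client-side; `FetchedEmail` and `newerThan` are hypothetical stand-ins for whatever the enough_mail search/fetch step returns. Keeping only messages strictly newer than the latest cached time is an assumption; keeping equal timestamps and relying on `MailStorage.save` to skip duplicates would also work.

```dart
/// Hypothetical stand-in for a message returned by the IMAP search/fetch.
class FetchedEmail {
  final String id;
  final DateTime received;
  FetchedEmail(this.id, this.received);
}

/// Keeps only messages received after [latestCached], oldest first.
List<FetchedEmail> newerThan(
    Iterable<FetchedEmail> results, DateTime latestCached) {
  final fresh = results.where((e) => e.received.isAfter(latestCached)).toList();
  fresh.sort((a, b) => a.received.compareTo(b.received));
  return fresh;
}

void main() {
  final latestCached = DateTime.utc(2022, 9, 23, 9, 0);
  final results = [
    FetchedEmail('late', DateTime.utc(2022, 9, 24, 8, 0)),
    FetchedEmail('cached', DateTime.utc(2022, 9, 23, 9, 0)),
    FetchedEmail('new', DateTime.utc(2022, 9, 23, 14, 30)),
  ];
  print(newerThan(results, latestCached).map((e) => e.id).toList()); // [new, late]
}
```

Processing oldest first matters if the latest cached time is derived from stored mail: if a run stops partway through, the stored maximum never jumps past a message that has not been processed yet.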

If you run into any issues, def hit me up. Also, if you push your WIP changes to a branch I'm happy to look it over at any point!

mdconatser commented 2 years ago

tested on Android

wpbear1742 commented 1 year ago

tested on iPhone: PASS

tatikozh commented 1 year ago

tested on iPad

wpbear1742 commented 1 year ago

Verified - Test Pass