Log email attachments

FelixMalfait commented 5 months ago

Context

We've built an integration with Gmail that allows workspace members to connect their accounts and import all their emails. Those emails are then automatically attached to companies and contacts.

The Gmail API exposes attachments for every message, but we currently ignore them.

The goal of this issue is to import attachments into our database so we can display them in the Files tab and in email threads.

Backend

The steps on the backend side should roughly look like this:

The STORAGE_DRIVER env variable should now only define the default driver
Add a new StorageDriverType for Gmail
Adapt fileService/fileStorageService to stream from Gmail's API instead of S3
Add a new storageDriver field on Attachments
Add a new relation to messages on Attachments
Adapt the Gmail fetch service to create attachments along with companies and people during import
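A minimal sketch of the per-attachment driver idea, assuming hypothetical names throughout: the StorageDriverType values, the AttachmentRecord shape (including a gmailMessageId field), and the FetchGmailAttachment callback are illustrations, not twenty's actual code. The callback stands in for users.messages.attachments.get, whose response body carries the file content as a base64url-encoded `data` string. For simplicity the read returns a whole Buffer; a real driver would more likely stream.

```typescript
// Hypothetical driver identifiers; STORAGE_DRIVER would only choose the default.
const StorageDriverType = {
  Local: "local",
  S3: "s3",
  Gmail: "gmail",
} as const;
type StorageDriverType =
  (typeof StorageDriverType)[keyof typeof StorageDriverType];

// Hypothetical attachment row with the new storageDriver field and message relation.
interface AttachmentRecord {
  id: string;
  name: string;
  storageDriver?: StorageDriverType; // unset on rows created before the change
  fullPath?: string; // read by the Local/S3 drivers
  messageId?: string; // relation to the imported message
  gmailMessageId?: string; // hypothetical: Gmail's own message id, needed by the API call
  gmailAttachmentId?: string; // id found in the Gmail message payload
}

// Only the field used here from the users.messages.attachments.get response.
interface GmailAttachmentBody {
  data?: string; // base64url-encoded file content
}

// Hypothetical stand-in for the real API call (e.g. through a Gmail client).
type FetchGmailAttachment = (
  gmailMessageId: string,
  gmailAttachmentId: string,
) => Promise<GmailAttachmentBody>;

// Rows without an explicit driver fall back to the default from STORAGE_DRIVER.
function resolveStorageDriver(
  attachment: AttachmentRecord,
  defaultDriver: StorageDriverType,
): StorageDriverType {
  return attachment.storageDriver ?? defaultDriver;
}

// Gmail-backed read: fetch the content on demand instead of reading it from S3.
async function readGmailAttachment(
  attachment: AttachmentRecord,
  fetchGmailAttachment: FetchGmailAttachment,
): Promise<Buffer> {
  const { gmailMessageId, gmailAttachmentId } = attachment;
  if (!gmailMessageId || !gmailAttachmentId) {
    throw new Error(`Attachment ${attachment.id} has no Gmail reference`);
  }
  const body = await fetchGmailAttachment(gmailMessageId, gmailAttachmentId);
  return Buffer.from(body.data ?? "", "base64url");
}
```

Keeping the driver on each row lets existing S3/local attachments and new Gmail-backed ones coexist, with STORAGE_DRIVER only filling in rows that never set a driver.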

This is just a general guideline; this is a complex issue that will require investigation!

Frontend

Because we use the existing Attachment object, attachments should automatically appear in the Files tab (nothing to do?)
We should also add a paperclip icon to the threads list
We should show attachments at the thread level (displayed as a field/fieldValue, like the properties of a task)
We should show attachments at the message level
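A small sketch of the data the thread-level field and the threads-list icon could derive from, assuming hypothetical front-end shapes (MessageAttachment, ThreadMessage) rather than twenty's real types: the thread-level list is every message's attachments flattened in message order, each tagged with its source message, and the icon only needs to know whether any message has one.

```typescript
// Hypothetical front-end shapes; the real twenty types will differ.
interface MessageAttachment {
  id: string;
  name: string;
}

interface ThreadMessage {
  id: string;
  attachments: MessageAttachment[];
}

// A thread-level entry remembers which message the file came from.
interface ThreadAttachment extends MessageAttachment {
  messageId: string;
}

// Flattens every message's attachments, in message order, for the thread-level field.
function getThreadAttachments(messages: ThreadMessage[]): ThreadAttachment[] {
  return messages.flatMap((message) =>
    message.attachments.map((attachment) => ({
      ...attachment,
      messageId: message.id,
    })),
  );
}

// Decides whether the threads list shows the paperclip icon.
function threadHasAttachments(messages: ThreadMessage[]): boolean {
  return messages.some((message) => message.attachments.length > 0);
}
```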

Link to Figma

tatethurston commented 4 months ago

@quest-bot embark

quest-bot[bot] commented 4 months ago

⚠️ There's no active Quest for this Issue.

Check the docs for more info.

tatethurston commented 4 months ago

@FelixMalfait do you all use any centralized access for shared secrets? Specifically, here I needed AUTH_GOOGLE_CLIENT_ID and AUTH_GOOGLE_CLIENT_SECRET. I went through the process of spinning up a personal Google integration, but I'm curious about this for future PRs.

FelixMalfait commented 4 months ago

@tatethurston sorry for the delayed reply! No, we don't really have a better process; everyone just sets up their own. I don't know if it would be possible to do it otherwise 🤔

tatethurston commented 4 months ago

No worries. I have limited capacity and don’t think I’ll be able to push this one through for a few weeks. Happy to let someone else take this on.

varunKT001 commented 4 months ago

I would like to work on this issue.

rostaklein commented 3 months ago

Hey guys, I just started looking into this one. @FelixMalfait, please assign this issue to me. I was able to set up my own Google credentials and sync a few emails. Is there a preferred way to work with the email sync other than running a manual command, e.g. yarn nx command twenty-server workspace:gmail-partial-sync -w 20202020-1c25-4d02-bf25-6aeccf7ea419? 🤔 Any other tips or tricks for getting emails visible in the UI (any reference PR)? 😊

FelixMalfait commented 3 months ago

@varunKT001 thanks! Since this issue is a very complex one, you might want to start with smaller issues first. @rostaklein has some experience on the code base so I'm assigning it to him, but feel free to take another one!

cc @bosiraphael, any tips for @rostaklein? Thanks

varunKT001 commented 3 months ago

@FelixMalfait Sure, no problem. I'll pick another issue 👍

bosiraphael commented 3 months ago

Hello @rostaklein, thank you for taking this issue :) Here are some insights:

Currently, we query the raw email from the Gmail API and use mailparser to parse the email content and headers. The raw email contains the whole attachment but not the Gmail API attachment ID. Since we do not want to store the attachment itself, only a way to retrieve it from the Gmail API, we have two options:

Make a second query for files with attachments (Simpler)

Query the email in raw format
If the message has an attachment, make a second query to get the attachmentId
Save attachments after saving the messages and store gmailAttachmentId in the attachment table
We will be able to get the attachments using users.messages.attachments.get
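A sketch of the simpler option under stated assumptions: the second query is assumed to be users.messages.get with format "full" (the comment only says "a second query"), whose payload is a tree of message parts where attachment parts carry a filename and a body.attachmentId. fetchFullPayload is a hypothetical stand-in for that call, AttachmentRow is a hypothetical table shape, and rawAttachmentCount could come from the attachments array that mailparser's simpleParser already returns for the raw email.

```typescript
// Minimal subset of the Gmail API MessagePart returned with format "full".
interface GmailMessagePart {
  partId?: string;
  mimeType?: string;
  filename?: string;
  body?: { attachmentId?: string; size?: number; data?: string };
  parts?: GmailMessagePart[];
}

// Hypothetical row for the attachment table.
interface AttachmentRow {
  name: string;
  mimeType: string;
  size: number;
  gmailMessageId: string;
  gmailAttachmentId: string;
}

// Walks the part tree and keeps named parts that carry an attachmentId.
function collectAttachmentRows(
  gmailMessageId: string,
  part: GmailMessagePart,
): AttachmentRow[] {
  const rows: AttachmentRow[] = [];
  const attachmentId = part.body?.attachmentId;
  if (part.filename && attachmentId) {
    rows.push({
      name: part.filename,
      mimeType: part.mimeType ?? "application/octet-stream",
      size: part.body?.size ?? 0,
      gmailMessageId,
      gmailAttachmentId: attachmentId,
    });
  }
  for (const child of part.parts ?? []) {
    rows.push(...collectAttachmentRows(gmailMessageId, child));
  }
  return rows;
}

// Simpler approach: the raw parse tells us whether the second query is needed.
async function fetchAttachmentRows(
  gmailMessageId: string,
  rawAttachmentCount: number, // e.g. parsed.attachments.length from simpleParser
  fetchFullPayload: (gmailMessageId: string) => Promise<GmailMessagePart>,
): Promise<AttachmentRow[]> {
  if (rawAttachmentCount === 0) {
    return []; // no attachments in the raw email, so skip the extra request
  }
  const payload = await fetchFullPayload(gmailMessageId);
  return collectAttachmentRows(gmailMessageId, payload);
}
```

Gating on the raw parse keeps the extra request limited to messages that actually have attachments, though the raw download still includes the file content, which is the bandwidth cost noted next.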

This solution is simpler, but it wastes bandwidth: we download the whole attachment and do nothing with it, and we also need a second query.

Change the query format to full instead of raw and modify the parsing logic (Better)

Change the query format from raw to full in users.messages.get in fetch-messages-by-batches.service
Use addressParser instead of simpleParser to parse address objects (from, to, cc, bcc), and update the way we save messages in formatBatchResponseAsGmailMessage in fetch-messages-by-batches.service
Since we no longer use mailparser's simpleParser, implement a function that parses the different message parts and extracts the full plain-text body
Save attachment information after saving the messages and store gmailAttachmentId in the attachment table
We will be able to get the attachments using users.messages.attachments.get
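A sketch of the part-parsing function for the "full" format, assuming a minimal local copy of the Gmail MessagePart shape and a hypothetical AttachmentInfo record: one recursive pass decodes inline text/plain bodies (Gmail returns them as base64url) into the message text and records named parts with an attachmentId as attachment references. It does not cover address parsing or HTML bodies.

```typescript
// Minimal subset of the Gmail API MessagePart (format "full").
interface GmailMessagePart {
  mimeType?: string;
  filename?: string;
  body?: { attachmentId?: string; size?: number; data?: string };
  parts?: GmailMessagePart[];
}

// Hypothetical attachment info saved after the message itself.
interface AttachmentInfo {
  name: string;
  mimeType: string;
  size: number;
  gmailAttachmentId: string;
}

interface ParsedFullMessage {
  text: string;
  attachments: AttachmentInfo[];
}

// Gmail returns part bodies as base64url strings.
function decodeBase64Url(data: string): string {
  return Buffer.from(data, "base64url").toString("utf8");
}

// One pass over the part tree: inline text/plain bodies become the message text,
// named parts with an attachmentId become attachment references.
function parseFullPayload(payload: GmailMessagePart): ParsedFullMessage {
  const textChunks: string[] = [];
  const attachments: AttachmentInfo[] = [];

  const visit = (part: GmailMessagePart): void => {
    const attachmentId = part.body?.attachmentId;
    const data = part.body?.data;
    if (part.filename && attachmentId) {
      attachments.push({
        name: part.filename,
        mimeType: part.mimeType ?? "application/octet-stream",
        size: part.body?.size ?? 0,
        gmailAttachmentId: attachmentId,
      });
    } else if (!part.filename && part.mimeType === "text/plain" && data) {
      textChunks.push(decodeBase64Url(data));
    }
    // text/html alternatives are ignored; only plain text is collected.
    (part.parts ?? []).forEach(visit);
  };

  visit(payload);
  return { text: textChunks.join("\n"), attachments };
}
```

One caveat with this sketch: a very large text/plain body may come back with an attachmentId instead of inline data, and it would be skipped here.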

twentyhq / twenty

Log email attachments #4108