twentyhq / twenty

Building a modern alternative to Salesforce, powered by the community.
https://twenty.com
GNU Affero General Public License v3.0

Log email attachments #4108

Open FelixMalfait opened 5 months ago

FelixMalfait commented 5 months ago

Context

We've built an integration with Gmail that allows workspace members to connect their accounts and import all their emails. Those emails are then automatically attached to companies/contacts.

The Gmail API exposes attachments for every message, but we currently ignore them.

The goal of this issue is to import attachments into our database and then display them in the Files tab and in email threads.

Backend

The steps on the backend side should roughly look like this:

  1. STORAGE_DRIVER env variable is now used to store the default driver only
  2. Add new StorageDriverType for gmail
  3. Adapt fileService/fileStorageService to stream from Gmail's API instead of s3
  4. Add a new field storageDriver on Attachments
  5. Add new relation to messages on Attachments
  6. Adapt the gmail fetch service to create attachments along with company and people during import

This is just a general guideline; this is a complex issue that will require investigation!
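To make steps 1, 2 and 4 a bit more concrete, here is a minimal sketch of how a per-attachment driver could fall back to the default from STORAGE_DRIVER. Only STORAGE_DRIVER, StorageDriverType and storageDriver come from the steps above; the enum values, AttachmentRecord, parseDriver and resolveDriver are hypothetical names for illustration and are not taken from the codebase.

```typescript
// Hypothetical sketch: the enum values and helper names are assumptions,
// not the actual twenty-server implementation.
enum StorageDriverType {
  Local = 'local',
  S3 = 's3',
  Gmail = 'gmail', // step 2: new driver type for Gmail-backed attachments
}

// Step 4 and 5: an attachment carries its own driver and an optional message link.
interface AttachmentRecord {
  id: string;
  name: string;
  storageDriver?: StorageDriverType | null;
  messageId?: string | null;
}

const KNOWN_DRIVERS: StorageDriverType[] = [
  StorageDriverType.Local,
  StorageDriverType.S3,
  StorageDriverType.Gmail,
];

// Step 1: the env value only provides the default driver.
// Unknown or missing values fall back to local storage (an assumption).
function parseDriver(value: string | undefined): StorageDriverType {
  for (const driver of KNOWN_DRIVERS) {
    if (driver === value) {
      return driver;
    }
  }
  return StorageDriverType.Local;
}

// The driver stored on the attachment wins over the env default.
function resolveDriver(
  attachment: AttachmentRecord,
  envDefault: string | undefined,
): StorageDriverType {
  return attachment.storageDriver ?? parseDriver(envDefault);
}
```

In a real service the second argument would presumably be the value read from the STORAGE_DRIVER env variable; it is passed in explicitly here to keep the sketch self-contained.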

Frontend

  1. Because we use the existing Attachment object it should automatically appear in Files tabs (nothing to do?)

  2. We should also add a paperclip icon on the threads list

    Screenshot 2024-02-23 at 13 56 45
  3. We should show it on thread level (we display it as a field/fieldValue like for the properties of a task for example)

    Screenshot 2024-02-23 at 13 58 24
  4. We should show it on message level

    Screenshot 2024-02-23 at 13 59 33

Link to Figma

tatethurston commented 4 months ago

@quest-bot embark

quest-bot[bot] commented 4 months ago

⚠️ There's no active Quest for this Issue.

Check the docs for more info.

tatethurston commented 4 months ago

@FelixMalfait do you all use any centralized access for shared secrets? Specifically here I needed AUTH_GOOGLE_CLIENT_ID and AUTH_GOOGLE_CLIENT_SECRET. I went through the process of spinning up a personal google integration, but I'm curious about this for future PRs.

FelixMalfait commented 4 months ago

@tatethurston sorry for the delayed reply! We don't really have a better process, no; everyone just sets up their own. I don't know if it would be possible to do otherwise 🤔

tatethurston commented 4 months ago

No worries. I have limited capacity and don’t think I’ll be able to push this one through for a few weeks. Happy to let someone else take this on.

varunKT001 commented 4 months ago

I would like to work on this issue.

rostaklein commented 3 months ago

Hey guys, I just started looking into this one. @FelixMalfait, please assign this issue to me. I was able to set up my own Google credentials and sync a few emails. Is there any preferred way to work with the email sync other than manually running a command, e.g. yarn nx command twenty-server workspace:gmail-partial-sync -w 20202020-1c25-4d02-bf25-6aeccf7ea419? 🤔 Any more tips/tricks on how to get emails visible in the UI and so on (any reference PR)? 😊

FelixMalfait commented 3 months ago

@varunKT001 thanks! Since this issue is a very complex one, you might want to start with smaller issues first. @rostaklein has some experience on the code base so I'm assigning it to him, but feel free to take another one!

cc @bosiraphael any tip to give to @rostaklein? thanks

varunKT001 commented 3 months ago

@FelixMalfait sure no issues. I'll pick another issue 👍

bosiraphael commented 3 months ago

Hello @rostaklein, thank you for taking this issue :) Here are some insights:

Currently, we're querying the raw email from the Gmail API and using mailparser to parse the email content and headers. The raw email contains the whole attachment but does not contain the Gmail API attachment id. Since we do not want to store the attachment itself, only a way to fetch it from the Gmail API, we have two solutions:

Make a second query for files with attachments (Simpler)

This solution is simpler but we're wasting bandwidth because we're getting the whole attachment and doing nothing with it, and we also have to do a second query.
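Whichever query ends up returning the structured message, the attachment ids live in the message parts. Below is a rough sketch of collecting them, assuming the message was fetched in a non-raw format so that a payload tree is present. The GmailMessagePart shape mirrors the fields documented for the Gmail REST API's message parts (filename, mimeType, body.attachmentId, body.size, parts); AttachmentRef and collectAttachmentRefs are hypothetical names, not existing code.

```typescript
// Subset of the Gmail API message part shape that matters for attachments.
interface GmailMessagePart {
  partId?: string;
  mimeType?: string;
  filename?: string;
  body?: { attachmentId?: string; size?: number };
  parts?: GmailMessagePart[];
}

// Hypothetical reference we would store instead of the file content.
interface AttachmentRef {
  attachmentId: string;
  filename: string;
  mimeType: string;
  size: number;
}

// Walk the part tree depth-first and keep every part that has both a
// filename and an attachment id; multipart containers are only traversed.
function collectAttachmentRefs(part: GmailMessagePart | undefined): AttachmentRef[] {
  if (!part) {
    return [];
  }

  const refs: AttachmentRef[] = [];

  if (part.filename && part.body?.attachmentId) {
    refs.push({
      attachmentId: part.body.attachmentId,
      filename: part.filename,
      mimeType: part.mimeType ?? 'application/octet-stream',
      size: part.body.size ?? 0,
    });
  }

  for (const child of part.parts ?? []) {
    refs.push(...collectAttachmentRefs(child));
  }

  return refs;
}
```

Storing only these references keeps the file content out of our database; the bytes can then be requested from the Gmail API by message id and attachment id when a user actually opens the file.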

Change the query format to full instead of raw and modify the parsing logic (Better)