taers232c / GAMADV-XTD3

Command line tool to manage Google Workspace

[Feature Request] Download individual Vault objects instead of complete <ExportItem> #273

Closed jay-eleven closed 2 years ago

jay-eleven commented 2 years ago

Hi Ross!

Some Vault exports are huuuuuuge. For example, one of my users had a 1.1T Drive Vault export:

jay@cloudshell:~$ gam print vaultexports matter myMatter fields stats.sizeInBytes,status | grep "user@domain.com" | cut -d, -f4-6 | numfmt --to=iec --field=2 -d, | sort
Getting all Vault Exports for Vault Matter: myMatter(xxxxxxx)
Got 5 Vault Exports for Vault Matter: myMatter(xxxxxxx)...
user@domain.com-vault-chat-mbox,8.1K,COMPLETED
user@domain.com-vault-chat-pst,16K,COMPLETED
user@domain.com-vault-drive,1.1T,COMPLETED
user@domain.com-vault-gmail-mbox,256M,COMPLETED
user@domain.com-vault-gmail-pst,279M,COMPLETED

This 1.1T export consists of almost 90 zip files:

jay@cloudshell:~$ gam show vaultexports matter myMatter | grep "objectName" | grep "drive"
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-custodian-docid.csv
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-metadata.xml
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_0.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_1.zip
[...]
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_84.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_85.zip
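A minimal sketch for pulling just the .zip object names out of a listing like the one above. The helper name `list_zip_objects` is made up for illustration, and it assumes the `objectName:` line format shown in this output.

```shell
# list_zip_objects: read `gam show vaultexports` output on stdin and print
# the object names that end in .zip, one per line.
# Hypothetical helper, not part of GAM.
list_zip_objects() {
  sed -n 's/^[[:space:]]*objectName:[[:space:]]*\(.*\.zip\)[[:space:]]*$/\1/p'
}

# Example with two lines in the format shown above; only the .zip line is printed.
printf '%s\n' \
  '          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-metadata.xml' \
  '          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_0.zip' \
  | list_zip_objects
```

Piping the result through `wc -l` would give the number of zip parts in the export.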

So, unless I'm missing something, a command like gam download vaultexport <MatterItem> <ExportItem> for this user would require 1.1T of free space on my local drive.

Turns out Vault generates .zip files of roughly 10-15 GB each, so instead of downloading all of them at once it would be awesome if I could download them one by one and not need a huge local drive with a ton of free space.
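As a rough illustration of the "one part at a time" idea, a script could check that there is room for a single part before fetching it. This is only a sketch: `has_space` is a hypothetical helper, and the 15 GiB figure is just the upper end of the part sizes mentioned above.

```shell
# has_space DIR BYTES: succeed if DIR's filesystem has at least BYTES available.
# Hypothetical helper, not part of GAM.
has_space() {
  local dir="$1" needed_bytes="$2" avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  # Round the requirement up to whole kilobytes before comparing.
  [ "$avail_kb" -ge $(( (needed_bytes + 1023) / 1024 )) ]
}

# Example: is there room in the current directory for one 15 GiB part?
if has_space . $((15 * 1024 * 1024 * 1024)); then
  echo "enough room for one part"
else
  echo "not enough room for one part"
fi
```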

In order to accomplish this, several things need to happen.

  1. gam show vaultexports needs to display a cloudStorageSink.files.objectURI field, formed by concatenating bucketName with objectName to produce a valid Cloud Storage URI: gs://<bucketName>/<objectName>
  2. gam show vaultexports needs to accept fields cloudStorageSink.files.objectURI so the output can be limited to that field
  3. gam download vaultexport command needs to be extended to support individual objectURIs. Something like gam download vaultexport <ExportItem> object <objectURI> matter <MatterItem>
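The URI construction in item 1 could be sketched in shell like this. The function name `make_gs_uri` and the bucket name are placeholders for illustration; only the gs://<bucketName>/<objectName> shape comes from the request above.

```shell
# make_gs_uri BUCKET OBJECT: print a Cloud Storage URI for the given object.
# Hypothetical helper, not part of GAM.
make_gs_uri() {
  local bucket="$1" object="$2"
  # Strip one leading slash from the object name so the URI has a single separator.
  printf 'gs://%s/%s\n' "$bucket" "${object#/}"
}

# Example with placeholder values:
make_gs_uri "example-bucket" "xxxx/exportly-yyy/user@domain.com-vault-drive_0.zip"
```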

Then something like this would be possible:

  1. Use gam redirect stdout vaultfiles.csv show vaultexports ee matter mm fields cloudStorageSink.files.objectURI to extract all URIs to a file
  2. Do some looping like:
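    while read -r URI
    do
    # Local name of the downloaded file (assumes it is saved under the object's base name)
    FILE=$(basename "$URI")
    # Download one file
    gam download vaultexport ee object "$URI" matter mm
    # Upload file to Drive
    gam user uu add drivefile localfile "$FILE" parentname pp
    # Delete file
    rm -- "$FILE"
    done < vaultfiles.csv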

Thoughts?

taers232c commented 2 years ago

Jay,

Swamped at the moment, I can get to this next week.

Ross

taers232c commented 2 years ago

6.22.18 https://github.com/taers232c/GAMADV-XTD3/wiki/Vault#display-vault-exports https://github.com/taers232c/GAMADV-XTD3/wiki/Vault#download-vault-exports

The pseudo code needs cleanup

jay-eleven commented 2 years ago

Wow!! I was not expecting such a fast turnaround. I'll test ASAP and report back.

Thanks Ross.

jay-eleven commented 2 years ago

I've tested this and it works flawlessly.

Thanks Ross!!

taers232c commented 2 years ago

Jay,

Did you look at the pseudo code in the Wiki?

Ross

On May 31, 2022, at 8:46 AM, Jay wrote:

Closed #273 https://github.com/taers232c/GAMADV-XTD3/issues/273 as completed.

jay-eleven commented 2 years ago

I'm testing a shell script I wrote and so far it's humming happily. When it finishes running, I'll update documentation to show a working example.

jay-eleven commented 2 years ago

Wiki updated. Take a look, the working example might be a bit overkill and you might want to leave your pseudo code... 😅

taers232c commented 2 years ago

Looks good to me. Would another example be to just download the .csv and .xml files; could someone derive information from them to decide which .zip file to download?

On May 31, 2022, at 10:47 AM, Jay wrote:

Wiki updated https://github.com/taers232c/GAMADV-XTD3/wiki/Vault#process-vault-export-files. Take a look, the working example might be a bit overkill and you might want to leave your pseudo code... 😅

jay-eleven commented 2 years ago

Neither the .csv nor the .xml file refers in any way to the .zip files, so it's impossible to derive from them which particular .zip file to download. Some CSVs contain just one row (not even headers) with the number of exported elements. Some have only MessageIds and Gmail labels, or Drive metadata. Honestly, I can't see why anybody would want to download just one file from Vault; you need all of them to be able to rebuild a mailbox or a Drive folder.