nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

[New Feature]: No rolling storage - localize DAAC S3 products directly to Verdi workers #628

Open riverma opened 9 months ago

riverma commented 9 months ago

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

Given that the source DAACs from which OPERA obtains products are located in the same AWS region as OPERA, we could save on latency and storage costs by not copying S3 products from DAAC storage into a rolling storage bucket before localizing them onto workers, and instead localizing products directly onto workers as needed from DAAC S3.

Describe the feature request

Current method:

```mermaid
graph LR
    A[DAAC S3 Storage] -->|Download products| B[OPERA S3 Rolling Storage]
    B -->|Localize products| C[OPERA Verdi Workers]
```

Proposed method:

```mermaid
graph TB
    A[DAAC S3 Storage] -->|Localize products| C[OPERA Verdi Workers]
```
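The proposed flow above amounts to each Verdi worker downloading a granule straight from the DAAC bucket. A minimal sketch of what that could look like, assuming products are referenced by `s3://` URLs and the worker already holds temporary credentials (the helper names and the credential dict keys are hypothetical, not existing OPERA code):

```python
"""Sketch of direct-from-DAAC localization (hypothetical helper names)."""
from urllib.parse import urlparse


def parse_s3_url(s3_url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url}")
    return parsed.netloc, parsed.path.lstrip("/")


def localize_product(s3_url: str, dest_path: str, credentials: dict) -> None:
    """Download a DAAC product directly onto the worker (requires boto3)."""
    import boto3  # deferred import so the parsing helper stays dependency-free

    bucket, key = parse_s3_url(s3_url)
    s3 = boto3.client(
        "s3",
        aws_access_key_id=credentials["accessKeyId"],
        aws_secret_access_key=credentials["secretAccessKey"],
        aws_session_token=credentials["sessionToken"],
    )
    s3.download_file(bucket, key, dest_path)
```

Since the DAAC buckets and OPERA workers are in the same AWS region, this download incurs no cross-region transfer cost, which is the core of the savings argument.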
riverma commented 9 months ago

Investigation needed (FYI @hhlee445):

hhlee445 commented 9 months ago

We can't copy a granule directly from ASF S3 to a private-VPC Verdi worker, since obtaining an S3 credential requires using a public VPC.

riverma commented 9 months ago

> We can't copy a granule directly from ASF S3 to a private VPC verdi worker since we need to get a S3 credential which requires to use public VPC

Hi @hhlee445 - isn't it possible to get an S3 credential once (or renew it) on a public-VPC machine and share that credential with a private-VPC Verdi worker? I'm trying to understand what would block that. Since Verdi workers are already passed all kinds of metadata, there should be a path to share a download credential/token with a private-VPC worker.