+cc @jsindy so he can follow discussions here.
Did some minor reading.
It should be fine to just add .requestPayer("requester") to ListObjectsRequest.builder() in Downloader.java.
See this example of a JavaScript SDK request: https://aws.amazon.com/blogs/developer/the-aws-sdk-for-javascript-now-supports-amazon-s3-requester-pays-buckets/
This way, anyone running a mirror node would have acknowledged they are a requester. Then, when we run, we'd essentially be the equivalent of a subscriber to our own bucket and would be billed appropriately, as we are now. S3 has logic to handle this based on account details.
It might be necessary, or at least a good idea, to create a new account in S3 with the appropriate permissions that would match a non-Hedera client requester.
It's easy to make an existing bucket RP or to switch back and forth (non-RP <--> RP): https://docs.aws.amazon.com/AmazonS3/latest/dev/configure-requester-pays-console.html
In my tests, I found that the effect of toggling RP on/off was immediately visible to clients.
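For reference, the same toggle can be done programmatically. A minimal sketch with the AWS SDK for Java v2 follows; the bucket name and the default credential/region resolution are assumptions for illustration, not anything from the actual setup:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Payer;
import software.amazon.awssdk.services.s3.model.PutBucketRequestPaymentRequest;
import software.amazon.awssdk.services.s3.model.RequestPaymentConfiguration;

public class ToggleRequesterPays {

    public static void main(String[] args) {
        // Assumes default credentials/region resolution; the bucket name is illustrative.
        try (S3Client s3 = S3Client.create()) {
            s3.putBucketRequestPayment(PutBucketRequestPaymentRequest.builder()
                    .bucket("appy1-requester-pays")
                    .requestPaymentConfiguration(RequestPaymentConfiguration.builder()
                            .payer(Payer.REQUESTER) // Payer.BUCKET_OWNER switches RP back off
                            .build())
                    .build());
        }
    }
}
```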
Setup:
Used my personal AWS account to set up two buckets: appy1 and appy1-requester-pays.
Ran following tests: https://github.com/hashgraph/hedera-mirror-node/blob/s3_requester_pays/hedera-mirror-importer/src/test/java/com/hedera/mirror/importer/downloader/RequesterPayBucketTest.java
Summary:
+-------------------+---------------+-----------+
|                   | Free bucket   | RP bucket |
+-------------------+---------------+-----------+
| anonymous request | ✓ (current)   | ✗         |
| RP request        | ✓ (migrating) | ✓ (final) |
+-------------------+---------------+-----------+
It would be great if someone else could run the tests too, using an access/secret key not belonging to the bucket owner (me).
Key takeaway: Since setting requestPayer works for both non-RP and RP buckets, we'll set it for all requests. In fact, this is critical for migration.
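To make the table concrete, here is a rough illustration of the anonymous vs. RP request cases with the AWS SDK for Java v2. This is not the checked-in RequesterPayBucketTest, just a sketch with an assumed bucket name, region, and environment-provided credentials:

```java
import software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsRequest;
import software.amazon.awssdk.services.s3.model.RequestPayer;
import software.amazon.awssdk.services.s3.model.S3Exception;

public class RequesterPaysMatrix {

    public static void main(String[] args) {
        String rpBucket = "appy1-requester-pays"; // illustrative bucket name

        // Anonymous request against an RP bucket: rejected (the ✗ cell in the table).
        try (S3Client anonymous = S3Client.builder()
                .region(Region.US_EAST_1)
                .credentialsProvider(AnonymousCredentialsProvider.create())
                .build()) {
            anonymous.listObjects(ListObjectsRequest.builder().bucket(rpBucket).build());
        } catch (S3Exception e) {
            System.out.println("Anonymous request denied: " + e.statusCode()); // expect 403
        }

        // RP request with real credentials (resolved from the environment): accepted
        // against both free and RP buckets (the "migrating"/"final" cells in the table).
        try (S3Client requester = S3Client.builder().region(Region.US_EAST_1).build()) {
            requester.listObjects(ListObjectsRequest.builder()
                    .bucket(rpBucket)
                    .requestPayer(RequestPayer.REQUESTER)
                    .build());
        }
    }
}
```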
Very minimal.
We already have configs to set access/secret keys for the S3 client.
In Downloader.java, when making the ListObjectsRequest, we just need to add requestPayer(RequestPayer.REQUESTER). It's exactly as Nana pointed out above.
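Roughly, the change is a single extra call on the builder. The bucket/prefix/marker parameters below are placeholders for illustration, not the actual Downloader fields:

```java
import software.amazon.awssdk.services.s3.model.ListObjectsRequest;
import software.amazon.awssdk.services.s3.model.RequestPayer;

class DownloaderSketch {

    // bucketName/prefix/marker/batchSize are placeholders, not the actual Downloader fields.
    ListObjectsRequest buildListRequest(String bucketName, String prefix, String marker, int batchSize) {
        return ListObjectsRequest.builder()
                .bucket(bucketName)
                .prefix(prefix)
                .delimiter("/")
                .marker(marker)
                .maxKeys(batchSize)
                .requestPayer(RequestPayer.REQUESTER) // the only new line
                .build();
    }
}
```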
Testing: Not sure how we can have good checked-in tests for this. We currently use S3Mock for testing the Downloader, but it doesn't support authentication (which means it can't be used to test RP). I'm sure we can cook up some basic testing though.
Just configuring the access/secret key would be enough. Hedera's importer, like any other external importer, will need to be configured with IAM credentials too.
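For reference, a minimal sketch of wiring an explicit access/secret key into an SDK v2 client; the factory class, region, and parameter names are assumptions, not the importer's actual configuration:

```java
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

class S3ClientFactory {

    // accessKey/secretKey would come from the importer's existing S3 config properties.
    static S3Client create(String accessKey, String secretKey) {
        return S3Client.builder()
                .region(Region.US_EAST_1)
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create(accessKey, secretKey)))
                .build();
    }
}
```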
There are a few ways we can do the migration:
Option 1: Toggle the existing free bucket to RP.
a. Make the necessary code changes and release them.
b. Announce to partners a date (X months later) as the deadline to update their importers (current --> migrating state in the table above).
c. On that date, during a migration window, toggle our free bucket to RP (migrating --> final).
Option 2: Stand up a duplicate bucket that is RP from the start.
a. Set up the duplicate RP bucket.
b. Announce to partners a date (X months later) as the deadline to update their importers and switch to the RP bucket (current --> final state in the table above).
c. On that date, during a migration window, delete the free bucket.
Option 3: Migrate via the GCP bucket first.
a. Verify the GCP bucket is not used by anyone (otherwise this strategy doesn't work).
b. Test running a mirror node against the GCP bucket (to build trust). Convert it to RP. Build more trust.
c. Announce to partners a date (X months later) as the deadline to update their importers and use the GCP RP bucket (current --> final state in the table above).
d. On that date, without a migration window, make the AWS bucket RP.
Pros/Cons:
S3 cost for a single mirror node: ~$350 per month (record stream: ~$340; balance stream: ~$5). Costs depend significantly on polling frequency; the above assumes a polling interval of 500ms for the record stream and 30s for the balance stream. For example, increasing the record stream poll interval from 500ms to 1s would bring the cost down to ~$175.
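The scaling behind that estimate is just proportionality of request cost to request rate, so doubling the record stream poll interval roughly halves its bill:

$$
C_{\text{total}}(1\,\mathrm{s}) \;\approx\; \frac{C_{\text{record}}(0.5\,\mathrm{s})}{2} + C_{\text{balance}} \;\approx\; \frac{\$340}{2} + \$5 \;\approx\; \$175 \text{ per month}
$$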
Cost savings for Hedera: ~$350 per month × number of external nodes.
Confirm that GCP buckets are not used by anyone right now (Brad/Josh)
We are seeing ListObject and WriteObject requests at 8-9/sec and no GetObject requests, which very likely means no one is using that bucket. If a standard-configuration mirror node were using it, we'd see ListObject requests at 40+/sec (20 per stream) plus some GetObject requests. To gain more certainty, Josh will try to find out whether there's a way to get the IPs of those clients.
Migrate GCP buckets to RP first
We'll be testing uploader compatibility using one of the pre-prod envs on Monday. We'll switch the bucket to RP and see if the uploader still works. If so, we'll call it a day and go home. JK, we'll already be at home. If not, we'll test option 2: use user access/secret keys. The user will need a default project id set in GCP; this is a tricky bit of GCP interoperability mode where service accounts don't cut it. That's how I made the mirror node work. If that also fails, option 3 would be major changes to the uploader (~weeks).
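For context, a self-contained sketch of how an S3-API client can point at GCS in interoperability mode; the endpoint is the standard GCS XML API host, while the class name and HMAC key parameters are assumptions for illustration, not the actual importer config:

```java
import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

class GcsInteropClientFactory {

    // hmacAccessKey/hmacSecret are GCS interoperability (HMAC) keys for a user
    // whose default project is set, per the comment above.
    static S3Client create(String hmacAccessKey, String hmacSecret) {
        return S3Client.builder()
                .endpointOverride(URI.create("https://storage.googleapis.com"))
                .region(Region.US_EAST_1) // required by the SDK, ignored by GCS
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create(hmacAccessKey, hmacSecret)))
                .build();
    }
}
```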
Measure latency of GCP buckets
The metric hedera_mirror_transaction_latency measures the time between a transaction achieving consensus (consensusTimestamp) and it being processed by the mirror node parser.
The flow between the two events looks like:
tx achieves consensus --> nodes write stream files --> uploaded to S3/GCP --> downloaded by mirror node --> consensus is verified --> transactions are parsed one at a time
While an individual value of this metric is not useful (since a tx may be at the start or end of a stream file), the aggregate across many files is perfect for our case. Just watching this metric for changes is enough to measure the latency impact of S3 vs GCS.
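Conceptually the metric is just a timer recording the gap between consensusTimestamp and parse time. A minimal Micrometer sketch, assuming the class and registry wiring shown here (not the importer's actual code):

```java
import java.time.Duration;
import java.time.Instant;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

class TransactionLatencyRecorder {

    private final Timer latencyTimer;

    TransactionLatencyRecorder(MeterRegistry registry) {
        // Exported to Prometheus/Kibana as hedera_mirror_transaction_latency.
        this.latencyTimer = Timer.builder("hedera.mirror.transaction.latency")
                .description("Time from consensus to parsing by the importer")
                .register(registry);
    }

    void onTransactionParsed(Instant consensusTimestamp) {
        latencyTimer.record(Duration.between(consensusTimestamp, Instant.now()));
    }

    public static void main(String[] args) {
        TransactionLatencyRecorder recorder =
                new TransactionLatencyRecorder(new SimpleMeterRegistry());
        recorder.onTransactionParsed(Instant.now().minusSeconds(3));
    }
}
```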
In Kibana, the metric looks like the chart below. An important thing to note is that Kibana doesn't show individual values, only the 30-second aggregate. This is fine for our current case.
Transaction latency before and after the switch. The odd peak around 4/6 00:00 is when I stopped the importer to switch from S3 to GCP. Latencies are in the same range, so I believe we are good here.
Setup:
GCP bucket: appy-demo-streams (exact copy of hedera-demo-streams).
Initially, bucket has RP off.
Started mirror node importer with appropriate access/secret keys set.
Test: Toggled RP on --> off --> on --> off --> on. Importer kept working smooth as butter.
All next steps/outstanding questions are devops tasks. Updated description to mention the same. Since there's no remaining task to be done by product eng, closing this ticket.
Problem
Currently the mirror node bucket is not public, so the community can't run the software or contribute to the project. We should have some way to make it easier for the community to participate without incurring a large S3 cost.
Solution
Investigate requester pays buckets. Main things are:
Followups from discussion on 3/23:
dev env is using the testnet GCP bucket. (4/6)
Next steps:
Outstanding questions:
Good to have:
Alternatives
Set up a public example bucket with a small amount of data (done)
Additional Context