mwvgroup / pittgoogle-user-demos

GNU General Public License v3.0

Keep an eye on GCP billing #4

Open troyraen opened 2 years ago

troyraen commented 2 years ago

Billing Report for GCP project: elasticc-challenge (updated about once per day)

We need to keep an eye on how much we're spending on Google Cloud as we test this code and run the actual challenge. We should check the billing reports regularly.

In addition, before running a test, get clear about what you expect to happen: which services will be running, roughly how many alerts will flow through each of them, and approximately what that should cost.
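For example, a back-of-envelope estimate like the one below forces you to write your expectations down before you start. Every rate in it is a placeholder, not a real GCP price -- check the current pricing pages before trusting the numbers:

```python
# Rough pre-test cost estimate. All rates below are PLACEHOLDERS --
# look up current Cloud Functions pricing before relying on this.
n_alerts = 1_400_000              # alerts you expect to push through
seconds_per_alert = 0.5           # expected Function runtime per alert
ghz_allocated = 1.4               # CPU allocated to the Function
price_per_ghz_second = 1e-5       # placeholder rate
price_per_invocation = 4e-7       # placeholder rate

cpu_cost = n_alerts * seconds_per_alert * ghz_allocated * price_per_ghz_second
invocation_cost = n_alerts * price_per_invocation
print(f"expected CPU cost:        ${cpu_cost:,.2f}")
print(f"expected invocation cost: ${invocation_cost:,.2f}")
```

If the actual bill ends up far from a number like this, something about the test did not go the way you thought it would.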

After running tests, check the billing report for anything that's unexpected.

During the actual challenge, we should review the billing regularly to make sure it's not wildly different from what we expect, and to see if there are simple changes we could make to save money.
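If billing export to BigQuery is enabled for this billing account, a small script can pull roughly the same numbers as the linked report. This is only a sketch -- the table name is a placeholder for whatever the export actually created:

```python
from google.cloud import bigquery

# Hypothetical table name -- the real one is created when you enable
# "Billing export to BigQuery" for the billing account.
BILLING_TABLE = "elasticc-challenge.billing.gcp_billing_export_v1_XXXXXX"

client = bigquery.Client(project="elasticc-challenge")
query = f"""
    SELECT
      DATE(usage_start_time) AS day,
      service.description AS service,
      ROUND(SUM(cost), 2) AS cost_usd
    FROM `{BILLING_TABLE}`
    WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY day, service
    ORDER BY day, cost_usd DESC
"""
# Print the last week's spend, broken down by day and service.
for row in client.query(query).result():
    print(f"{row.day}  {row.service:<30}  ${row.cost_usd:>8.2f}")
```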

troyraen commented 2 years ago

Note: I am posting this here because it contains information about specific things that can be expensive in GCP, and thus it may be useful for others in the future.

@hernandezc1 You should go look at the billing report (linked above) now, specifically the charges on Sept 10 -- that is the day you ran a test that ingested 1.4 million alerts from ELAsTiCC's (test) Kafka stream and cost about $150.
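If you'd rather pull those Sept 10 charges programmatically than click through the report, the same (hypothetical) billing export table can be filtered to that day and grouped by SKU -- roughly:

```python
from google.cloud import bigquery

# Placeholder table name; the date assumes the Sept 10, 2022 test.
BILLING_TABLE = "elasticc-challenge.billing.gcp_billing_export_v1_XXXXXX"

client = bigquery.Client(project="elasticc-challenge")
query = f"""
    SELECT
      sku.description AS sku,
      ROUND(SUM(cost), 2) AS cost_usd
    FROM `{BILLING_TABLE}`
    WHERE DATE(usage_start_time) = '2022-09-10'
    GROUP BY sku
    ORDER BY cost_usd DESC
    LIMIT 10
"""
# The most expensive SKUs for that one day, largest first.
for row in client.query(query).result():
    print(f"{row.sku:<50}  ${row.cost_usd:>8.2f}")
```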

Don’t worry about having already done this on the Sept 10 test. One test that cost $150 is not a big deal (ask me about my $300 test). But 1) that test didn’t need to cost nearly that much, and 2) similar mistakes can quickly rack up costs much larger than this. That’s why I want you to look at and understand the details of what happened during that test, and why I elaborate below:

We should have run that test with only the consumer module (VM + Pub/Sub stream). We shouldn’t send that many alerts into the other modules unless we actually need to test how they respond to that kind of load. One of the things I’ve noticed can get expensive fast on Google Cloud is doing something that doesn’t work, many, many times.
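Before a consumer-only load test, it's worth confirming what is actually attached to the alerts topic, because every attached subscription (and therefore every Cloud Function behind it) receives a copy of each message. Here's a minimal sketch using google-cloud-pubsub; the topic name is a guess, not necessarily what this repo uses:

```python
from google.cloud import pubsub_v1

PROJECT = "elasticc-challenge"
TOPIC = "elasticc-alerts"  # hypothetical topic name

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, TOPIC)

# Every subscription listed here gets a copy of each alert, so each
# attached Cloud Function multiplies the cost of a 1.4M-alert test.
print(f"Subscriptions attached to {topic_path}:")
for subscription in publisher.list_topic_subscriptions(request={"topic": topic_path}):
    print(f"  {subscription}")
```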

On Sept 10, we had 4 Cloud Functions running as part of that test. Three of them contained bugs that made them crash on every single alert. The fourth (supernnova) had several problems that I'm guessing resulted from Cloud Functions quotas, which do not allow you to push that much data through in such a short time. The most expensive problem was this: the runtime for the supernnova Function to process a single alert climbed to almost a full minute (because it had to wait for a quota to reset, I assume). You can see that in the “Execution time” metrics for the Function (in the Console). You can see how expensive that runtime was by changing the “Group by” dropdown on the billing report to “SKU” and looking for the SKU called “CPU time” (almost 3/4 of the total cost).
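The same “Execution time” numbers are also available through the Cloud Monitoring API, which can be handy for checking on a test without opening the Console. A rough sketch (the function name matches the module, but treat the details as an assumption):

```python
import time

from google.cloud import monitoring_v3

PROJECT = "elasticc-challenge"
FUNCTION = "supernnova"  # name as deployed in Cloud Functions

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
# Look at the last 24 hours of execution-time data.
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 24 * 3600}, "end_time": {"seconds": now}}
)

# execution_times is a Distribution metric reported in nanoseconds.
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT}",
        "filter": (
            'metric.type="cloudfunctions.googleapis.com/function/execution_times" '
            f'AND resource.labels.function_name="{FUNCTION}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        mean_s = point.value.distribution_value.mean / 1e9
        print(f"{point.interval.end_time}  mean runtime ~ {mean_s:.1f} s")
```

A mean runtime near 60 seconds per alert, multiplied across 1.4 million alerts, is exactly the kind of number that shows up later as the “CPU time” SKU on the bill.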