ossf / tac

Technical Advisory Council
https://openssf.org

GUAC PoC cloud credits #266

Open mlieberman85 opened 4 months ago

mlieberman85 commented 4 months ago

GUAC is looking to do a PoC with both maintainers of open source projects and end users, in part due to a larger effort in the Security Toolbelt.

I spoke to @SecurityCRob, and we don't currently have a mechanism for this, but I'm ready to work through whatever we decide makes sense here.

mlieberman85 commented 4 months ago

Related to #257

hythloda commented 3 months ago

This was approved in the TAC meeting on Mar 5, 2024.
[Mike] Suggest taking any funding requests on an ad hoc basis while the process is being set up.
[Sarahj] Want to ensure there is a cap on this for the PoC.
[Mike] Can set the limits; would like to see $1K per month.
[Sarah] Is this for a more public service? A) Not really; mainly "can this scale" and is the correct data being provided.
[Sarah] Would like to understand what the milestones are. A) Can highlight the milestones. Only authorized PoC users can use it.
VOTE: motion by Arnaud to request the funding. Zoom vote taken and approved with 6 votes.

hythloda commented 3 months ago

@mlieberman85 I will follow up with you on Slack with details.

mlieberman85 commented 3 months ago

Here is the high level proposal:

GUAC PoC Proposal


Problem(s)

OSS consumers want a place where they can discover and consume metadata and analytics about the software they ingest.

The Toolbelt needs a way to store metadata that comes from the Toolbelt (e.g. Scorecard, SBOMs, SLSA, etc.) and for that metadata to be consumed by end users.

The OpenSSF incubating project GUAC needs some real-world case studies to help identify additional features, bugs, or scalability issues.


Solution

Build a PoC internal service for a case study.


Other things considered


Proposal


Cost

Here are the lower-level goals and scope:

Scope

Primary Goals


Secondary Goals


Running the PoC

Resource Reqs

KennyPaul commented 3 months ago

Simply documenting activities discussed in a Slack thread with @mlieberman85, @sevansdell, @hythloda and myself back on March 8, 2024.

-kenny

omkhar commented 3 months ago

What is the plan to avoid accidental cost overruns? Can we enforce a monthly spend quota?

mlieberman85 commented 3 months ago

This is where I'm not 100% sure. AWS has some mechanisms to control spend, but last I checked there was no way to enforce a monthly spend limit and then cut spending off without additional tooling. We're comfortable using any tool that prevents or minimizes overruns.

KennyPaul commented 3 months ago

I've gotten clarification from LF-IT that while alerts can be set based upon a threshold, there is no way in AWS to enforce a spending cap natively. Extra tooling using some sort of monitoring system is indeed required to shut down any actively running systems or prevent new resources from being instantiated.

(As a side note, there is evidently some hard-limit enforcement available via GHA, BUT it only works for GitHub-hosted resources rather than third-party clouds like AWS, and even then there are certain circumstances where GHA enforcement "...is a little fuzzy.")

I asked if AWS can send usage alerts to an address different from the one on the account itself. Evidently that can be done. The workaround I'm proposing is to set up an email alias on our end that includes both the AWS account email and any appropriate staff members to receive the alerts. So while it's not hard enforcement, it would allow us to keep multiple eyes on the situation and keep it managed.

-kenny
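As a minimal sketch of the alias workaround, the Python below only builds the request arguments for a $1K monthly AWS Budgets cost budget whose threshold notifications all go to one shared email alias; it does not call AWS. The alias address, budget name, account ID, and threshold values are hypothetical placeholders. The dictionary keys are intended to follow the shape of the Budgets `CreateBudget` request as passed to boto3's `create_budget`, but they should be checked against the current API reference before use.

```python
from __future__ import annotations

from typing import Any


def build_budget_request(
    account_id: str,
    alias_email: str,
    monthly_limit_usd: float,
    thresholds_pct: list[float],
) -> dict[str, Any]:
    """Build keyword arguments for a boto3 Budgets client's create_budget call.

    Each threshold becomes an ACTUAL-spend notification e-mailed to one alias,
    so the alias (not the AWS account e-mail) fans alerts out to staff.
    """
    notifications = [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": pct,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": alias_email}],
        }
        for pct in thresholds_pct
    ]
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "guac-poc-monthly",  # hypothetical budget name
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": notifications,
    }


# Hypothetical account ID and alias, for illustration only.
request = build_budget_request(
    "123456789012", "guac-poc-billing@example.org", 1000, [50.0, 75.0]
)
```

If the shape checks out, the request would be submitted with something like `boto3.client("budgets").create_budget(**request)`. Keeping the alias as the only subscriber means staff can be added or removed on the mail side without touching the budget.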

david-a-wheeler commented 3 months ago

@KennyPaul - that mechanism worries me. I think it's horrifying that there's no "not to exceed" mechanism built-in, it's really easy to create an infinite bill.

If it has to be external monitoring, can we automatically pause things if they get out of hand? We don't want sites down, but we also don't want to go bankrupt.

Also: We may need to quickly switch GUAC to a Series LLC. We should talk with LF legal. I think that might create a legal barrier so that a project can go bankrupt without the LF or Mike Lieberman going bankrupt. There should be an "emergency stop, no more spend" mechanism for a system that could otherwise create absurd bills.

hythloda commented 3 months ago

Regarding the Series LLC question, please find the GUAC charter online at https://github.com/guacsec/governance/blob/main/CHARTER.MD

mlieberman85 commented 3 months ago

So there are mechanisms within all the cloud services to stop you from overrunning certain limits and quotas, but they're a bit opaque and usually focused on the number of resources rather than cost. For example, no more than 20 EC2 instances unless you ask to have the limit increased... but you can easily turn on a VERY expensive instance. We might be able to set the quotas manually: we can give you an estimate of what we plan to use and just set the quotas to not exceed that. I think.
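To make the point about count-based quotas concrete, here is a small worked example in Python. The hourly prices are hypothetical placeholders, not real AWS rates, and the 730-hour month is a common approximation. It shows that the same 20-instance quota bounds compute spend at very different levels depending on which instance type is launched, and how an instance quota could be derived from a budget estimate. Neither number accounts for storage or data-transfer charges.

```python
import math

HOURS_PER_MONTH = 730  # common approximation of 24 * 365 / 12


def worst_case_monthly_compute(max_instances: int, hourly_usd: float) -> float:
    """Compute-only upper bound if every instance allowed by the quota runs all month."""
    return max_instances * hourly_usd * HOURS_PER_MONTH


def instance_quota_for_budget(budget_usd: float, hourly_usd: float) -> int:
    """Largest instance count whose compute-only spend stays within the budget."""
    return math.floor(budget_usd / (hourly_usd * HOURS_PER_MONTH))


# Hypothetical prices for a small and a very large instance type.
print(worst_case_monthly_compute(20, 0.05))   # 730.0
print(worst_case_monthly_compute(20, 5.00))   # 73000.0
print(instance_quota_for_budget(1000, 0.05))  # 27
```

A quota derived this way only holds if the quota is also restricted to the instance types used in the estimate.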

david-a-wheeler commented 3 months ago

@hythloda - great! Sounds like creating the LLC is done or on its way to getting done. That helps, thank you.

I'd still love to see some sort of "emergency stop". If an attacker's DDoS attack creates a $5 million bill within one hour, "reviewing it later" is too late. We can't be the first with that need. I hope a little searching will reveal an existing mechanism for doing that.

mlieberman85 commented 3 months ago

So it should be impossible to do that in an account unless the quotas are set really high, which requires other approvals. However, there's a big difference between going over by a dollar a month and going over by a few hundred dollars, and I'm not sure where AWS stands right now on saying "going over by a dollar is fine... going over by 100 is not," or something like that. Let me link some of the stuff I've used in the past.
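One way to express the "a dollar over is fine, a hundred over is not" idea is a tiered policy. The Python below is a hypothetical sketch: the tolerance value and the three response names are illustrative, and in practice the spend figure would come from billing data or budget alerts rather than a function argument.

```python
from enum import Enum


class SpendResponse(Enum):
    OK = "ok"        # within budget
    ALERT = "alert"  # small overage: notify people, keep running
    STOP = "stop"    # overage beyond tolerance: trigger the emergency stop


def respond_to_spend(spend_usd: float, budget_usd: float, tolerance_usd: float) -> SpendResponse:
    """Classify month-to-date spend against a budget plus a tolerated overage."""
    overage = spend_usd - budget_usd
    if overage <= 0:
        return SpendResponse.OK
    if overage <= tolerance_usd:
        return SpendResponse.ALERT
    return SpendResponse.STOP


# Hypothetical $10 tolerance on the $1K monthly budget.
for spend in (950, 1001, 1100):
    print(spend, respond_to_spend(spend, 1000, 10).value)
```

The design choice here is that only the STOP tier would be wired to anything automatic; the ALERT tier just keeps humans informed.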

david-a-wheeler commented 3 months ago

@mlieberman85 - sounds perfect.

I'm just worried that an attacker manages to massively use a resource we hadn't limited enough. I believe there are too many services/configuration knobs/etc. in AWS to be absolutely certain we "got them all". It's the "far exceeded total expected spend due to an attacker's intentional actions" case that worries me, especially if it can be done before someone flips an emergency stop switch. I just want to automatically stop a runaway train before it hits people :-).

If we far exceed expected use because it's popular, that's awesome. I hope it happens :-). As long as we monitor activity, I don't expect an automatic emergency stop switch would interfere with it.

omkhar commented 3 months ago

My understanding is that the cloud provider natively supports quotas / not-to-exceed limits; am I right? Is the issue that LF IT doesn't support setting quotas when using direct billing?

I think the issue of the legal stuff, while very important, might be orthogonal to how to limit a monthly bill, unless someone can explain to me how that helps with setting a quota.

KennyPaul commented 3 months ago

@david-a-wheeler Yeah. The lack of a built-in "not to exceed" mechanism worries me too.

@mlieberman85 Expense-wise, the NTE (not-to-exceed) number is $1K/month. What that translates to in actual utilization I have no idea. My initial thought on reporting granularity would be notification triggers at 50%, 75%, and then 5% increments from there on.
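The proposed trigger schedule can be sketched in Python as below. It assumes "5% increments from there on" stops at 100% of the budget; that cutoff, and the function names, are assumptions for illustration. AWS Budgets may limit how many notifications a single budget can carry, so the current quota should be checked before mapping each level to its own notification.

```python
def alert_thresholds(first=(50, 75), step=5, stop=100):
    """Return alert levels in percent of budget: `first`, then every `step` up to `stop`."""
    levels = list(first)
    level = levels[-1] + step
    while level <= stop:
        levels.append(level)
        level += step
    return levels


def crossed_thresholds(spend_usd, budget_usd, levels):
    """Return the levels that month-to-date spend has already reached."""
    pct = 100 * spend_usd / budget_usd
    return [lvl for lvl in levels if pct >= lvl]


levels = alert_thresholds()
print(levels)                                 # [50, 75, 80, 85, 90, 95, 100]
print(crossed_thresholds(820, 1000, levels))  # [50, 75, 80]
```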

@omkhar I agree that the legal status is not particularly relevant in this particular context. The funding entity is OpenSSF and the billing for that account is configured to be routed appropriately.

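All of the information I've been provided by IT indicates this is an AWS issue and not an LF-IT passthrough billing issue:

  - AWS DOES provide expense threshold based notifications natively
  - AWS DOES NOT provide expense threshold based automated resource throttling or an emergency stop switch natively

Facilitating the latter in AWS requires an expense threshold based notification to be fed to some other mechanism that would throttle and/or disable resources.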

-kenny

mlieberman85 commented 3 months ago

https://aws.amazon.com/blogs/mt/introducing-service-quotas-view-and-manage-your-quotas-for-aws-services-from-one-central-location/

There's budget stuff we can do here. In a previous life @trmiller wrote a lambda that could turn off resources. I think in our case we can definitely turn stuff off if we see stuff about to hit limits since it's not an outward facing service.

frenchi commented 2 months ago

Chiming in to this thread to hopefully add some clarity (and some expensive lessons learned...):

Service Quotas

While this could be used as a preventive mechanism, it requires forethought into the types of resources required (which is an inexact science and becomes fragile) and, more importantly, it does not prevent unexpected classes of billing (e.g. network egress fees, stopped instances, EBS volumes, etc.).

In my opinion, quotas aren't the most effective control for meeting the NTE $1K/month budget, though they may be a worthwhile secondary preventive control.

...this is an AWS issue...
  - DOES provide expense threshold based notifications natively
  - DOES NOT provide expense threshold based automated resource throttling or an emergency stop switch natively

Effectively correct, as there is no way to say "I don't want to spend a cent over $1k/month" and have that be enforced.

However... Budget Actions are a little known feature in AWS that may be an appropriate control here.

Depending on the underlying infrastructure (Budget Actions have direct support for stopping EC2/RDS instances, plus SNS support for sending a notification to a Lambda that disables other resources), and by firing the alert early (i.e. scaling down when the forecasted spend surpasses 95% of the budget), it may be possible to achieve this goal.
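AWS computes its own spend forecast for budget alerts and actions; the Python below is only a simplified stand-in that linearly extrapolates month-to-date spend, to illustrate the "act at 95% of forecast" rule. The linear model and the function names are assumptions for illustration, not how AWS forecasts.

```python
import calendar
from datetime import date


def linear_forecast(month_to_date_usd: float, today: date) -> float:
    """Extrapolate month-to-date spend at the same daily rate to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return month_to_date_usd * days_in_month / today.day


def should_scale_down(month_to_date_usd: float, budget_usd: float, today: date,
                      trigger_pct: float = 95.0) -> bool:
    """True once the forecast reaches trigger_pct percent of the budget."""
    return linear_forecast(month_to_date_usd, today) >= budget_usd * trigger_pct / 100


print(should_scale_down(300, 1000, date(2024, 6, 10)))  # forecast 900.0 -> False
print(should_scale_down(320, 1000, date(2024, 6, 10)))  # forecast 960.0 -> True
```

Extrapolating from the first few days of a month is noisy, which is one reason to pair a forecast trigger with an actual-spend trigger.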

mlieberman85 commented 2 months ago

It would not be hard for us to write a lambda as an emergency switch if need be!
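A minimal sketch of such an emergency-stop Lambda in Python, assuming the PoC's EC2 instances carry a hypothetical `Project=guac-poc` tag and that the function is subscribed to the SNS topic a budget alert publishes to. It uses the boto3 EC2 client's `describe_instances` and `stop_instances` calls; the client is passed in so the stop logic can be exercised with a stub. Stopping instances halts compute charges but not, for example, EBS storage charges, so this is a partial brake rather than a hard cap.

```python
from __future__ import annotations

from typing import Any

PROJECT_TAG = ("Project", "guac-poc")  # hypothetical tag marking PoC resources


def stop_tagged_instances(ec2: Any, tag: tuple[str, str] = PROJECT_TAG) -> list[str]:
    """Stop every running EC2 instance carrying `tag`; return the stopped IDs."""
    key, value = tag
    resp = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{key}", "Values": [value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    # Reads a single page only; a larger fleet would need the describe_instances paginator.
    ids = [
        inst["InstanceId"]
        for reservation in resp.get("Reservations", [])
        for inst in reservation.get("Instances", [])
    ]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids


def lambda_handler(event: dict, context: Any) -> dict:
    """Entry point when the function is triggered by the budget alert's SNS topic."""
    import boto3  # bundled with the AWS Lambda Python runtime

    return {"stopped": stop_tagged_instances(boto3.client("ec2"))}
```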

sevansdell commented 1 month ago

What work remains for this to be closed?

omkhar commented 1 month ago

My approval is blocked waiting on a proposal for how we would avoid an overspend of the allotted credits. Several different methods have been suggested to achieve this outcome, but I have not seen a final proposal for how to accomplish this.

mlieberman85 commented 1 month ago

I spoke to @bbpursell1 about this at OSS NA and it looked like there was a way forward. If there's something else I should write up, I can do that.

omkhar commented 1 month ago

via email between @mlieberman85 and @bbpursell1 ...

  1. @bbpursell1 will document a proposed solution and processes to ensure we do not exceed the allocated budget monthly or in total.
  2. @omkhar will approve the proposed solution
  3. @bbpursell1 and @mlieberman85 will implement the proposed solution
  4. Funding will be released
sevansdell commented 3 weeks ago

How is this progressing please?

CoS-Harry commented 2 weeks ago

The budget requested for this was $1K a month, however we need a solid annualized budget, can someone provide that?

mlieberman85 commented 2 weeks ago

Is reframing it here and just saying $12k good enough? The original idea was $12k over 1 year. We can speed up timelines.

CoS-Harry commented 2 weeks ago

Can you give me a start and end date?

(Email reply to Michael Lieberman's comment of Wed, Jun 12, 2024, 2:11 PM.)

mlieberman85 commented 2 weeks ago

Well, the start is whenever we can get the money. The end date would be Dec 31.
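If the full $12K total is kept and spent evenly, with the end date fixed at Dec 31, the later the funding starts, the higher the implied monthly cap. A small Python sketch of that arithmetic, where the July start date is a hypothetical example rather than an agreed date:

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Number of calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def monthly_cap(total_usd: float, start: date, end: date) -> float:
    """Spread the total budget evenly over the months in the window."""
    return total_usd / months_inclusive(start, end)


end = date(2024, 12, 31)
print(monthly_cap(12_000, date(2024, 1, 1), end))  # 1000.0 (full year)
print(monthly_cap(12_000, date(2024, 7, 1), end))  # 2000.0 (hypothetical July start)
```

Whatever start date is chosen, the resulting monthly figure is what the budget alerts and any emergency stop would need to be configured against.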