Open mlieberman85 opened 9 months ago
Related to #257
This was approved in the TAC meeting on Mar 5, 2024. Meeting notes:
- [Mike] suggests taking any funding requests on an ad hoc basis while the process is being set up
- [Sarah] wants to ensure there is a cap on this for the PoC
- [Mike] can set the limits; would like to see $1k per month
- [Sarah] is this for a more public service? A: not really; mainly "can this scale" and "is the correct data being provided"
- [Sarah] would like to understand what the milestones are. A: can highlight the milestones
- Only authorized PoC users can use it
- VOTE: motion by Arnaud to request the funding; Zoom vote taken and approved with 6 votes
@mlieberman85 I will consult you on slack with details.
Here is the high level proposal:
OSS consumers want a place where they can discover and consume metadata and analytics about the software they ingest.
Toolbelt needs a way to store metadata that comes from the toolbelt (e.g. scorecard, SBOMs, SLSA, etc.) and for that metadata to be consumed by end users
OpenSSF incubating project GUAC needs some real world case studies to help identify additional features, bugs or scalability issues
Build a PoC internal service for a case study.
Here are the goals/scope one level down:
Simply documenting activities discussed in a slack thread with @mlieberman85, @sevansdell, @hythloda and myself back on March 8, 2024.
-kenny
What is the plan to avoid accidental cost overruns? Can we enforce a monthly spend quota?
This is where I'm not 100% sure. AWS has some mechanisms to control spend but last I checked there was no way to enforce a monthly spend and then cut it off without additional tooling. We're comfortable utilizing any tool to prevent or minimize any overruns.
I've gotten clarification from LF-IT that while alerts can be set based upon a threshold, there is no way in AWS to enforce a spending cap natively. Extra tooling using some sort of monitoring system is indeed required to shut down any actively running systems or prevent new resources from being instantiated.
(As a side note, there is evidently some hard limit enforcement available via GHA, BUT that only works for GitHub-hosted resources rather than a 3rd party cloud like AWS, and even then there are certain circumstances where GHA enforcement, "...is a little fuzzy.")
I asked if AWS can send usage alerts to an address different than the one on the account itself. Evidently that can be done. The workaround I'm proposing is to set up an email alias on our end that includes both the AWS account email and any appropriate staff members to receive the alerts. So while not hard enforcement it would allow us to keep multiple eyes on the situation to keep it managed.
-kenny
@KennyPaul - that mechanism worries me. I think it's horrifying that there's no "not to exceed" mechanism built-in, it's really easy to create an infinite bill.
If it has to be external monitoring, can we automatically pause things if they get out of hand? We don't want sites down, but we also don't want to go bankrupt.
Also: We may need to quickly switch GUAC to a Series (LLC). We should talk with LF legal - I think that might create a legal barrier so that a project can go bankrupt without LF or Mike Lieberman going bankrupt. There should be an "emergency stop, no more spend" mechanism for a system that could otherwise create absurd bills.
Please find the guac charter online at https://github.com/guacsec/governance/blob/main/CHARTER.MD
So there are mechanisms within all the cloud services to stop you from overrunning certain limits and quotas, but they're a bit opaque and usually focused on the number of resources rather than on cost, e.g. no more than 20 EC2 instances unless you ask to have the limit increased... but you can easily turn on a VERY expensive instance. We might be able to manually set the quotas. We can give you an estimate of what we plan to use and just set the quotas to not exceed that. I think.
@hythloda - great! Sounds like creating the LLC is done or on its way to getting done. That helps, thank you.
I'd still love to see some sort of "emergency stop". If an attacker's DDoS attack creates a $5million bill within one hour, "reviewing it later" is too late. We can't be the first with that need. I hope a little searching will reveal an existing mechanism for doing that.
So it should be impossible to do that in an account without quotas set really high, which requires other approvals. However, there's a big difference between going over by a dollar a month and going over by a few hundred dollars, and I'm not sure what the state of AWS is right now for saying "going over by a dollar is fine... going over by 100 is not", or something like that. Let me link some of the stuff I've used in the past.
@mlieberman85 - sounds perfect.
I'm just worried that an attacker manages to massively use a resource we hadn't limited enough. I believe there are too many services/configuration knobs/etc. in AWS to be absolutely certain we "got them all". It's the "far exceeded total expected spend due to an attacker's intentional actions" case that worries me, especially if it can be done before someone flips an emergency stop switch. I just want to automatically stop a runaway train before it hits people :-).
If we far exceed expected use because it's popular, that's awesome. I hope it happens :-). As long as we monitor activity, I don't expect an automatic emergency stop switch would interfere with it.
My understanding is that the cloud provider natively supports quota / does not exceed, am I right? Is the issue that LF IT doesn't support setting quotas when using direct billing?
I think the issue of the legal stuff, while very important, might be orthogonal to how to limit a monthly bill, unless someone can explain to me how that helps with setting a quota.
@david-a-wheeler Yeah. It worries me too.
@mlieberman85 expense-wise the NTE number is $1K/month. What that translates to in actual utilization I have no idea. My initial thoughts on granularity of reporting would be notification triggers at 50%, 75%, and then 5% increments from there on.
@omkhar I agree that the legal status is not particularly relevant in this particular context. The funding entity is OpenSSF and the billing for that account is configured to be routed appropriately.
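All of the information I've been provided by IT indicates this is an AWS issue and not an LF-IT passthrough billing issue. AWS:
- DOES provide expense threshold based notifications natively
- DOES NOT provide expense threshold based automated resource throttling or an emergency stop switch natively
Facilitating the latter in AWS requires an expense threshold based notification to be fed to some other mechanism that would throttle and/or disable resources.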
-kenny
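A minimal sketch of the notification schedule suggested above (50%, 75%, then 5% increments), written as plain Python so it does not depend on any particular AWS API. The function names, the cap at 100%, and the `already_sent` bookkeeping are assumptions for illustration only; they are not part of any existing tooling.

```python
def alert_thresholds(first=(50, 75), step=5, stop=100):
    """Return alert levels as percentages of the monthly budget.

    Defaults follow the suggestion in this thread: 50%, 75%, then every 5%.
    Stopping at 100% is an assumption; raise `stop` to keep alerting beyond it.
    """
    thresholds = list(first)
    nxt = thresholds[-1] + step
    while nxt <= stop:
        thresholds.append(nxt)
        nxt += step
    return thresholds


def newly_crossed(spend_usd, budget_usd, thresholds, already_sent):
    """Return thresholds reached by current spend that have not been alerted on yet."""
    pct = 100.0 * spend_usd / budget_usd
    return [t for t in thresholds if pct >= t and t not in already_sent]


if __name__ == "__main__":
    levels = alert_thresholds()
    print(levels)  # [50, 75, 80, 85, 90, 95, 100]
    # $810 spent against the $1K/month NTE, with 50% and 75% already sent
    print(newly_crossed(810.0, 1000.0, levels, already_sent={50, 75}))  # [80]
```

With the $1K/month NTE this gives alerts at $500, $750, $800, and so on up to $1,000.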
There's budget stuff we can do here. In a previous life @trmiller wrote a lambda that could turn off resources. I think in our case we can definitely turn stuff off if we see stuff about to hit limits since it's not an outward facing service.
Chiming in to this thread to hopefully add some clarity (and some expensive lessons learned...):
Service Quotas
While this could be used as a preventative mechanism, it requires forethought into the types of resources required (which is an inexact science and becomes fragile) and, more importantly, does not prevent unexpected classes of billing (e.g. network egress fees, stopped instances, EBS volumes, etc).
In my opinion, Quotas aren't the most effective control to meet the NTE $1k/month budget. They may be worthwhile secondary prevention.
...this is an AWS issue...
- DOES provide expense threshold based notifications natively
- DOES NOT provide expense threshold based automated resource throttling or an emergency stop switch natively
Effectively correct, as there is no way to say "I don't want to spend a cent over $1k/month" and have that be enforced.
However... Budget Actions are a little known feature in AWS that may be an appropriate control here.
They have direct support for stopping EC2/RDS instances, and SNS support for sending a notification to a Lambda to disable other resources. Depending on the underlying infrastructure, and by firing the alert early (i.e. scaling down when the forecasted spend surpasses 95% of the budget), it may be possible to achieve this goal.
It would not be hard for us to write a lambda as an emergency switch if need be!
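As a rough sketch of what such an emergency-switch Lambda could look like: the handler below reads SNS records from the event and calls a stop routine once forecasted spend reaches 95% of the $1K budget. Several things here are assumptions, not confirmed details: the JSON message shape with a `forecast_usd` field is hypothetical (real budget notifications may be formatted differently and would need their own parsing), and `stop_resources` is a placeholder callable that a real deployment would replace with calls that stop the actual resources (for example boto3's EC2 `stop_instances`).

```python
import json

BUDGET_USD = 1000.0    # NTE per month, from this thread
STOP_FRACTION = 0.95   # act when forecast passes 95% of budget, as suggested above


def should_stop(forecast_usd, budget_usd=BUDGET_USD, fraction=STOP_FRACTION):
    """True when forecasted spend has reached the stop fraction of the budget."""
    return forecast_usd >= fraction * budget_usd


def handler(event, context, stop_resources=lambda: None):
    """Lambda-style entry point for SNS-delivered budget alerts.

    ASSUMPTION: each SNS message is JSON with a hypothetical "forecast_usd" field.
    ASSUMPTION: stop_resources is injected; the default does nothing.
    """
    stopped = False
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if should_stop(float(message["forecast_usd"])):
            stop_resources()  # e.g. stop instances, scale services to zero
            stopped = True
            break
    return {"stopped": stopped}
```

Injecting `stop_resources` keeps the decision logic testable without cloud credentials, and makes it easy to dry-run the switch before wiring it to anything that can actually shut things down.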
What work remains for this to be closed?
My approval is blocked waiting on a proposal for how we would avoid an overspend of the allotted credits. Several different methods have been suggested to achieve this outcome, but I have not seen a final proposal for how to accomplish this.
I spoke to @bbpursell1 about this at OSS NA and it looked like there was a way forward. If there's something else I should write up, I can do that.
via email between @mlieberman85 and @bbpursell1 ...
- @bbpursell1 will document a proposed solution and processes to ensure we do not exceed the allocated budget monthly or in total.
- @omkhar will approve the proposed solution
- @bbpursell1 and @mlieberman85 will implement the proposed solution
- Funding will be released
How is this progressing please?
The budget requested for this was $1K a month; however, we need a solid annualized budget. Can someone provide that?
Is reframing it here and just saying $12k good enough? The original idea was $12k over 1 year. We can speed up timelines.
Can you give me a start and end date?
Well start is whenever we can get the money. End date would be Dec 31.
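To make the budget question concrete, here is a small worked calculation of the two readings discussed above: keeping the $1K/month cap until Dec 31, or spreading the full $12K over the remaining months. The start date of July 1, 2024 is purely hypothetical, since the real start depends on when funding is released, and the helper names are illustrative.

```python
from datetime import date


def months_remaining(start, end):
    """Count calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def funding_options(start, end, total_usd=12_000, monthly_cap_usd=1_000):
    """Compare keeping the monthly cap vs. keeping the $12K total."""
    months = months_remaining(start, end)
    return {
        "months": months,
        "spend_if_cap_kept_usd": months * monthly_cap_usd,
        "monthly_rate_if_total_kept_usd": total_usd / months,
    }


if __name__ == "__main__":
    # HYPOTHETICAL start date; the end date of Dec 31 is from the thread.
    print(funding_options(date(2024, 7, 1), date(2024, 12, 31)))
    # {'months': 6, 'spend_if_cap_kept_usd': 6000, 'monthly_rate_if_total_kept_usd': 2000.0}
```

Under that assumed start date, holding the $1K/month cap would spend $6K by year end, while spending the full $12K would need a $2K/month rate.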
@mlieberman85 Has this been resolved? Should we close out this issue now if so?
GUAC is looking to do a PoC with both maintainers of open source projects and end users, in part due to a larger effort in the Security Toolbelt.
I spoke to @SecurityCRob and understand that we don't currently have a mechanism for this, but I'm ready to work through whatever we decide makes sense here.