jupyter / governance

The governance process and model for Project Jupyter
https://jupyter.org/governance/index.html
Creative Commons Zero v1.0 Universal

[proposal] Have a dedicated Jupyter emergency fund for gke.mybinder.org #125

Open choldgraf opened 2 years ago

choldgraf commented 2 years ago

Context

gke.mybinder.org is the largest member of the Binder federation. We have historically run it via credit donations from Google, and have scrambled to find credits from other stakeholders when these credits run out. Credits tend to come in 1-year batches with hard end-dates.

The cyclical nature of credits creates a "credit crunch": old credits may run out before we have found new credits for the infrastructure. This leads to stressful situations where it's unclear how we'll pay for gke.mybinder.org. Most recently this happened in the issues below:

In these moments, individual stakeholders in the Binder project have stepped up to backstop gke.mybinder.org while we look for more credits, but this is a risky solution that depends on individual actors stepping up, and is a potential source of inequity amongst the Binder team members.

Instead, we should define a process that:

  1. Reduces the risk associated with running out of credits on mybinder.org
  2. Defines clear responsibilities for who must ensure that more credits are available to run the service

Proposal

As a first step, I propose that we set aside a dedicated account to backstop gke.mybinder.org. This account could be linked to the gke.mybinder.org Billing Account, so that whenever credits run out, we would begin drawing from this account as a last resort. Our target would be to have at least 6 months of funding in the account at all times, to give us plenty of leeway if we need to find another round of credits or fundraise for it.

Note that most of the time, this funding would not be used - we still aim to power mybinder.org via credit allocations. This is just "gap funding" for when credits happen to run out.

How much cost are we talking about?

Historically, gke.mybinder.org has cost around $7,000 per month (so 6 months of funding would be roughly $42,000). However, we have recently undertaken several cost-saving measures, and believe that this is now down to around $4,000 a month. So let's say $24,000 is a low estimate for 6 months of usage. Ideally, we'd shoot for $50,000 in reserves if the funds were available, to give ourselves some breathing room.
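For anyone who wants to rerun these numbers with different assumptions, here is a minimal sketch in Python. The function and constant names are made up for illustration, and it assumes a flat monthly cost, which real cloud usage won't follow exactly.

```python
# Rough reserve arithmetic for the emergency fund.
# Assumption: cloud cost is a flat monthly amount (real bills fluctuate).

HISTORICAL_MONTHLY_USD = 7_000  # historical estimate quoted above
CURRENT_MONTHLY_USD = 4_000     # estimate after recent cost-saving measures
IDEAL_RESERVE_USD = 50_000      # "breathing room" target quoted above


def reserve_target(monthly_cost_usd: float, months: int = 6) -> float:
    """Reserve needed to cover `months` of cloud costs at a flat rate."""
    return monthly_cost_usd * months


def runway_months(reserve_usd: float, monthly_cost_usd: float) -> float:
    """How many months a reserve lasts if it is the only funding source."""
    return reserve_usd / monthly_cost_usd


if __name__ == "__main__":
    for label, cost in [("historical", HISTORICAL_MONTHLY_USD),
                        ("current", CURRENT_MONTHLY_USD)]:
        target = reserve_target(cost)
        runway = runway_months(IDEAL_RESERVE_USD, cost)
        print(f"{label}: 6-month target ${target:,.0f}, "
              f"${IDEAL_RESERVE_USD:,} lasts {runway:.1f} months")
```

Under these assumptions, a $50,000 reserve covers a bit over 7 months at the historical rate and about a year at the current one.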

Steps to implement this

I believe that we'd need to take the following steps:

choldgraf commented 2 years ago

cc @fperez @ellisonbg and @afshin who I think are the ones that recommended I open this issue / proposal. Please let me know if there's a different place that you'd like me to raise this issue.

meeseeksmachine commented 2 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/governance-office-hours-meeting-minutes/1480/179

Carreau commented 2 years ago

Sorry for the delay in responding,

I am in general +1 on the idea, but I do have some questions.

minrk commented 2 years ago

We've been discussing some of these things over on JupyterHub threads. One of the biggest issues with the current setup is that we have "n=1" things that are hard to turn off (federation-redirect, main DNS, central analytics, events archive) in the same place as the GKE BinderHub deployment, which is the vast majority of the cost. One thing we can do is move the "this must run" stuff to a separate deployment, so that it's easier to just "turn GKE off", which rapidly takes most costs to zero (ideally, we'd have a federation that can handle it, but even when we don't, all federation members can now be 'full' with come-back-later messages). That way, we could be paying for the inexpensive always-on stuff (we can work on cost estimates) with steady donation income, and rely on grants/other support for the GKE federation member cluster, which may need to shut down once in a while.

choldgraf commented 2 years ago

> I'd like to see comparison on what those 50k could be used for.

In this case, these funds would only be used to pay for Binder's cloud costs if we ran out of other funding sources. This would be untouchable funding that cannot be used for other things. So if we are using these funds, it should be a question of "do we want mybinder.org to keep running or not".

In a separate conversation, I think we should find ways to raise revenue that supports ongoing operations and development of these services, but I am trying to scope this issue specifically for emergency purposes (like the one we are in now).

> I'd really like to see a mitigation plan in case we are short on funds to dramatically reduce mybinder spending if the case arise

I would also love to see a plan like this, but I have neither the bandwidth nor the skills to work on it right now. Do you see it as a blocker for devoting central Jupyter funds to support mybinder.org?

choldgraf commented 2 years ago

I've tried to clarify this proposal by renaming it "Emergency fund" instead of "Rainy day" fund. I think the original title may have misled people into thinking this was a "nice to have". "Emergency fund" makes it clearer that this is for emergency purposes only.

Carreau commented 2 years ago

> So if we are using these funds, it should be a question of "do we want mybinder.org to keep running or not".

Sorry, I was maybe unclear. The question was more "what could those funds be used for if they were not blocked for Binder?" Literally, assuming we have those 50k in the bank, is blocking them for Binder preventing us from paying someone to do devops on nbviewer 8h per week for two years?

As you point out later, we are all out of bandwidth, and if the choice is between paying cloud costs and paying you to go raise some money, I would most likely prefer the second to the first.

I don't have any opposition to the Emergency/rainy day naming (I understood both the same way). I want to better discuss the criticality of paying for cloud costs vs. paying people.

damianavila commented 2 years ago

> That way, we could be paying for the inexpensive always-on stuff (we can work on cost estimates) with steady donation income, and rely on grants/other support for the GKE federation member cluster which may need to shut down once in a while.

This is an interesting model given the current money constraints, and it probably fits better with @Carreau's thoughts on the cloud costs vs. people discussion (because you have more degrees of freedom to decide whether to support the GKE cluster or allocate that money to development instead).