binder.pangeo.io shut down due to crypto mining

rabernat commented 2 years ago

A few weeks ago we started seeing these notices from Google Cloud:

Dear Developer,

Our systems identified that your Google Cloud Platform / API Project ID pangeo (id: pangeo-181919) may have been compromised and used for cryptocurrency mining.

This activity was detected as originating from IP 34.134.139.27, 35.202.205.80, 34.122.114.163, 34.132.191.152, 35.202.175.45, 34.133.126.79, 34.123.101.98, 35.193.3.130, 34.132.119.240, 34.71.241.100, 34.67.130.14, 34.71.90.24, 34.135.253.5, 34.72.61.75, 35.222.246.237, 34.134.57.213, 34.133.118.37, 34.133.150.175, 34.67.43.143, 34.68.227.209, 35.232.249.88, 34.132.212.244, 34.132.10.154, 35.184.190.167, 35.224.43.11, 35.184.16.35, 34.72.110.84, 34.68.128.37, 104.154.53.176, 34.135.132.226, 34.71.220.222, 34.121.209.232, 104.154.77.62, 34.132.152.105, 34.136.19.208, 35.238.176.255, 35.192.104.29, 35.232.158.48, 34.123.200.254, 35.193.162.116, 34.70.19.67, 35.202.170.215, 35.226.59.255, 34.132.160.123, 35.184.88.34, 34.68.28.248, 34.67.150.235 and VM ID 7204308034468923890:us-central1-b 6154726994941753177:us-central1-b 23495819365936574:us-central1-b 871060581044631402:us-central1-b 9056366196714287444:us-central1-b 2078833108693727082:us-central1-b 2379172320502788252:us-central1-b 2033074746165521335:us-central1-b 6614086151114555388:us-central1-b 6481841045759038865:us-central1-b 8060108563441484120:us-central1-b 1930093761258502077:us-central1-b 6917237063263891611:us-central1-b 1769396228438798560:us-central1-b 1152664557098022744:us-central1-b 1368015935306407708:us-central1-b 8470523882921731892:us-central1-b 3453670015797211952:us-central1-b 7477268612177377130:us-central1-b 4230094429538660508:us-central1-b 7516664448808039216:us-central1-b 6658613447166542524:us-central1-b 2719659052643825153:us-central1-b 9025766762343476985:us-central1-b 3824956035727274209:us-central1-b 3042387495175264580:us-central1-b 2566597537884714993:us-central1-b 4099334863176513368:us-central1-b 1453326668843364089:us-central1-b 6008087590161573013:us-central1-b 4157158282760470131:us-central1-b 3620565006036418163:us-central1-b 2777884811879213922:us-central1-b 4826068438038551749:us-central1-b 2366113575518516063:us-central1-b 7870704371158398748:us-central1-b 
3614756977596609295:us-central1-b 3603490415090774185:us-central1-b 1726988374193171917:us-central1-b 6533022912599996248:us-central1-b 85662215390327083:us-central1-b 8669231498353883121:us-central1-b 4296773947132917694:us-central1-b 7896205443125692539:us-central1-b 7867110535798659555:us-central1-b 1955607456134639840:us-central1-b 6190384449088303272:us-central1-b to destination IP 51.79.251.11 on remote port 3300 between 2021-12-07 01:59 and 2021-12-07 02:25 (Pacific Time), though it may still be ongoing.

We recommend that you review this activity to determine if it is intended. Cryptocurrency mining is often an indication of the use of fraudulent accounts and payment instruments, and we require verification in order to mine cryptocurrency on our platform. Additional information is available in the Cloud Security Help Center (support.google.com/cloud/answer/6262505).

If you believe your project has been compromised, we recommend that you secure all your instances (https://support.google.com/cloud/answer/6262505), which may require uninstalling and then re-installing your project. To better protect your organization from misconfiguration and access the best of Google's threat detection, you may consider enabling Security Command Center (SCC) for your organization. To learn more about SCC visit https://cloud.google.com/security-command-center.

Once you have fixed the issue, please respond to this email. If the behavior is intentional, please explain so that we do not ping you again for this activity. Please do not hesitate to reach out to us if you have questions.

Crypto mining is a common problem on binder deployments, and it has finally hit us.

I ignored it for a while. We currently have no sysadmin for the binder cluster. It is running totally unsupervised. However, I recently checked the logs and noticed a huge spike in usage:

The binder has been in a maxed-out state for quite a while, and is on track to cost thousands of dollars more per month than we are used to.

Resolution

I needed to try to shut down the binder as fast as possible. Unfortunately, my kubernetes / helm skills are very rusty. Here's what I tried:

First I updated my local gcloud and helm to the latest versions (I hadn't touched either in over a year). Then:

gcloud auth login
gcloud container clusters get-credentials binder

Then I tried helm

$ helm init
WARNING: "kubernetes-charts.storage.googleapis.com" is deprecated for "stable" and will be deleted Nov. 13, 2020.
WARNING: You should switch to "https://charts.helm.sh/stable"
$HELM_HOME has been configured at /Users/rpa/.helm.
Warning: Tiller is already installed in the cluster.
(Use --client-only to suppress this message, or --upgrade to upgrade Tiller to the current version.)

$ helm status
WARNING: "kubernetes-charts.storage.googleapis.com" is deprecated for "stable" and will be deleted Nov. 13, 2020.
WARNING: You should switch to "https://charts.helm.sh/stable"
Error: could not find a ready tiller pod

I couldn't get anything useful out of helm, so I gave up on it.

Then I went to kubernetes

kubectl delete --all pods --namespace prod
kubectl delete --all pods --namespace staging

This had no apparent effect.
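One likely explanation (an assumption, not something confirmed in this thread): pods that belong to a Deployment are recreated by their controller as soon as they are deleted, so deleting pods alone does not stop the service. A minimal sketch of scaling the Deployments to zero instead, assuming the `prod` and `staging` namespaces used above. The `run` and `scale_down` helpers are hypothetical names for illustration, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: scale every Deployment in the given namespaces to zero replicas.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Scale all Deployments in each namespace passed as an argument to zero.
scale_down() {
  for ns in "$@"; do
    run kubectl scale deployment --all --replicas=0 --namespace "$ns"
  done
}

# Namespaces taken from the commands above.
scale_down prod staging
```

User pods that are already running may not be owned by a Deployment, so they would probably still need to be deleted afterwards.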

So then I went to https://console.cloud.google.com/kubernetes/clusters/details/us-central1-b/binder/nodes?project=pangeo-181919 and manually resized all the node pools to zero. That seemed to work.
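For reference, the same resize can probably be done from the command line. This is a sketch, not what was actually run: the cluster name, zone, and project come from the console URL above, while the pool names `default-pool` and `user-pool` are hypothetical placeholders. The `run` and `resize_pools_to_zero` helpers are illustrative, and nothing is executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: resize GKE node pools to zero nodes with gcloud.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Values taken from the console URL in the post.
CLUSTER="binder"
ZONE="us-central1-b"
PROJECT="pangeo-181919"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Resize each node pool passed as an argument to zero nodes.
resize_pools_to_zero() {
  for pool in "$@"; do
    run gcloud container clusters resize "$CLUSTER" \
      --node-pool "$pool" --num-nodes 0 \
      --zone "$ZONE" --project "$PROJECT" --quiet
  done
}

# Hypothetical pool names. List the real ones first with:
#   gcloud container node-pools list --cluster binder \
#     --zone us-central1-b --project pangeo-181919
resize_pools_to_zero default-pool user-pool
```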

So the binder is currently completely broken and unusable. 😞 It would be nice to at least get a landing page up at binder.pangeo.io that explains the situation. I am worried that this will affect activities planned for AGU, but I'm not sure.

I think our best hope for reviving our binder would be when 2i2c can take on this deployment, but that likely won't happen until spring.

consideRatio commented 2 years ago

@rabernat you seem to be using an outdated version of Helm (Helm 2, which relies on Tiller). Upgrading to Helm 3, which doesn't use Tiller, should avoid most of the errors you're seeing.

> So then I went to https://console.cloud.google.com/kubernetes/clusters/details/us-central1-b/binder/nodes?project=pangeo-181919 and manually resized all the node pools to zero. That seemed to work.

Nice!

I'm so upset about having to invest time and effort in this... Note that this is a related discussion: https://github.com/jupyterhub/team-compass/issues/478. Also note https://twitter.com/GeoffreyHuntley/status/1468040448316882946 as Geoff links to from that post.

jacobtomlinson commented 2 years ago

So sorry to hear the platform has been abused! I'm afraid I don't have the time or resources to help get things back up, but if you ever need help in a crunch moment like this one, when you had to get things shut down quickly, you can always call on me!

If you have the ability to update the DNS for binder.pangeo.io, I would create a GitHub repo with a simple landing page and use GitHub Pages to host it.

ltetrel commented 2 years ago

So sorry to hear that :( There should be a way to manually mark nodes as NotReady in case you want to keep your deployment alive? Or you could just uninstall BinderHub following this: https://binderhub.readthedocs.io/en/latest/zero-to-binderhub/turn-off.html
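As a sketch of the first idea: NotReady is a status reported about a node rather than something normally set by hand, and the closest built-in is `kubectl cordon`. It marks a node as unschedulable (shown as `SchedulingDisabled`) while leaving its running pods alone, and `kubectl uncordon` reverses it. The node names and the `run` and `cordon_nodes` helpers below are hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: mark nodes unschedulable so that no new pods land on them.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Cordon each node passed as an argument.
cordon_nodes() {
  for node in "$@"; do
    run kubectl cordon "$node"
  done
}

# Hypothetical node names; list the real ones with `kubectl get nodes`.
cordon_nodes node-a node-b
```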

scottyhq commented 2 years ago

Just noting that the AWS Pangeo binder is still up. It uses GitHub authentication, so we've thus far mostly avoided cryptominers (#188). Although the Pangeo AWS infrastructure is no longer supported and could disappear at any moment as well :(

I made this table on Discourse a while back. Depending on your CPU, RAM, Dask, and data location needs, one of these alternatives might work...

https://discourse.pangeo.io/t/jupytext-for-version-controlling-jupyter-notebooks-on-a-binder/1589/6?u=scottyhq

| BinderHub | vCPU | RAM (GB) | Cloud provider | Max Session (hr) | Dask-gateway |
| --- | --- | --- | --- | --- | --- |
| ~~binder.pangeo.io~~ | 4 | 8 | Google us-central1 | 3 | yes |
| aws-uswest2-binder.pangeo.io | 4 | 8 | AWS us-west-2 | 3 | yes |
| gke.mybinder.org | 1 | 2 | Google us-central1 | 6 | no |
| ovh.mybinder.org | 1 | 2 | OVH ? | ? | no |
| gesis.mybinder.org | 2 | 8 | Custom Server | 6 | no |

choldgraf commented 2 years ago

Just a quick note that, as @consideRatio mentions, the Binder team is thinking through these issues as well, and hopes to have a meeting with some others in the community to discuss potential ways around this: https://github.com/jupyterhub/team-compass/issues/478

There are plans for 2i2c to take over operations of the Pangeo Binder, and we'll need to figure out our strategy around crypto mining then (otherwise this is going to be a constant source of extra labor). That's a conversation that should include leaders from the Pangeo world, since it might involve trade-offs between user experience and constraints that deter mining.

rabernat commented 2 years ago

Thanks Chris! We would be fine with simply requiring sign-in to use our binder.

sgibson91 commented 2 years ago

> We would be fine with simply requiring sign-in to use our binder.

Just a heads up that the word from the folk who run GESIS Notebooks is that they still have issues with crypto-mining even with authentication, and they are actually shutting down the auth'd side of that service at the end of this month. So we will probably need auth and something else (maybe reCAPTCHA?) to really tackle this. Hence the meeting that is taking place in the new year (since ideally we would like to avoid putting auth in front of mybinder.org).

pangeo-data / pangeo-binder

binder.pangeo.io shut down due to crypto mining #195
