Policy for third-party hardware donation

zimbatm commented 4 months ago

Sometimes, it's easier for organizations or individuals to lend out hardware (rather than open collective). There is an opportunity to gain access to build capacity. And different kinds of hardware (e.g., GPU, Riscv5, MIPS, ...)

Before pursuing this, let's discuss what that would look like.

What are the requirements on our side?

Some threads:

Some level of trust for the binary cache key
What channel of communication / API do we use to remote control the server? (Restart, KVM, ...)

zowoq commented 4 months ago

Anything exotic will be a problem for the hercules agent as it is haskell. Just as a remote builder for buildbot/hydra sould mean the cache key isn't an issue?

Mic92 commented 4 months ago

We would still need to trust the build results.

zowoq commented 4 months ago

We would still need to trust the build results.

I don't understand your point? Isn't trusting the build results a given?

Mic92 commented 4 months ago

We would still need to trust the build results.

I don't understand your point? Isn't trusting the build results a given?

I think we should communicate how builders for different architectures are secured. i.e. Hetzner will have a safer access policies than the machines in someones basement. Then users can decide if they are OK with this.

zimbatm commented 4 months ago

Remote builders sounds good.

One requirement could be that we are the only admins on the machine. It doesn't prevent physical tampering but reduces the attack surface if the host provider gets hacked.

ConnorBaker commented 4 months ago

I’m not familiar with how the infrastructure you’ve set up for builds vs caching works, but one of the concerns I’ve had when trying to stand up infrastructure for consistently building CUDA packages, and for serving a binary cache, but there can be a lot of traffic between nodes. It was enough of a bottleneck between the three desktops in my basement that I moved over to 10GBe networking for everything, and I’m still saturating it.

I don’t know if the remote build protocol takes into account closure size or data locality when deciding which machines should build different things, but there can be a lot of movement on the network, which can be a bottleneck or, in the case of cloud providers, a hefty egress fee.

So a couple of questions from me:

Do the machines you’ve set up for Hydra also serve as a cache? If they are, have you run into issues with the amount of egress you’re doing?
Is your Cachix the main binary cache for projects under this umbrella? If so, how large is it (if you’re willing or able to disclose)?
Are the machines working as hydra builders all on the same network (or Hetzner region)?
What are your thoughts on ephemeral Azure builders? I’ve found the HBv3 spot instances in US East extremely competitive price-wise, and have been looking into tooling to automate scaling them up and down.
File-level deduplication in the cache would be fairly important, as CUDA enablement involves realizing a new copy of a derivation rather than building on an existing one —- most projects don’t separate the code generation for CPUs from GPUs well enough for us to be able to re-use existing portions of builds. Is there anything like that set up currently?

I’d love to learn more about any of the challenges you all have faced setting up and maintaining this infrastructure!

zowoq commented 4 months ago

remote build protocol takes into account closure size or data locality when deciding which machines should build different things

No, it doesn't.

Do the machines you’ve set up for Hydra also serve as a cache?

No.

Is your Cachix the main binary cache for projects under this umbrella? If so, how large is it (if you’re willing or able to disclose)?

Yes. 1tb, sponsored by Cachix.

Are the machines working as hydra builders all on the same network (or Hetzner region)?

Yes, our linux machines are all in HEL1. Though that isn't exactly intentional, we usually choose based on price. We also have two macos builders in FSN1.

What are your thoughts on ephemeral Azure builders? I’ve found the HBv3 spot instances in US East extremely competitive price-wise, and have been looking into tooling to automate scaling them up and down.

Haven't used them. IIRC there way some discussion about Azure stuff in the nixos org, maybe with the infra team or the foundation?

File-level deduplication in the cache would be fairly important, as CUDA enablement involves realizing a new copy of a derivation rather than building on an existing one —- most projects don’t separate the code generation for CPUs from GPUs well enough for us to be able to re-use existing portions of builds. Is there anything like that set up currently?

No.

I’d love to learn more about any of the challenges you all have faced setting up and maintaining this infrastructure!

I don't think we've had any real technical challenges so far, basically just limited by funding. Once we started the opencollective we just expanded as the funding increased. Building cuda, rocm, etc is probably going to be the first time we've really needed to give thought to some of these topics.

zimbatm commented 4 months ago

Yes, our linux machines are all in HEL1.

See also https://docs.hetzner.com/robot/general/traffic/ Each server comes with 10TB of egress.

Mic92 commented 4 months ago

Yes, our linux machines are all in HEL1.

See also https://docs.hetzner.com/robot/general/traffic/ Each server comes with 10TB of egress.

No, for 1Gbit-Links and physical machines, it's unmetered. The traffic limit only applies to VMs.

zimbatm commented 4 months ago

To go back on topic, I am recapping this to:

We happily accept if you want to donate hardware to extend our build configuration.
The machine needs to be connected to the Internet, with SSH access.
We ask that we control the system configuration (so it's part of the nix-community/infra repo).

It could be added to the donation page if we agree on this.

As a personal note, I would love to see GPU, and MIPS hardware

zimbatm commented 4 months ago

In case of esotheric hardware, we also need one or more people doing the work to fix nixpkgs so that we can keep the machine up to date.

Mic92 commented 4 months ago

Mips seems a bit dead at this point, even https://mips.com/ now sells your riscv64 cores instead. Some simple gpu runner would be nice for integration tests.

zimbatm commented 3 months ago

ConnorBaker commented 3 months ago

Continuing from https://github.com/nix-community/infra/pull/1335#issuecomment-2211479450:

@zowoq sorry for the late reply, I'm moving soon and organizing everything has been quite an experience.

Hetzner

A more accurate amount on the Hetzner side: 315 euros / month (though it could be less, at the moment).

I have one RX170 instance (hereafter ubuntu-hetzner) which I've set up as an aarch64-linux builder for our Hercules CI instance. I also use it regularly for nixpkgs-review runs. That is about 169 euros / month.

I have one AX102 instance (hereafter nixos-cantcache-me) which I use to host https://cantcache.me, an attic instance with a cache for CUDA-related work (https://cantcache.me/cuda/nix-cache-info). I use it as a cache for my nixpkgs-review runs, as I'm planning on moving from my desktop builders to ephemeral instances on Azure, and a large amount can be cached between CI runs. This costs about 104 euros / month, plus about 43 euros / month for the 10Gb uplink.

I only recently set up monitoring on nixos-cantcache-me (which runs NixOS), but haven't yet for ubuntu-hetzner (which runs Ubuntu). Utilization wise, nixos-cantcache-me is only under load when I'm running nixpkgs-review or when Hercules CI is running on my desktop builders (described below). ubuntu-hetzner is under load when I'm running nixpkgs-review or when Hercules CI is running on it.

Hercules CI runs on a schedule, so load increases due to CI starting and load decreases due to CI finishing are somewhat predictable (meaning we could benefit from ephemeral instances).

Local machines

My desktop builders are three machines I have in my basement, and they are all running NixOS. I use them every day for work on Nixpkgs (lots of nixpkgs-review runs) or hobby projects, and run Hercules CI on them.

Common specs:

They're networked directly through a 10 Gbe switch (https://www.zyxel.com/global/en/products/switch/10-12-port-10g-multi-gigabit-lite-l3-smart-managed-switch-xs1930-series/specifications).
Each has either onboard 10 Gbe or a 10 Gbe NIC.
Each has 96 GB of DDR5-6800 RAM.
Each has ZRAM enabled with ZSTD compression (avoiding pesky out-of-memory when building PyTorch, among other things, which allocates a huge number of empty pages when compiling flash attention).
Each has all drives in a ZFS RAID 0 array, optimized for speed.

nixos-desktop
- Intel i9-13900k
- ZOTAC GAMING GeForce RTX 4090 Trinity OC
- ~4.53 TB disk space
- 1 x Samsung 990 PRO 1 TB
- 2 x Samsung 980 PRO 2 TB
nixos-build01
- AMD Ryzen 9 7950X3D
- ~7.25 TB disk space
- 4 x Samsung 990 PRO 2 TB
nixos-ext
- AMD Ryzen 9 7950X3D
- ~7.25 TB disk space
- 4 x Samsung 990 PRO 2 TB

Questions

Since the majority of the load on these machines are my running nixpkgs-review (Hercules CI builds a very limited subset of packages), I'm curious: what do you all use to validate PRs? Is there some common infrastructure for nixpkgs-review runs?

As an example, for each of my PRs, I try to run the following nixpkgs-review instances:

x86_64-linux with { allowUnfree = false; cudaSupport = false; }
x86_64-linux with { allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "7.5" ]; }
aarch64-linux with { allowUnfree = false; cudaSupport = false; }
aarch64-linux with { allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "7.5" ]; }
aarch64-linux with { allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "7.2" ]; }
- This is to test support for Jetson devices
x86_64-darwin with { allowUnfree = false; cudaSupport = false; }
aarch64-darwin with { allowUnfree = false; cudaSupport = false; }

That's seven instances: luckily I use a MacBook Pro for as my laptop and have a Jetson, so I can build for both those platforms. But I'm curious how others handle this!

I built all my local machines because I need them to speed up nixpkgs-review runs; I'd love to contribute them to the community, so everyone can benefit from them, but I'm moving and not sure what to do with them physically.

SomeoneSerge commented 3 months ago

Remote builders sounds good.

H'mm, the first thing I disliked about cuda-maintainers.cachix.org and the reason I have been reluctant to advertise it is we couldn't publish a clear answer to the question "who has access to the signing keys?" ("who can push to cachix?"). The cloud reliance is surely unnecessary, expensive, and annoying, but makes the billing easy and alleviates the need to expose the keys to more than a few parties.

What I'd like as a consumer is for somebody foundation/"association"/"community"-aligned to maintain a physical build farm that one could physically donate hardware to (by parcel) with a transparent policy wrt keys and isolation on the builders ("by consuming from this cache you trust the nvidia kernel modules that run on some of the builders, you trust people from this list who can ssh, you trust this nation state in whose jurisdiction the farm is hosted"). AFAICT nobody on the CUDA team currently has the capacity to spin up something like that

Mic92 commented 3 months ago

Not sure we would actually need a 10G uplink. I think quite a few stuff in the NixOS infra also works fine with 1G. I think an upgrade of one of our NixOS builder on x86 to an AX162-R, might give us enough horse power.

zowoq commented 3 months ago

I think an upgrade of one of our NixOS builder on x86 to an AX162-R, might give us enough horse power.

Did you read my comment?

https://github.com/nix-community/infra/pull/1340#issuecomment-2229522725 Don't really need more resources, these builds are heavy while they are running but don't add that much overall load. The issue is that for the few hours while they are building they are competing for resources with the buildbot builds.

zimbatm commented 3 months ago

The issue is that for the few hours while they are building they are competing for resources with the buildbot builds.

This sounds like adding more hardware, while being a bit wasteful, would also fix the issue. Or is there something else blocking?

zowoq commented 3 months ago

The issue is that for the few hours while they are building they are competing for resources with the buildbot builds.

This sounds like adding more hardware, while being a bit wasteful, would also fix the issue. Or is there something else blocking?

We discussed this a few days ago.

zowoq commented 3 months ago

AX162-R

@Mic92 These seem to have a long wait time (weeks/months) and possibly some stability issues (according to reddit). WDTY about a server auction EPYC 7502P (32 cores), 256gb, 2x1.92tb?

I'd propose doing a hardware upgrade/shuffle like we did last year: cancel ax41 (build01), move build03 -> build01, add the EPYC as build03.

Mic92 commented 3 months ago

Zen3 CPUs also sounds fine. What is the price point?

zowoq commented 3 months ago

Slightly more expensive than the ax162.

7502p

zowoq commented 3 months ago

Zen3 CPUs also sounds fine.

Same amount of ram as we have currently and 4 extra cores?

Mic92 commented 3 months ago

We currently have 12 cores in build03, so it's 20 extra cores.

Mic92 commented 3 months ago

It actually shows here that ax162-r would be available in a few minutes if we order in Germany: https://www.hetzner.com/dedicated-rootserver/ax162-r/configurator/#/check-availability and it's still cheaper than ax162 while giving us 48 cores and ddr5

zowoq commented 3 months ago

We currently have 12 cores in build03, so it's 20 extra cores.

Oops, yes, I mixed up looking at the lower spec amds.

It actually shows here that ax162-r would be available in a few minutes if we order in Germany

Are you sure it will actually be available in a few minutes?

Due to a tense delivery situation for hardware for the dedicated server AX162-R/S, there may be delays in deployment. Thank you for your understanding!

order

Mic92 commented 3 months ago

No. Not sure, but do we loose much if we have to wait a bit? The performance difference looks significant for the same price. I am wondering where the "available in a few minutes" comes from then. I would expect this to be an automated process looking at the duration.

nix-community / infra