nodejs / build

Better build and test infra for Node.

Primary nodejs.org server out of space #1202

Closed: rvagg closed this issue 4 years ago

rvagg commented 6 years ago

I've made space by gzipping log files in https://nodejs.org/metrics/logs/ but that won't get us very far. Most of the space on the 1TB disk that we have mounted under /home/ is taken up in /home/dist/nodejs/ which looks like this:

```
 95G    chakracore-nightly
2.0G    chakracore-release
 34M    docs
4.0K    next-nightly
463G    nightly
 42G    rc
 85G    release
 21G    test
100G    v8-canary
```
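(For reference, a rough Python equivalent of the `du -sh /home/dist/nodejs/*` pass that produces a breakdown like the one above; this is just a sketch and not part of any existing tooling.)

```python
#!/usr/bin/env python3
"""Rough Python equivalent of `du -sh /home/dist/nodejs/*`; illustrative only."""
from pathlib import Path

def tree_size(path: Path) -> int:
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())

def human(n: float) -> str:
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}P"

base = Path("/home/dist/nodejs")
for d in sorted(base.iterdir()):
    if d.is_dir():
        print(f"{human(tree_size(d)):>8}  {d.name}")
```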

The alternative here is just to add a bigger disk. I don't have any numbers on how long it would take to fill up another 1TB.
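For what it's worth, the shape of that back-of-envelope calculation is simple; the daily growth figure below is a pure placeholder and would have to come from real du snapshots over time:

```python
# Back-of-envelope only: the growth rate is a made-up placeholder, not a
# measured number; it would need to be derived from actual disk-usage history.
DISK_BYTES = 1 * 1024**4             # one extra terabyte
GROWTH_PER_DAY_BYTES = 2 * 1024**3   # assumed ~2GB/day of new nightlies/canaries

days = DISK_BYTES / GROWTH_PER_DAY_BYTES
print(f"~{days:.0f} days (~{days / 365:.1f} years) until full")
```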

Thoughts?

/cc @joaocgreis for thoughts on chakracore files /cc @MylesBorins @jasnell for thoughts on rc files

MylesBorins commented 6 years ago

I'd say keep the rc files, those have historical significance.

gibfahn commented 6 years ago

I'd say maybe go for a 2TB disk for now and see how it goes.

I don't think removing v8-canary, nightly and chakracore-nightly would be a particular problem, but at the same time they might come in handy in the future, so maybe delay the decision for a bit.

joaocgreis commented 6 years ago

chakracore-release should be treated like release, and chakracore-nightly should be like nightly (and v8-canary).

I'd say that keeping nightlies for 1 year and tests for 1 month should be plenty.

Nightlies have the commit hash in the version and should be very straightforward to rebuild for anyone who needs them. The only advantage I see in keeping them for so long is to debug possible issues in our release infra, and I don't think that's a pressing reason.
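Something like the following could implement that policy as a periodic sweep. This is only a sketch: it assumes one directory per build under /home/dist/nodejs/ and that directory mtime roughly matches the build date, which may not match how promotion actually lays files out.

```python
#!/usr/bin/env python3
"""Hypothetical retention sweep: print (or remove) build directories older
than a cutoff. Paths and retention windows are illustrative only."""
import time
from pathlib import Path

RETENTION_DAYS = {
    "nightly": 365,             # keep nightlies for a year
    "chakracore-nightly": 365,
    "v8-canary": 365,
    "test": 30,                 # keep test builds for a month
}
BASE = Path("/home/dist/nodejs")
now = time.time()

for area, days in RETENTION_DAYS.items():
    cutoff = now - days * 86400
    for build_dir in sorted((BASE / area).glob("*")):
        if build_dir.is_dir() and build_dir.stat().st_mtime < cutoff:
            # dry run only; a real sweep would call shutil.rmtree(build_dir)
            print(f"would remove {build_dir}")
```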

addaleax commented 6 years ago

I’m just going to mention that in the past, when bisecting some issue, the nightlies have been more than helpful a couple times as starting points for me.

This is especially relevant for Windows, where the build takes so long that any other approach is essentially infeasible.

gibfahn commented 6 years ago

> I’m just going to mention that in the past, when bisecting some issue, the nightlies have been more than helpful a couple times as starting points for me.

Yeah, the bisecting use case was what I was thinking of. There's a really nice tool for Firefox called mozregression that makes bisecting regressions really easy (basically git bisect, but over the prebuilt versions), and I was thinking that something similar for Node might be helpful in the future.
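Roughly, such a tool would just binary-search the nightly index and ask the user (or an automated check) whether each prebuilt version is good. A sketch, assuming https://nodejs.org/download/nightly/index.json keeps its current newest-first JSON shape with a "version" field, and with the actual download-and-test step left as a stub; no such tool exists today:

```python
#!/usr/bin/env python3
"""Sketch of a mozregression-style bisect over prebuilt Node nightlies."""
import json
import urllib.request

INDEX_URL = "https://nodejs.org/download/nightly/index.json"

def is_good(version: str) -> bool:
    """User-supplied check: download that nightly, run the repro, report the result."""
    raise NotImplementedError

def bisect(versions):
    # versions is ordered oldest (assumed good) to newest (assumed bad)
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid   # regression happened after this build
        else:
            hi = mid   # regression is at or before this build
    return versions[hi]

if __name__ == "__main__":
    with urllib.request.urlopen(INDEX_URL) as resp:
        index = json.load(resp)
    versions = [entry["version"] for entry in reversed(index)]  # oldest first
    print("first bad nightly:", bisect(versions))
```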

MylesBorins commented 6 years ago

So this actually ended up blocking the v9.11.0 release as the disk complained of being out of space when I attempted to promote.

As I had already pushed the binaries, I took it upon myself to delete some assets. This included:

AFAICT none of these are controversial, and they do not affect any of the above workflows. The nightlies are all still intact.

refack commented 6 years ago

Should we consider a more "archive"-like storage, something like S3 or Azure Blob Storage? (Or Google Cloud Storage, obviously.)

rvagg commented 6 years ago

The disk is full again; I'm going to take the primary server offline and try for a 2TB disk.

MylesBorins commented 6 years ago

@rvagg it seems odd that it would get full again so quickly; perhaps there is a deeper problem we need to look into.

I'll assume I should rebuild 9.11.1 before promoting... please let me know as soon as things are running again so I can build and get this release out the door. The longer 9.11.0 stays the latest release, the more people might actually use it!

rvagg commented 6 years ago

Copying data to the new disk now. It's not a fast process, but I'll ping you (@MylesBorins) when that's done and the server is ready again. Yes, you'll need to rebuild with Jenkins because it's going to have the same problem uploading the files until I replace the mount.

I'm not sure why it's filled up so quickly; I'm surprised that we even made it to 1TB this fast. Perhaps our nightlies are getting really fat: the 100G of v8-canary (we haven't been doing these for long) suggests that this might be the case.

I, or someone else with access, need to do some inspection of our disk usage to come up with answers, and then we're going to need a better strategy for dealing with this rather than just switching to ever-larger disks whenever we hit this point.

MylesBorins commented 6 years ago

I can dig into it a bit tomorrow. It's getting late here; it's currently midnight.

@rvagg feel free to update the release PR to make yourself the releaser and move forward with the release if I'm not awake when things are done copying. TBH, I'd much rather we clean up the current disk than transfer without doing more due diligence to make sure there isn't some weird stuff or inconsistency going on.

I'm in #node-build right now, please ping me on there so we can discuss a bit more at length

rvagg commented 6 years ago

Oh... I know what happened now, and it's my fault. I gzipped the metrics log files, but the metrics log generation script checks for the existence of a file before making a new one, and it doesn't look at the .gz extension, so it's been busily regenerating the log files I gzipped. Essentially I freed up space and then the server filled it right back up again. That's why we're back at full so quickly.
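The fix is presumably to make the generation script treat a .gz copy as "already generated", something along these lines (a sketch, not the actual metrics script; the file name in the example is made up):

```python
from pathlib import Path

def already_generated(log_path: Path) -> bool:
    """Treat both the plain file and a gzipped copy as 'already generated',
    so compressing old logs doesn't cause them to be rebuilt."""
    return log_path.exists() or Path(str(log_path) + ".gz").exists()

# e.g. already_generated(Path("nodejs.org-access.log.20180401"))
```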

jasnell commented 6 years ago

FWIW, I did generate a new 10.0.0 test build today.

MylesBorins commented 6 years ago

@rvagg can we simply free up space and do the release for now, and then look into doing the transfer once we can do a bit more research and be sure things won't get weird?

jasnell commented 6 years ago

I'd say keep the RCs, and drop test and canary after two weeks if we're having that much trouble with storage space.

MylesBorins commented 6 years ago

I personally like @refack's suggestion of moving to a storage service such as GCS or S3 (or perhaps both, for redundancy). These can sit behind a CDN, and we can even proxy traffic through our server if that is the best way to get logs.

rvagg commented 6 years ago

@MylesBorins you can run the release now if you're still around; I've got enough space on there now, and I'll continue this transfer and get it switched over when it's ready.

rvagg commented 6 years ago

2TB disk is in place and active

Back to the original question: I take @addaleax's and @gibfahn's point, which suggests that nightlies and chakracore-nightlies are probably sacred. I think v8-canary and test might be up for grabs though; I've actually manually cleaned out test before (early on, when I was probably the only one making test builds).

So, we're still going to have big block storage problems.

An object store would be nice, but we need access to one and we also need to integrate it. Rackspace and now DigitalOcean have their own, and we have access to those (I haven't tested that we can actually use the DO one with our account, but I'm assuming so), but I'm pretty sure they're not going to be anything like as nice as S3 for this kind of purpose. The NF still doesn't have a direct relationship with Amazon, so we don't have any gratis access to their resources; it'd have to be paid for if we really want S3.

Then there's the question of integration. If we want to serve directly from the store, we'd have to go with a subdomain, which could be very disruptive to users who expect the files to be located in a particular place (e.g. a scripted curl without -L). We may need to work out some kind of local proxy/mount arrangement so the files are served through the same subdirectories but come off an object store. This would add latency, but I'd assume we'd have ditched our Cloudflare pass-through by then (still a TODO).
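To illustrate the local-proxy idea (purely a sketch with a made-up object store URL; in practice this would more likely be nginx proxy_pass rules than a Python process), the point is that the existing paths keep working, so a scripted curl without -L never sees a redirect:

```python
#!/usr/bin/env python3
"""Toy path-preserving proxy: same nodejs.org paths, bytes fetched from an
object store. OBJECT_STORE_BASE is a made-up placeholder URL."""
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECT_STORE_BASE = "https://example-bucket.example-object-store.com"  # placeholder

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. /dist/v10.0.0/node-v10.0.0.tar.gz maps to the same key upstream
        upstream = OBJECT_STORE_BASE + self.path
        try:
            with urllib.request.urlopen(upstream) as resp:
                body = resp.read()
        except Exception:
            self.send_error(502, "upstream fetch failed")
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```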

targos commented 6 years ago

Canary builds are very useful for tracking down regressions or perf improvements in V8 (see https://github.com/nodejs/node/issues/19769, for which I found a fix in just a few minutes). I think we should keep them for one year.

mmarchini commented 6 years ago

FWIW v8-canary builds are also useful to track V8 commits with meaningful/breaking changes affecting llnode. It would be great if we could keep at least the last 6 months (preferably one year).

gdams commented 6 years ago

@rvagg can this be closed now?

github-actions[bot] commented 4 years ago

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mhdawson commented 4 years ago

I think this is long stale, so I'm closing it. @rvagg let me know if you think that was not the right thing to do.