protocol / netops

NetOps is a group of Infra Engineers, Software Developers, or any other Labbers who are currently working on designing, deploying, or maintaining infrastructure within their project, working group, or special interest group. This repo organizes common goods for all of NetOps.

[General] data center machines unusable #66

Open vmx opened 1 year ago

vmx commented 1 year ago

What do you need?

I can't connect to most of the worker-gpu-* machines anymore. I was able to connect to worker-gpu-5, but that machine seems to have DNS/networking issues. The issue was confirmed by @cryptonemo. As a result, the machines are not usable and I'm blocked, since I can't do my work on them.

Why do you need it?

Please have a look and make sure that I can connect to those machines and that they don't have networking issues.

Who is the DRI?

@vmx and @cryptonemo

Team and command structure

FilCrypto

Estimated monthly cost

N/A

What else do we need to know?

That's all.

vmx commented 1 year ago

Update: I also cannot connect to miner-2 or worker-cpu-2-2, though I can connect to worker-cpu-2-1. This seems to be a bigger data center issue.
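
To narrow down which machines are affected, a quick reachability sweep over the SSH port may help. What follows is only a minimal sketch, not part of any data center tooling: the helper names `probe` and `sweep`, the 3-second timeout, and the assumptions that the hostnames resolve as written and that SSH listens on port 22 are all hypothetical.

```python
import socket


def probe(host, port=22, timeout=3.0):
    """Try one TCP connection; return (True, "") on success, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers DNS failures, timeouts, "No route to host", refused connections.
        return False, str(exc)


def sweep(hosts, port=22, timeout=3.0, probe_fn=probe):
    """Probe every host once and return {host: (ok, reason)}."""
    return {host: probe_fn(host, port, timeout) for host in hosts}


def report(results):
    """Format sweep results as one line per host."""
    lines = []
    for host, (ok, reason) in results.items():
        status = "reachable" if ok else f"UNREACHABLE ({reason})"
        lines.append(f"{host}: {status}")
    return "\n".join(lines)
```

Running `print(report(sweep(["miner-2", "worker-cpu-2-1", "worker-cpu-2-2", "worker-gpu-5", "worker-gpu-6"])))` from a machine with access to the data center network would show which hosts accept TCP connections on port 22. A successful probe only says the port is open; it does not confirm that login works.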

vmx commented 1 year ago

I should also note that 24h ago things still worked as expected.

vmx commented 1 year ago

I still can't ssh to e.g. worker-gpu-6, but worker-gpu-5 doesn't seem to have networking issues anymore.

ognots commented 1 year ago

miner-2 should work now, I just rebooted it. It was wedged. I was able to connect to the following machines and validate that your user exists

vmx commented 1 year ago

Thanks a lot!

  • worker-cpu-2-1
  • worker-cpu-2-2
  • worker-gpu-6
  • worker-gpu-5

I can now ssh to those above. Though I cannot ssh to:

vmx commented 1 year ago

The networking still isn't good. E.g. on worker-gpu-6 I need several retries to pull from GitHub. The error is something like:

fatal: unable to access 'https://github.com/filecoin-project/rust-fil-proofs/': Failed to connect to github.com port 443: No route to host

vmx commented 1 year ago

worker-cpu-2-2 has the same networking issues.
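
Since the failures are intermittent, it may be useful to put a number on them. The sketch below repeats a plain TCP connection attempt and reports the fraction that failed. It is an illustration under assumptions: `connect_once` and `failure_rate` are hypothetical helper names, and the attempt count, delay, and timeout are arbitrary defaults rather than values from this thread.

```python
import socket
import time


def connect_once(host, port, timeout=5.0):
    """Return None if a TCP connection succeeds, else the error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return str(exc)


def failure_rate(host, port, attempts=20, delay=1.0, connect=connect_once):
    """Attempt `attempts` connections and return (failed fraction, error list)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    errors = []
    for i in range(attempts):
        err = connect(host, port)
        if err is not None:
            errors.append(err)
        if i < attempts - 1:
            time.sleep(delay)  # space the attempts out to catch flapping routes
    return len(errors) / attempts, errors
```

Calling `failure_rate("github.com", 443)` on worker-gpu-6 and on worker-cpu-2-2 would give a comparable failure fraction per machine, and the collected error strings show whether every failure is a routing error or whether DNS or timeouts are also involved.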

vmx commented 1 year ago

Any news? worker-gpu-6 still has those networking issues.