Error deploying Hashed Collator to k8s

mgravitt commented 1 year ago

I am trying to deploy the feature/hashed-chain-spec branch to k8s. I am using the same configuration that was used for MD5 Network.

The errors are:

Node scale up in zones us-east4-c associated with this pod failed: GCE out of resources. Pod is at risk of not being scheduled.

and

0/2 nodes are available: 2 Insufficient cpu.

I tried adding a new node pool in Google Cloud, but it had no effect.

I also tried changing the CPU and memory requests, anywhere from 1.5 CPU and 3G up to 4 CPU and 16G, with no effect.
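For context on the second error: the scheduler places a pod only on a node whose allocatable CPU, minus the CPU already requested by other pods, covers the pod's CPU request. When no node qualifies, the pod stays Pending and the cluster autoscaler tries to add a node, which appears to be the step that hit "GCE out of resources". Raising the request only makes the pod harder to place on the existing nodes. A rough Python sketch of that check, with made-up node numbers (it looks only at CPU requests and ignores memory, taints and affinity):

```python
from __future__ import annotations


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "1.5", "1500m" or "4" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def nodes_that_fit(request: str, nodes: dict[str, tuple[str, str]]) -> list[str]:
    """Return the nodes whose unrequested CPU covers `request`.

    `nodes` maps a node name to (allocatable CPU, CPU already requested by pods).
    Only CPU *requests* count here, not live usage, which is what the scheduler
    compares when it reports "Insufficient cpu".
    """
    need = parse_cpu(request)
    return [
        name
        for name, (allocatable, requested) in nodes.items()
        if parse_cpu(allocatable) - parse_cpu(requested) >= need
    ]


# Hypothetical two-node pool; the numbers are illustrative only.
nodes = {"node-a": ("1930m", "900m"), "node-b": ("1930m", "1200m")}
print(nodes_that_fit("1.5", nodes))  # [] -> "0/2 nodes are available: 2 Insufficient cpu"
print(nodes_that_fit("1", nodes))    # ['node-a']
```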

Command to reproduce is:

kubectl apply -f k8-manifests/collator-manifest.yml

We should do the same for Luhn Network. The only configuration differences are:

use --chain luhn instead of --chain hashed
the names of objects in k8s have a prefix of hashed instead of md5 (except for the namespace, which stays hashed-network unless there is a desire to create new namespaces for luhn and md5)
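Since only the --chain value and the name prefix change, one way to keep the Hashed and Luhn manifests from drifting apart is to derive both from the chain name. A minimal Python sketch, assuming the object names follow a hypothetical "<chain>-collator-<instance>" pattern (the real names in k8-manifests/collator-manifest.yml may differ) and that the namespace stays hashed-network:

```python
from __future__ import annotations

# Shared namespace for every network, per the note above.
NAMESPACE = "hashed-network"
KNOWN_CHAINS = {"hashed", "luhn"}


def collator_settings(chain: str, instance: str) -> dict:
    """Return the settings that change between networks for one collator.

    `chain` is the value passed to --chain; it also becomes the object-name prefix.
    The "<chain>-collator-<instance>" naming pattern is hypothetical.
    """
    if chain not in KNOWN_CHAINS:
        raise ValueError(f"unknown chain: {chain!r}")
    return {
        "name": f"{chain}-collator-{instance}",
        "namespace": NAMESPACE,
        "args": ["--chain", chain],
    }


print(collator_settings("luhn", "c1"))
```

Rejecting unknown chain names up front turns a typo such as "lhun" into an error instead of a silently misnamed deployment.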

Each collator should be deployed twice, with the following DNS entries:

c1.hashed.network
c2.hashed.network
c1.luhn.network
c2.luhn.network
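The four hostnames are every combination of the two networks and the two instances. Writing that mapping down makes it easy to confirm that each host runs the chain its name says. A small Python sketch (pairing each hostname with a --chain value is an assumption based on the names):

```python
from __future__ import annotations

NETWORKS = ("hashed", "luhn")
INSTANCES = ("c1", "c2")


def collator_hosts() -> dict[str, str]:
    """Map each collator DNS name to the --chain value that host should run."""
    return {
        f"{instance}.{network}.network": network
        for network in NETWORKS
        for instance in INSTANCES
    }


for host, chain in collator_hosts().items():
    print(host, "->", chain)
```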

Once deployed, the RPC endpoints and chain state will be up and available, but of course the collators won't build blocks until the parachain has a lease. When we get close to that time, we can inject the keys required for block signing.
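For the key-injection step, Substrate nodes expose an author_insertKey JSON-RPC method that takes a key type, the secret URI and the public key. A sketch of the request body in Python, assuming the collators sign with an Aura key (key type aura). The method is classed as unsafe, so the node has to allow unsafe RPC methods (the --rpc-methods option), and it is best sent over a private connection such as kubectl port-forward rather than through the public RPC endpoint:

```python
import json


def insert_key_request(suri: str, public_key_hex: str,
                       key_type: str = "aura", request_id: int = 1) -> str:
    """Build the JSON-RPC body for Substrate's author_insertKey call.

    key_type "aura" is an assumption about which session key the collator signs with.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "author_insertKey",
        "params": [key_type, suri, public_key_hex],
    })


# Placeholder values only; never put a real secret seed in a manifest or a log.
print(insert_key_request("//Alice", "0x1234"))
```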

In the meantime, we will continue to build, and we can upgrade the nodes and our runtime on Polkadot. However, there is a transaction cost of 100 DOT per on-chain upgrade, so we need to be judicious. CC @tlacloc @didiermis

sebastianmontero commented 1 year ago

@max I don't see the file k8-manifests/collator-deployment.yml; I see k8-manifests/collator-manifest.yml. Is this the one you are trying to deploy?

mgravitt commented 1 year ago

@sebastianmontero yes, that's right. It is failing when creating the deployment. The command was a typo; it was copied from a file where I was debugging. I've corrected it.

sebastianmontero commented 1 year ago

@3yekn is there a way to open a ticket with Google? It seems that the issue is "current unavailability of a Compute Engine resource, for example GPUs or CPUs in the requested zone", which is why the node pool is not scaling up. I tried creating a node pool in another zone, but the cluster was created with no nodes. I guess this is a temporary issue, but I'm not sure.

mgravitt commented 1 year ago

If you are logged in and follow through to support and chat, it should route you to an agent who will either be able to help or escalate it to someone else.

sebastianmontero commented 1 year ago

@3yekn Yes, I've tried it, but it seems I can only access billing support.

sebastianmontero commented 1 year ago

@3yekn the --chain argument does not seem to be working: both are being deployed as luhn. Which Dockerfile did you use to build the image?
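One possible cause, not confirmed in this thread, is an image whose ENTRYPOINT hardcodes a --chain value. Kubernetes builds a container's argv from the image defaults and the manifest's command/args fields: command replaces the image ENTRYPOINT, args replaces the image CMD, and setting command without args drops the CMD as well. The Python sketch below models those rules; the binary path is hypothetical:

```python
from __future__ import annotations


def effective_argv(image_entrypoint: list[str], image_cmd: list[str],
                   command: list[str] | None = None,
                   args: list[str] | None = None) -> list[str]:
    """Return the argv a container runs, following Kubernetes command/args rules."""
    if command is None and args is None:
        return image_entrypoint + image_cmd      # image defaults only
    if args is None:
        return list(command)                     # command only: ENTRYPOINT and CMD ignored
    if command is None:
        return image_entrypoint + args           # args replace CMD, ENTRYPOINT kept
    return command + args                        # both image fields ignored


def chain_values(argv: list[str]) -> list[str]:
    """List every value passed as "--chain <value>" (the "--chain=<value>" form is not handled)."""
    return [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "--chain"]


# Hypothetical image whose ENTRYPOINT already pins the Luhn chain.
entrypoint = ["/usr/local/bin/collator", "--chain", "luhn"]
argv = effective_argv(entrypoint, [], args=["--chain", "hashed"])
print(chain_values(argv))  # ['luhn', 'hashed']: the pinned value is still there
```

If the Dockerfile bakes --chain luhn into ENTRYPOINT, the manifest's args are appended after it rather than replacing it, so checking the image's ENTRYPOINT (or setting command explicitly in the manifest) seems worthwhile.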

hashed-io / hashed-substrate

Error deploying Hashed Collator to k8s #273