jsimonetti / talos-airgap

Collection of scripts to create a fully airgapped install of Talos

issue with running the modify.sh #1

Open samehif opened 4 years ago

samehif commented 4 years ago

root@sameh-host:~/talos-airgap# yq w -i -- init.yaml 'machine.files[+].content' "$(cat container-registry-ca.crt)"
yq: -i/--in-place can only be used with -y/-Y

And if I use -y or -Y, I get this:

root@sameh-host:~/talos-airgap# yq w -i -y -- init.yaml 'machine.files[+].content' "$(cat container-registry-ca.crt)"
yq: -i/--in-place can only be used with filename arguments, not on standard input

jsimonetti commented 4 years ago

Hi @samehif

Where did you install yq from? There are two applications that are named yq and offer similar functionality. These scripts use yq from https://github.com/mikefarah/yq
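A quick way to tell the two tools apart is to look at what `yq --version` prints. The sketch below is only a heuristic: the helper name `classify_yq` is hypothetical, and the version-string patterns are assumptions based on commonly seen output (mikefarah's yq printing `yq version …` or mentioning its GitHub URL, the pip-installed jq wrapper printing `yq` followed directly by a version number).

```shell
# classify_yq: guess which yq implementation produced a given version string.
# The patterns below are assumptions about typical `yq --version` output.
classify_yq() {
  case "$1" in
    *mikefarah*|"yq version "*) echo "mikefarah" ;;  # Go implementation
    "yq "[0-9]*)                echo "python" ;;     # pip-installed jq wrapper
    *)                          echo "unknown" ;;
  esac
}

# Report on the yq found in PATH, if any.
if command -v yq >/dev/null 2>&1; then
  echo "installed yq looks like: $(classify_yq "$(yq --version 2>&1)")"
fi
```

If the result is `python`, removing it with `pip uninstall yq` and installing the binary from the mikefarah/yq releases page should give the `yq w` syntax these scripts expect.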

samehif commented 4 years ago

I installed it using pip install.

samehif commented 4 years ago

I reinstalled yq and now that step works. I get the following failure in the next step, though:

docker run --rm -d -it --name talos-deploy -v "$(pwd)/scripts:/content" -e FOLDER=/content --ip 10.11.0.4 --net=talos-config_registry halverneus/static-file-server:latest
575f01564aefde560a058a0e485a7578ad439c9ffaa9f49b9a55e5d27e161745
docker: Error response from daemon: network talos-config_registry not found.
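A "network not found" error like this can be avoided by looking the network name up instead of hardcoding it. The sketch below assumes the registry network was created by Docker Compose, which prefixes network names with a project name derived from the directory, so the prefix differs between checkouts. The helper `pick_network` is hypothetical; `docker network ls --format '{{.Name}}'` is the standard way to list names only.

```shell
# pick_network SUFFIX: read network names on stdin and print the first one
# that ends in "_SUFFIX" (for example "talosairgap_registry" for "registry").
pick_network() {
  grep -- "_${1}\$" | head -n 1
}

# Example usage (requires a running Docker daemon):
#   net="$(docker network ls --format '{{.Name}}' | pick_network registry)"
#   docker run --rm -d -it --name talos-deploy \
#     -v "$(pwd)/scripts:/content" -e FOLDER=/content \
#     --ip 10.11.0.4 --net="$net" halverneus/static-file-server:latest
```

The suffix is treated as a regular expression, which is fine for simple names such as `registry`.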

samehif commented 4 years ago

And if I skip this step and run the next one, I get this error:

root@sameh-host:~/talos-airgap# osctl cluster create --name my-cluster --input-dir . --masters 1 --workers 2 --init-node-as-endpoint --registry-mirror docker.io=https://10.11.0.2:5000/docker.io --registry-mirror k8s.gcr.io=https://10.11.0.2:5000/k8s.gcr.io --registry-mirror quay.io=https://10.11.0.2:5000/quay.io
unknown flag: --registry-mirror

samehif commented 4 years ago

I modified the line like this and now this step works:

>docker run --rm -d -it --name talos-deploy -v "$(pwd)/scripts:/content" -e FOLDER=/content --ip 10.11.0.4 --net=talosairgap_registry halverneus/static-file-server:latest

Notice the network name was wrong. I got the correct one by running docker network ls.

Now I have tried to create the cluster without --registry-mirror, and it did create it. I can run, for example, the stats command like this:

>osctl --talosconfig talosconfig stats
NODE       NAMESPACE   ID         MEMORY(MB)   CPU
10.5.0.2   system      apid       4.06         72878151
10.5.0.2   system      networkd   3.47         57931987
10.5.0.2   system      osd        5.44         125811730
10.5.0.2   system      trustd     3.18         46364381

However, I get an error when trying to get the kubeconfig file:

>osctl --talosconfig talosconfig kubeconfig .
stat /etc/kubernetes/kubeconfig: no such file or directory
error initializing gzip: EOF

Any idea???

jsimonetti commented 4 years ago

That last error usually means Talos has not yet finished booting Kubernetes.
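If the cluster is merely slow to come up, retrying the kubeconfig command for a while is usually enough. The following is a minimal sketch of a generic retry helper; the name `retry` and the attempt and delay values are illustrative, not part of these scripts. It does not help when a service such as etcd has actually failed.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, waiting DELAY
# seconds between tries, and give up after ATTEMPTS tries.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example usage: try for up to about five minutes.
#   retry 30 10 osctl --talosconfig talosconfig kubeconfig .
```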

samehif commented 4 years ago

etcd is not starting, and that's why it can't get the kubeconfig. I used a more recent version of Talos (v0.4.1), where I could specify the registry mirror, but I get this error:

root@sameh-host:~/talos-airgap# ./talosctl4.1 cluster create --name my-cluster --input-dir . --masters 1 --workers 2 --init-node-as-endpoint --registry-mirror docker.io=https://10.11.0.2:5000/docker.io --registry-mirror k8s.gcr.io=https://10.11.0.2:5000/k8s.gcr.io --registry-mirror quay.io=https://10.11.0.2:5000/quay.io
validating CIDR and reserving IPs
downloading docker.io/autonomy/talos:v0.4.1
creating network my-cluster
creating master nodes
creating worker nodes
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.5.0.2:50000: connect: connection refused"
waiting for etcd to be healthy: service "etcd" not in expected state "Running": current state [Waiting] Waiting for service "networkd" to be "up"
waiting for etcd to be healthy: service "etcd" not in expected state "Running": current state [Preparing] Running pre state
waiting for etcd to be healthy: service "etcd" not in expected state "Running": current state [Failed] Failed to run pre stage: failed to pull image "k8s.gcr.io/etcd:3.3.15-0": 2 error(s) occurred: failed to pull image "k8s.gcr.io/etcd:3.3.15-0": failed to resolve reference "k8s.gcr.io/etcd:3.3.15-0": failed to do request: Head "https://k8s.gcr.io/v2/etcd/manifests/3.3.15-0": proxyconnect tcp: dial tcp 10.11.0.3:8080: i/o timeout timeout

I have followed your instructions, and I have a local registry running at 10.11.0.2 and a no_proxy environment variable for that address. I don't understand why the create script is still not using this local registry and fails to pull the etcd image. Any help?

samehif commented 4 years ago

The problem seems to be that the --registry-mirror parameters are not considered at all when the cluster is created from local files (using --input-dir .).

But there are still issues, as Talos is running in a different network than the registry. I have now configured Talos to use the same network as the registry (10.11.0.0/24), but since Talos always uses x.x.x.2, I had to change the IP addresses of the registry and the MITM proxy to 10.11.0.11 and 10.11.0.12.

I now need to recreate the certificates for the registry with the new IP addresses. Which command did you use (openssl?)? Do you have the exact command and parameters?

jsimonetti commented 4 years ago

You can follow a guide similar to https://support.citrix.com/article/CTX135602 to create certificates for your registry, et al.
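As a rough illustration of the kind of command involved, the sketch below creates a self-signed certificate whose subjectAltName carries the registry's IP address, which clients need in order to accept a certificate for an IP-based URL. This is not the exact command used for this repository: the function name `make_registry_cert` and the output file names are hypothetical, the key size and validity period are arbitrary, and `-addext` assumes OpenSSL 1.1.1 or newer. A setup with a separate CA, as in the linked guide, needs additional signing steps.

```shell
# make_registry_cert OUT_DIR IP: write registry.key and a self-signed
# registry.crt with IP as both the CN and an IP subjectAltName entry.
make_registry_cert() {
  out_dir="$1"
  ip="$2"
  openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 365 \
    -keyout "$out_dir/registry.key" \
    -out "$out_dir/registry.crt" \
    -subj "/CN=$ip" \
    -addext "subjectAltName=IP:$ip"
}

# Example usage for the new registry address:
#   make_registry_cert . 10.11.0.11
#   openssl x509 -in registry.crt -noout -text | grep "IP Address"
```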