TachunLin opened this issue 1 year ago
This is the initial concept; it needs further discussion and updates over time.
Use sshuttle or port forwarding to expose the 192.168.2.0 subnet to the other host machine.
Just came up with the initial idea of the fully airgapped infrastructure diagram for further discussion.
I think in general this looks pretty good :smile: :+1:
I would only mention that there would be much more benefit to separating out hp-176
to run two Vagrant VMs (one to serve the registry and one to serve Rancher) from the start instead of at a later point, since the provisioning with Rancher & Docker Registry currently has many flaws in ipxe-examples:
The Rancher instance, Docker registry, DNS, and name server would be implemented in https://github.com/harvester/tests/issues/942
Another idea is that we could consider moving the Artifact server role from the external VM to inside the hp-176 seeder machine. This may decrease the effort needed to handle network connectivity and could better utilize the airgapped network created by Open vSwitch.
There is a slight blocker at: https://github.com/harvester/harvester/issues/5301
This means we will need to bake additional logic into the pipeline to compensate for that bug.
We're currently encountering something that we will need to redesign logic for. We're hitting:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
WorkflowScript: -1: Map expressions can only contain up to 125 entries @ line -1, column -1.
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:309)
at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1107)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:624)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:602)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:579)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:323)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:293)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox$Scope.parse(GroovySandbox.java:163)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:190)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:175)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:635)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:581)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:335)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:442)
Finished: FAILURE
Seemingly related to something within Jenkins / Groovy:
Investigating....
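For context, the 125-entry cap applies to a single Groovy map literal. One plain-Groovy workaround (a minimal sketch with made-up variable and parameter names, not necessarily what we end up doing) is to assemble the large map incrementally instead of declaring it in one literal:

// Hypothetical sketch: build the oversized map entry by entry (or in small batches)
// rather than as one [k: v, ...] literal with 125+ entries.
def airgapVars = [:]
airgapVars['HARVESTER_VERSION'] = params.harvester_version   // illustrative names
airgapVars['REGISTRY_HOST']     = params.registry_host
// ...keep adding entries individually instead of one giant map literal...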
Even after pivoting, we are now hitting a limitation on the script string length... Investigating
2024-07-23 00:05:09.979+0000 [id=26] SEVERE hudson.util.BootFailure#publish: Failed to initialize Jenkins
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
script: 280: String too long. The given string is 93362 Unicode code units long, but only a maximum of 65535 is allowed.
@ line 280, column 20.
script('''
^
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:309)
at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1107)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:624)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:602)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:579)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:323)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:293)
at groovy.lang.GroovyShell.parseClass(GroovyShell.java:677)
at groovy.lang.GroovyShell.parse(GroovyShell.java:689)
at groovy.lang.GroovyShell$parse.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:128)
at javaposse.jobdsl.dsl.AbstractDslScriptLoader.parseScript(AbstractDslScriptLoader.groovy:134)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:210)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:59)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:157)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:177)
at javaposse.jobdsl.dsl.AbstractDslScriptLoader.runScriptEngine(AbstractDslScriptLoader.groovy:101)
Caused: javaposse.jobdsl.dsl.DslException: startup failed:
script: 280: String too long. The given string is 93362 Unicode code units long, but only a maximum of 65535 is allowed.
@ line 280, column 20.
script('''
Was able to reduce it, but it is still too big:
Caused: javaposse.jobdsl.dsl.DslException: startup failed:
script: 280: String too long. The given string is 71859 Unicode code units long, but only a maximum of 65535 is allowed.
@ line 280, column 20.
script('''
^
1 error
We'll need to pivot to something else with the JobDSL plugin...
Based on some more investigation, I'm not entirely sure all integrations can be within a single pipeline job... Still investigating...
Where the JobDSL plugin is possibly calling readFileFromWorkspace() in:
pipelineJob('example') {
    definition {
        cps {
            script(readFileFromWorkspace('project-a-workflow.groovy'))
            sandbox()
        }
    }
}
It still ultimately comes down to reading a file... which, from the jobdsl repo, it seems to do with:
return filePath.readToString();
Ultimately it reads the file into a String, so we're back in the same place we would be even if we defined it as:
script(
    '''
    script in here
    '''
)
^ because that also just yields a "String". So we can't rip it out into a file to escape the:
String too long. The given string is 71859 Unicode code units long, but only a maximum of 65535 is allowed.
Though... I'm not entirely sure about this. My initial thinking is that we would need to break this up into "multiple" pipeline jobs ... Example:
So that then scales all our jobs from one (that provisions all integrations) to needing to be multiple, one per airgap integration, possibly just to avoid this Groovy limitation of the string being too big :sweat_smile: ... Again, not entirely sure though....
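If we did go that route, a rough JobDSL sketch of the shape it could take (integration names and script paths below are made up for illustration):

// Hypothetical sketch: generate one small pipelineJob per airgap integration instead
// of one giant job, keeping each generated script well under Groovy's size limits.
['dns', 'rancher', 'docker-registry', 'hauler'].each { integration ->
    pipelineJob("harvester-airgap-${integration}") {
        definition {
            cps {
                script(readFileFromWorkspace("jenkins/airgap-${integration}.groovy"))
                sandbox()
            }
        }
    }
}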
With https://github.com/irishgordo/harvester-baremetal-ansible/commit/b80e3dde3a36281bb7e861c5fb2c0956d66473f4 I was able to reduce it down so that the "String too long" error disappeared.
But the underlying error is still present, as it now just sees the script method as being too large in general...
Investigating the new error of:
Started by user [admin](http://172.19.98.192:8083/user/admin)
Running as [admin](http://172.19.98.192:8083/user/admin)
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
General error during class generation: Method too large: WorkflowScript.___cps___1 ()Lcom/cloudbees/groovy/cps/impl/CpsFunction;
groovyjarjarasm.asm.MethodTooLargeException: Method too large: WorkflowScript.___cps___1 ()Lcom/cloudbees/groovy/cps/impl/CpsFunction;
at groovyjarjarasm.asm.MethodWriter.computeMethodInfoSize(MethodWriter.java:2087)
at groovyjarjarasm.asm.ClassWriter.toByteArray(ClassWriter.java:447)
at org.codehaus.groovy.control.CompilationUnit$17.call(CompilationUnit.java:850)
at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1087)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:624)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:602)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:579)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:323)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:293)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox$Scope.parse(GroovySandbox.java:163)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:190)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:175)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:635)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:581)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:335)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:442)
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:309)
at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1107)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:624)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:602)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:579)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:323)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:293)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox$Scope.parse(GroovySandbox.java:163)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:190)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:175)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:635)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:581)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:335)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:442)
Finished: FAILURE
Trying:
JAVA_OPTS: "-Dorg.jenkinsci.plugins.pipeline.modeldefinition.parser.RuntimeASTTransformer.SCRIPT_SPLITTING_TRANSFORMATION=true -Djenkins.install.runSetupWizard=false -Djenkins.install.SetupWizard.adminInitialApiToken=\"{{ lookup('password', '/dev/null length=20 chars=ascii_letters') }}\" -Dhudson.model.DirectoryBrowserSupport.CSP=\"\""
Specifically:
-Dorg.jenkinsci.plugins.pipeline.modeldefinition.parser.RuntimeASTTransformer.SCRIPT_SPLITTING_TRANSFORMATION=true
As suggested:
Has led to the same result... Pivoting to other solutions...
Timeboxing... Was trying variations of:
definition {
    cpsScm {
        scm {
            git {
                remote {
                    github('${harvester_baremetal_ansible_repo}', '${harvester_baremetal_ansible_branch}')
                    credentials('github-credential')
                }
            }
            scriptPath("jenkins/harvester_airgap_integrations_pipeline.groovy")
        }
    }
}
It's really not working... getting the params.* values to come across the wire and be interpolated into things is simply not working with any of these combinations:
github('${harvester_baremetal_ansible_repo}', '${harvester_baremetal_ansible_branch}')
github("${harvester_baremetal_ansible_repo}", "${harvester_baremetal_ansible_branch}")
github("${params.harvester_baremetal_ansible_repo}", "${params.harvester_baremetal_ansible_branch}")
github('${harvester_baremetal_ansible_repo}', '${harvester_baremetal_ansible_branch}')
github('${params.harvester_baremetal_ansible_repo}', '${params.harvester_baremetal_ansible_branch}')
github($harvester_baremetal_ansible_repo, $harvater_baremetal_ansible_branch)
github("$harvester_baremetal_ansible_repo", "$harvater_baremetal_ansible_branch")
The idea being that we'd give a specific script path like:
scriptPath("jenkins/harvester_airgap_integrations_pipeline.groovy")
to split the script apart from the pipeline job definition.
But getting the branch & repo dynamically, through interpolation of the params (stringParam type), just isn't working...
May pivot back to cpsScm -> scm -> git & scriptPath...
The Jenkins JobDSL Plugin Docs + Jenkins Docs don't seem to have "dynamic" examples...
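One hedged possibility (not verified here): Job DSL exposes the seed job's own build parameters as variables inside the DSL script, so if the repo and branch were parameters on the seed job they could be interpolated at generation time. A sketch, using our parameter names but otherwise illustrative:

// Hypothetical sketch: harvester_baremetal_ansible_repo / _branch defined as string
// parameters on the *seed* job become plain variables in this DSL script, so normal
// GString interpolation works when the job is generated.
pipelineJob('harvester-airgap-integrations') {
    definition {
        cpsScm {
            scm {
                git {
                    remote {
                        github("${harvester_baremetal_ansible_repo}")
                        credentials('github-credential')
                    }
                    branch("${harvester_baremetal_ansible_branch}")
                }
            }
            scriptPath('jenkins/harvester_airgap_integrations_pipeline.groovy')
        }
    }
}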
Re-investigating the environment-variable approach. That would be the easiest... thinking that with some adjustments we might get past https://github.com/harvester/tests/issues/967#issuecomment-2243737049 (that error)
It's difficult to overcome the "environment variable" limit... there are probably still some more ways around it...
Pivoted instead to:
With:
Methods that get around the "method too large" error: we pull the logic of the multiple parallel-running stages out into two separate methods, one of which builds out the local.tfvars for the respective service, since we can't leverage the default TF_VAR_* environment variables that Terraform gives us because Jenkins/Groovy places a strange limitation on the map size of environment variables with the JobDSL plugin.
If we could, we'd avoid the entire parallel stage that's needed to build out the local.tfvars.
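Roughly, the shape of that refactor (method, node, and service names here are illustrative, not the exact pipeline code):

// Hypothetical sketch: hoisting per-service logic into top-level methods keeps the
// generated WorkflowScript method under the JVM's 64KB bytecode limit that triggers
// "Method too large", while the parallel block itself stays tiny.
def buildLocalTfvars(String service) {
    // render terraform/airgap-integrations/<service>/local.tfvars (details elided)
}

def provisionService(String service) {
    // terraform init/apply for the given service (details elided)
}

def branches = [:]
for (svc in ['dns', 'rancher', 'docker-registry', 'hauler']) {
    def s = svc   // capture the loop variable for the closure
    branches[s] = {
        node {
            buildLocalTfvars(s)
            provisionService(s)
        }
    }
}
parallel branches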
That leverages the second bullet point from:
What ended up working for interpolation, and also matching the needed style of the local.tfvars for each service, is using the $/ (dollar-slashy) string...
def string = $/
string-goes-here
${params.interpolation}
other things like newline\n
/$
That seems to help.
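For illustration, a minimal sketch (file path and variable names are hypothetical) of rendering a per-service local.tfvars with a dollar-slashy string and writing it out from the pipeline:

// Hypothetical sketch: dollar-slashy strings still interpolate ${params.*} but are
// forgiving about quotes/backslashes, which suits the HCL-style local.tfvars layout.
def dnsTfvars = $/
harvester_url     = "${params.harvester_url}"
registry_endpoint = "${params.registry_endpoint}"
vm_name           = "dns-server-argp-vm"
/$

writeFile(file: 'terraform/airgap-integrations/dns/local.tfvars', text: dnsTfvars)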
Currently, testing pipeline on staging... Will iterate to fix any outstanding bugs as everything is now becoming glued together...
So, the temporary loop that does a few iterations when we shift the VM NIC/NAD and run a separate playbook for airgap seems to help buffer things:
But on the second iteration we're still seeing:
│ <172.19.121.147> (0, b'', b"OpenSSH_9.7p1, OpenSSL 3.3.1 4 Jun 2024\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 22: include /etc/ssh/ssh_config.d/*.conf matched no files\r\ndebug2: resolve_canonicalize: hostname 172.19.121.147 is address\r\ndebug1: auto-mux: Trying existing master at '/var/jenkins_home/.ansible/cp/fa3d4b2f87'\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 17473\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet_timeout: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n")
│ fatal: [dns-server-argp-vm]: FAILED! => {
│ "msg": "Timeout (12s) waiting for privilege escalation prompt: "
│ }
│
│ PLAY RECAP *********************************************************************
│ dns-server-argp-vm : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
│
│
│
│ with ansible_playbook.dns-vm-ansible-playbook,
│ on main.tf line 178, in resource "ansible_playbook" "dns-vm-ansible-playbook":
│ 178: resource "ansible_playbook" "dns-vm-ansible-playbook" {
│
│ ansible-playbook
Implementing an arbitrary sleep before the next iteration, so the VM can make the outbound DHCP request and get an IPv4 address assigned regardless of network, will need to happen.
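In pipeline terms that could look something like the following (retry count, sleep duration, and the Terraform path are placeholders):

// Hypothetical sketch: pause between iterations after the NIC/NAD switch so the VM
// can complete its outbound DHCP request before the next apply/playbook run.
retry(3) {
    sleep(time: 90, unit: 'SECONDS')   // arbitrary buffer for the DHCP lease
    sh 'terraform -chdir=terraform/airgap-integrations/dns apply -auto-approve -var-file=local.tfvars'
}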
Something is happening and /etc/rancher/k3s/registries.yaml isn't getting the injected variable funneled in from Jenkins:
root@k3s-server-argp-vm:/home/ubuntu# cat /etc/rancher/k3s/registries.yaml
mirrors:
docker.io:
endpoint:
- "https://airgap-docker-registry..sslip.io:5000"
registry.suse.com:
endpoint:
- "https://airgap-docker-registry..sslip.io:5000"
configs:
"airgap-docker-registry..sslip.io:5000":
tls:
insecure_skip_verify: true
"https://airgap-docker-registry..sslip.io:5000":
tls:
insecure_skip_verify: true
"":
tls:
insecure_skip_verify: true
root@k3s-server-argp-vm:/home/ubuntu#
investigating...
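The double dot in airgap-docker-registry..sslip.io suggests the registry VM's IP never made it into the template. One way to make that hand-off explicit on the Jenkins side (variable, extra-var, and playbook names below are made up for illustration) would be:

// Hypothetical sketch: compute the sslip.io registry host in the pipeline and pass it
// as an explicit extra-var, failing fast if the IP is empty so a blank host never
// lands in /etc/rancher/k3s/registries.yaml.
def registryIp = params.registry_vm_ip?.trim()
if (!registryIp) {
    error 'registry_vm_ip is empty; refusing to render registries.yaml with a blank host'
}
def registryHost = "airgap-docker-registry.${registryIp}.sslip.io:5000"
sh "ansible-playbook k3s-airgap.yml -e airgap_registry_host=${registryHost}"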
This was an issue:
\"stderr\": \"Error: open /home/ubuntu/hauler-jetstack-cert-manager-images.yaml: no such file or directory\\nUsage:\\n hauler store sync [flags]\\n\\nFlags:\\n -f, --files strings Path(s) to local content files (Manifests). i.e. '--files ./rke2-files.yml\\n -h, --help help for sync\\n -k, --key string (Optional) Path to the key for signature verification\\n -p, --platform string (Optional) Specific platform to save. i.e. linux/amd64. Defaults to all if flag is omitted.\\n -c, --product-registry string (Optional) Specific Product Registry to use. Defaults to RGS Carbide Registry (rgcrprod.azurecr.us).\\n --products strings Used for RGS Carbide customers to supply a product and version and Hauler will retrieve the images. i.e. '--product rancher=v2.7.6'\\n -r, --registry string (Optional) Default pull registry for image refs that are not specifying a registry name.\\n\\nGlobal Flags:\\n --cache string (deprecated flag and currently not used)\\n -l, --log-level string (default \\\"info\\\")\\n -s, --store string Location to create store at (default \\\"store\\\")\",\n \"stderr_lines\": [\n \"Error: open /home/ubuntu/hauler-jetstack-cert-manager-images.yaml: no such file or directory\",\n \"Usage:\",\n \" hauler store sync [flags]\",\n \"\",\n \"Flags:\",\n \" -f, --files strings Path(s) to local content files (Manifests). i.e. '--files ./rke2-files.yml\",\n \" -h, --help help for sync\",\n \" -k, --key string (Optional) Path to the key for signature verification\",\n \" -p, --platform string (Optional) Specific platform to save. i.e. linux/amd64. Defaults to all if flag is omitted.\",\n \" -c, --product-registry string (Optional) Specific Product Registry to use. Defaults to RGS Carbide Registry (rgcrprod.azurecr.us).\",\n \" --products strings Used for RGS Carbide customers to supply a product and version and Hauler will retrieve the images. i.e. '--product rancher=v2.7.6'\",\n \" -r, --registry string (Optional) Default pull registry for image refs that are not specifying a registry name.\",\n \"\",\n \"Global Flags:\",\n \" --cache string (deprecated flag and currently not used)\",\n \" -l, --log-level string (default \\\"info\\\")\",\n \" -s, --store string Location to create store at (default \\\"store\\\")\"\n ],\n \"stdout\": \"\\u001b[90m2024-07-26 22:52:17\\u001b[0m \\u001b[1m\\u001b[31mERR\\u001b[0m\\u001b[0m open /home/ubuntu/hauler-jetstack-cert-manager-images.yaml: no such file or directory\",\n \"stdout_lines\": [\n \"\\u001b[90m2024-07-26 22:52:17\\u001b[0m \\u001b[1m\\u001b[31mERR\\u001b[0m\\u001b[0m open /home/ubuntu/hauler-jetstack-cert-manager-images.yaml: no such file or directory\"\n ]\n}\n\nTASK [seed-hauler : Print when errors] *****************************************\ntask path: /var/jenkins_home/workspace/harvester-airgap-integrations/terraform/airgap-integrations/hauler/ansible/roles/seed-hauler/tasks/main.yml:52\nok: [hauler-server-argp-vm] =\u003e {\n \"msg\": \"I caught an error in configuring vm further\"\n}\n\nTASK [seed-hauler : Always do this] ********************************************\ntask path: /var/jenkins_home/workspace/harvester-airgap-integrations/
Now fixed from Sunday's update. Yielding:
╭─mike at suse-workstation-team-harvester in ~/Projects/seeder/cmd/seeder on cli✘✘✘
╰─± curl -k https://172.19.121.240:5000/v2/library/nginx/tags/list | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 43 100 43 0 0 346 0 --:--:-- --:--:-- --:--:-- 349
{
"name": "library/nginx",
"tags": [
"latest"
]
}
╭─mike at suse-workstation-team-harvester in ~/Projects/seeder/cmd/seeder on cli✘✘✘
╰─± ./hauler store add image quay.io/jetstack/cert-manager-webhook:v1.13.1 -p linux/amd64
╭─mike at suse-workstation-team-harvester in ~/Projects/seeder/cmd/seeder on cli✘✘✘
╰─± curl -k https://airgap-docker-registry.172.19.121.240.sslip.io:5000/v2/jetstack/cert-manager-cainjector/tags/list | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 63 100 63 0 0 195 0 --:--:-- --:--:-- --:--:-- 195
{
"name": "jetstack/cert-manager-cainjector",
"tags": [
"v1.13.1"
]
}
So cert-manager & nginx are present.
Additionally, all rescue: blocks in Ansible will re-trigger an ansible.builtin.fail with a message giving context at a glance, allowing the pipelines to not "fail silently".
While we are still waiting to implement Stage 7 & Stage 8, we now have a new lab where the last part of Stage 6, allowing our Seeder to run airgapped, can become a reality once more infrastructure work is done. cc: @TachunLin We may also want to follow up on further optimizations here that would improve the provisioning flow & timeline, such as condensing the file server to also leverage Hauler (https://github.com/zackbradys/rancher-airgap/blob/main/examples/rancher-airgap-quickstart.md) vs. having a standalone one.
For reference, this was leveraged successfully, though outside of our lab env, to provision our needed integrations for v1.4.0 testing.
What's the test to develop? Please describe
The epic issue was created to track the progress of building a fully airgapped environment for the upgrade testing in each Harvester release candidate.
Since the current fully airgapped environment is built upon ipxe-example virtual machines nested inside another powerful virtual machine, it often causes unexpected upgrade failures due to performance or resource bottlenecks.
Scope
Prerequisite
Any prerequisite environment and pre-condition required for this test. Provide test case dependency here if any.
The fully airgapped environment requires the following components:
Test case reference
Roles
Harvester cluster on bare metal machines
On the same bare metal machine, host VMs (tentative)
On the same VM (tentative)
Describe the items of the test development (DoD, definition of done) you'd like
Stage 1 Design discussion
Stage 2 Build Out Baseline Provision Harvester Airgap Pipeline
Stage 3 Convert Vagrant Logic To Terraform VMs (per service) For Harvester
Stage 4 Build Out All Airgap Integrations Jenkins Pipeline Utilities
Stage 5 Implement All Airgap Integrations Jenkins Pipeline on Staging
Stage 6 Implement additional pipeline parameters to baseline Harvester Airgap Pipeline
Stage 7 Move Both Pipelines To Prod
Stage 8 Implement New Pipeline To Run Subsection Of Tests Against Airgap