ibm-mas / ansible-devops

Ansible collection supporting devops for IBM Maximo Application Suite
https://ibm-mas.github.io/ansible-devops/
Eclipse Public License 2.0
49 stars 84 forks source link

Installation of Maximo Core fails in an HCP(Hosted Control Plane) in phase suite-verify of the pipeline #1292

Closed gabyval closed 1 month ago

gabyval commented 5 months ago

Problem Description: Installation of Maximo Core fails in an HCP(Hosted Control Plane) in phase suite-verify of the pipeline Screenshot 2024-05-03 at 8 10 11 p m

Message:

SLS client registration was unsuccessful: Unable to register SLS client for MAS: An unhandled error was returned from SLS: Unable to register SLS client cpst-2fbc6c65: An unhandled error was returned from SLS: HTTPSConnectionPool(host=''sls.ibm-sls.ibm-sls.apps.cp4d-group-1.apps.rackm06.mydomain.com'', port=443): Max retries exceeded with url: /api/registrations (Caused by SSLError(SSLCertVerificationError(1, ''[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)'')))

Reproducibility: Consistent

Additional info:

Seems there might be a problem from the sls-api-licensing-5594b88bc8-cz5qv container. There are couple errors showing Startup probe failed Also, from inside the container, it is showing the following response:

bash-4.4$ curl -kv https://sls.ibm-sls.svc.cluster.local:9443/api/probe/liveness
*  Trying 172.21.249.120...
* TCP_NODELAY set
^C
bash-4.4$ curl -kv https://localhost:9443/api/probe/liveness
*  Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/pki/tls/certs/ca-bundle.crt
 CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=sls.ibm-sls.svc
* start date: Jan 27 02:12:13 2024 GMT
* expire date: Jan 22 02:12:13 2044 GMT
* issuer: C=GB; L=London; street=London; OU=IBM Suite License Service (Internal); CN=sls.sls.ibm.com
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55a464b75990)
> GET /api/probe/liveness HTTP/2
> Host: localhost:9443
> User-Agent: curl/7.61.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 404 
< x-powered-by: Servlet/4.0
< allow: OPTIONS
< date: Fri, 03 May 2024 20:40:39 GMT
< content-length: 0
< content-language: en-GB
< 
* Connection #0 to host localhost left intact

Trying to use curl with the external hostname:

curl -kfsS [https://sls.ibm-sls.ibm-sls.apps.cp4d-group-1.apps.rackm06.mydomain.com:443/api/probe/liveness](https://sls.ibm-sls.ibm-sls.apps.cp4d-group-1.apps.rackm06.mydomain.com/api/probe/liveness)
curl: (22) The requested URL returned error: 503 Service Unavailable

From the verbose, the following errors are presented:

<h1>Application is not available</h1>
   <p>The application is currently not serving requests at this endpoint. It may not have been started or is still starting.</p>

   <div class="alert alert-info">
    <p class="info">
     Possible reasons you are seeing this page:
    </p>
    <ul>
     <li>
      <strong>The host doesn't exist.</strong>
      Make sure the hostname was typed correctly and that a route matching this hostname exists.
     </li>
     <li>
      <strong>The host exists, but doesn't have a matching path.</strong>
      Check if the URL path was typed correctly and that the route was created using the desired path.
     </li>
     <li>
      <strong>Route and path matches, but all pods are down.</strong>
      Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
gabyval commented 5 months ago

Hi @durera, we are still blocked with this error, could someone help us?

durera commented 5 months ago

Discussing in Slack ...

whoiscnu commented 4 months ago

We implemented MAS on HCP with PrivateLink. See no issues during setup. We ensured before suite config, the public accessible routes were in place using LetsEncrypt Certs for MAS Core. There are intermittent errors wrt licensemediator pod while accessing MAS URL Administration option. Shall provide more details on that.

gabyval commented 4 months ago

FYI - slack thread https://ibm-ai-apps.slack.com/archives/CNDB5GZ3P/p1714684321087009

ralberrto commented 4 months ago

@whoiscnu I reattempted the installation on a clean cluster (refer to https://github.ibm.com/sit/cloudpak-storage-test/issues/1110#issuecomment-81431800)

The installation failed at the same spot. "suite verify". I think the reason it failed is the secret arft-sls-cfg was never creaed and the pod arft-catalogmgr is failing to mount it...

FranciscoOchoa94 commented 3 months ago

Hello ive found this issue without using HCPs, the cluster is behind a firewall:

cluster: https://console-openshift-console.apps.qa-hcp-sixpack.apps.blazehub01.rtp.raleigh.ibm.com/dashboards

suite-verify step failed i found this message:

message: 'SLS client registration was unsuccessful: Unable to register SLS client for MAS: An unhandled error was returned from SLS: Unable to register SLS client newqa-c11578da: An unhandled error was returned from SLS: HTTPSConnectionPool(host=''sls.ibm-sls.ibm-sls.apps.qa-hcp-sixpack.apps.blazehub01.rtp.raleigh.ibm.com'', port=443): Max retries exceeded with url: /api/registrations (Caused by SSLError(SSLCertVerificationError(1, ''[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)'')))'

faangbait commented 1 month ago

For what it's worth, we are successfully installing on HCP. Notably, we inject a ClusterIssuer resource named prod-route53-issuer in the cert-manager namespace (via hive syncset) and run with export MAS_CLUSTER_ISSUER=prod-route53-issuer

durera commented 1 month ago

Resolved this via e-mail & Slack, it was a mis-configuration in the install/environment; there's nothing specific to HCP preventing MAS installation on this flavour of AWS, we have successfully verified MAS install on HCP.