testkube-api results/v1/config not found

fvogl commented 10 months ago

For testkube-api versions > 1.16.16, we get this error on the dashboard:

tests:1 Access to fetch at 'https://testkube-api.ourdomain.com/results/v1/config' from origin 'https://testkube-dashboard.ourdomain.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
fetchUtils.ts:72 GET https://testkube-api.ourdomain.com/results/v1/config net::ERR_FAILED 404 (Not Found)

The dashboard boxes only show a loading animation.

Note: I don't use the ingress from the chart. I created my OpenShift route manually. This was working with versions <= 1.16.16.

vsukhin commented 10 months ago

@rangoo94 @ypoplavs anything wrong with the ingress spec?

rangoo94 commented 10 months ago

I don't think our API server returns Access-Control-* headers; they are added at the ingress level.
@fvogl noted that it's a custom ingress, so it's also unrelated to our ingress configuration.
The UI has always used fetch with the defaults (mode: cors, credentials: same-origin); nothing changed there.

@vsukhin, considering that, I don't think this is related to any Testkube changes.

@fvogl, maybe you previously made manual ingress changes in OpenShift that were not preserved after the upgrade?

rangoo94 commented 10 months ago

@fvogl, I suspect I know what is wrong: is testkube-api.ourdomain.com/results/v1/config the URL it should be? I would rather expect /results/v1 to be under the same domain as the Dashboard (which would solve the CORS problem):

testkube.ourdomain.com pointing to Testkube Dashboard
testkube.ourdomain.com/results/v1 - pointing to Testkube API Server

If the API is under a different domain, you need to add Access-Control-* headers to its ingress to allow access from the Dashboard's domain.

fvogl commented 10 months ago

@rangoo94, the thing is that Testkube was working perfectly fine until I updated it to 1.16.20. When I change the testkube-api image back to 1.16.16, it works again. If I change it to anything > 1.16.16, I get the error I reported.

rangoo94 commented 10 months ago

The only changes I see between 1.16.16 and 1.16.17 are:

changes to internal pipeline
changes to the CLI
changes to executor images
Logs V2
libssl3 installed instead of libssl1.1 on the container
Fiber updated to v2.51.0, FastHTTP updated to v1.50.0, NATS libraries updated, gRPC updated to v1.60.0

Nothing there could really cause this.

@fvogl, could you open that API URL directly (https://testkube-api.ourdomain.com/results/v1/config)? Maybe it's returning an error (e.g. 502). That would mean the API server is down, the ingress is missing, or the ingress doesn't add the Access-Control-* headers to 5xx responses.
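The same manual check can be scripted. This is a hedged Go sketch. `probe` and `diagnose` are hypothetical helpers, and the URLs are the placeholders from the report. Sending an `Origin` header mimics the dashboard's request but does not reproduce the browser's CORS enforcement.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// probe requests url with an Origin header, as the dashboard's fetch would,
// and returns the status code, the Access-Control-Allow-Origin value and
// the first 512 bytes of the body.
func probe(url, origin string) (status int, allowOrigin, body string, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", "", err
	}
	req.Header.Set("Origin", origin)
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", "", err // DNS, TLS, connection refused, timeout, ...
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(io.LimitReader(resp.Body, 512))
	return resp.StatusCode, resp.Header.Get("Access-Control-Allow-Origin"), string(b), err
}

// diagnose maps a received response onto the cases discussed in this thread.
func diagnose(status int, allowOrigin string) string {
	switch {
	case status >= 500:
		return "server error: API down, or the ingress/route cannot reach it"
	case status == http.StatusNotFound:
		return "not found: nothing is served at this path"
	case status >= 400:
		return "client error: request rejected"
	case allowOrigin == "":
		return "ok, but no Access-Control-Allow-Origin header (browser will block it)"
	default:
		return "ok"
	}
}

func main() {
	url := "https://testkube-api.ourdomain.com/results/v1/config"
	if len(os.Args) > 1 {
		url = os.Args[1]
	}
	status, allowOrigin, body, err := probe(url, "https://testkube-dashboard.ourdomain.com")
	if err != nil {
		fmt.Println("request failed:", err)
		os.Exit(1)
	}
	fmt.Printf("status=%d allow-origin=%q\n%s\nbody: %s\n", status, allowOrigin, diagnose(status, allowOrigin), body)
}
```

The 404 check comes before the CORS check. A missing route is the more useful finding, because a 404 that lacks CORS headers also surfaces in the browser as a CORS error.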

fvogl commented 10 months ago

@rangoo94, I get a JSON object with 3 fields with 1.16.16, but "Cannot GET /results/v1/config" with 1.16.20.

rangoo94 commented 10 months ago

@fvogl I see a few options:

Ingress is misconfigured and doesn't point to API Server
Ingress is misconfigured and is exposing API Server under different URL
API server is down for some reason

Could you please check API server logs?

fvogl commented 10 months ago

@rangoo94, there is nothing in the logs.

I have two different URLs: one for testkube-dashboard (behind an oauth2-proxy wrapper for authentication) and one for testkube-api (without it).

The API server is up; at least the pod is ready.

rangoo94 commented 10 months ago

That means the ingress is misconfigured and doesn't point to the API Server. Otherwise there would be a log entry, even for a 404:

{"level":"debug","ts":1704294543.967603,"caller":"server/httpserver.go:45","msg":"request","method":"GET","path":"http://localhost:8088/v1/testing-unknown-url"}

fvogl commented 10 months ago

I get this when accessing the api-server Service from a pod within the cluster:

/home/abc $ curl -vvv testkube-api-server.abc-testkube:8088/results/v1/config
* Host testkube-api-server.abc-testkube:8088 was resolved.
* IPv6: (none)
* IPv4: 172.30.179.124
*   Trying 172.30.179.124:8088...
* Connected to testkube-api-server.abc-testkube (172.30.179.124) port 8088
> GET /results/v1/config HTTP/1.1
> Host: testkube-api-server.abc-testkube:8088
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< date: Wed, 03 Jan 2024 15:26:38 GMT
< content-type: text/plain; charset=utf-8
< content-length: 29
< x-envoy-upstream-service-time: 0
< server: envoy
< 
* Connection #0 to host testkube-api-server.abc-testkube left intact
Cannot GET /results/v1/config

No log entry in the pod.

And this is with version 1.16.16:

/home/abc $ curl -vvv testkube-api-server.abc-testkube:8088/results/v1/config
* Host testkube-api-server.abc-testkube:8088 was resolved.
* IPv6: (none)
* IPv4: 172.30.52.88
*   Trying 172.30.52.88:8088...
* Connected to testkube-api-server.abc-testkube (172.30.52.88) port 8088
> GET /results/v1/config HTTP/1.1
> Host: testkube-api-server.abc-testkube:8088
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< date: Wed, 03 Jan 2024 15:26:13 GMT
< content-type: application/json
< content-length: 86
< vary: Origin
< access-control-allow-origin: *
< x-envoy-upstream-service-time: 2
< server: envoy
< 
* Connection #0 to host testkube-api-server.abc-testkube left intact
{"id":"","clusterId":"cluster4a41c1aa8fe424b0c7e4ab54cb0a1c34","enableTelemetry":true}

And when I request a path that does not exist, I get the same error, but again no log entry in the pod. I guess my log level isn't set to debug, although I don't know how to change it:

/home/abc $ curl -vvv testkube-api-server.abc-testkube:8088/results/v1/xxx
* Host testkube-api-server.abc-testkube:8088 was resolved.
* IPv6: (none)
* IPv4: 172.30.52.88
*   Trying 172.30.52.88:8088...
* Connected to testkube-api-server.abc-testkube (172.30.52.88) port 8088
> GET /results/v1/xxx HTTP/1.1
> Host: testkube-api-server.abc-testkube:8088
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< date: Wed, 03 Jan 2024 15:31:28 GMT
< content-type: text/plain; charset=utf-8
< content-length: 26
< vary: Origin
< access-control-allow-origin: *
< x-envoy-upstream-service-time: 0
< server: envoy
< 
* Connection #0 to host testkube-api-server.abc-testkube left intact
Cannot GET /results/v1/xxx

I guess this rules out an ingress issue, since I used the K8s Service directly.

rangoo94 commented 10 months ago

Thanks @fvogl! I just found the cause:

In https://github.com/kubeshop/testkube/commit/b80ca88af6f75fbc4464e26d62aceef6ac28a2b5, the /results/v1 alias accidentally became the only prefix (previously both /v1/ and, I think deprecated, /results/v1/ were served).
In https://github.com/kubeshop/testkube/commit/fc71643c52f2f3f3a24707b028adadea4a037010, the /results group was deleted, leaving /v1/ as the only accepted prefix.

I would suggest changing the endpoint/ingress to point to testkube-api-server.abc-testkube:8088/v1 instead of testkube-api-server.abc-testkube:8088/results/v1.

@vsukhin, should we restore the /results/v1 alias for backward compatibility, or are we deprecating it completely?

    // mount everything on results
    // TODO it should be named /api/ + dashboard refactor
    s.Mux.Mount("/results", s.Mux)

vsukhin commented 10 months ago

I guess it's from @exu's changes for the Fiber upgrade. I would keep it for backward compatibility.

fvogl commented 10 months ago

@rangoo94, I've just changed the config to drop /results, and it works again :)

exu commented 10 months ago

I would opt for removing this /results route for now. It introduces unnecessary complexity and confusion.

It looks like Fiber broke its Mount API in a recent version, and it's no longer trivial to duplicate groups.

exu commented 10 months ago

https://github.com/kubeshop/testkube/pull/4866/files

Added a docs section on using the / route instead. Closing for now.

kubeshop/testkube issue #4850