helm / chartmuseum

helm chart repository server
https://chartmuseum.com
Apache License 2.0

index.yaml not always regenerated when pushing helm charts via API - 404s #453

Open peppe77 opened 3 years ago

peppe77 commented 3 years ago

K8S: 1.18 Chartmuseum: v0.13.1 (no cache-interval configured)

        - name: DISABLE_METRICS
          value: "false"
        - name: DISABLE_API
          value: "false"
        - name: ALLOW_OVERWRITE
          value: "true"
        - name: CACHE
          value: redis
GET /ccdp/stable/api/charts/ccdp-om-inbound-service/release-1.61.0-dev699051 
HTTP/1.1 404 Not Found
Date: Tue, 27 Apr 2021 22:21:32 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 57
Connection: keep-alive
X-Request-Id: dc61e2062300687cbae84e6fc9f744a6
Strict-Transport-Security: max-age=15724800; includeSubDomains
{"error":"improper constraint: release-1.61.0-dev699051"}

However, when we check the S3 bucket under /ccdp/stable, we do find the helm chart [ccdp-om-inbound-service-release-1.61.0-dev699051]. The upload always seems to work, but index.yaml does not always get regenerated. Long story short, if a 404 is returned then we do not deploy (the deployments are automated).
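For context, the gate in our pipeline is essentially the following sketch (hypothetical Go, not our actual pipeline code; the hostname is made up): GET the chart version from the ChartMuseum API shown above and skip the deploy on anything other than a 200.

package main

import (
	"fmt"
	"net/http"
)

// chartIndexed reports whether ChartMuseum knows about the chart version,
// e.g. url = "https://<host>/ccdp/stable/api/charts/<name>/<version>".
// A 404 means the chart is not (yet) in index.yaml, so the deploy is skipped.
func chartIndexed(url string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	ok, err := chartIndexed("https://chartmuseum.example.com/ccdp/stable/api/charts/ccdp-om-inbound-service/release-1.61.0-dev699051")
	if err != nil {
		panic(err)
	}
	fmt.Println("chart in index:", ok)
}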

We have tried deleting the index-cache and restarting ChartMuseum - the problem continues to occur!

jdolitsky commented 3 years ago

My initial reaction is that this is related to the version of your chart not being valid semver2:

{"error":"improper constraint: release-1.61.0-dev699051"}

This error is triggered by this code: https://github.com/Masterminds/semver/blob/49c09bfed6adcffa16482ddc5e5588cffff9883a/constraints.go#L32

which is referenced all over in helm/helm

Modify the version from release-1.61.0-dev699051 to 1.61.0-dev699051 (or something else that starts with a valid semver version) and I bet this issue will go away.
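For illustration, here is a minimal sketch (not ChartMuseum code, just calling the linked Masterminds/semver library directly) of why the release- prefix trips the constraint parser while the bare version does not:

package main

import (
	"fmt"

	"github.com/Masterminds/semver"
)

func main() {
	for _, v := range []string{"release-1.61.0-dev699051", "1.61.0-dev699051"} {
		if _, err := semver.NewConstraint(v); err != nil {
			// prints e.g. "release-1.61.0-dev699051 -> improper constraint: release-1.61.0-dev699051"
			fmt.Printf("%s -> %v\n", v, err)
		} else {
			fmt.Printf("%s -> valid constraint\n", v)
		}
	}
}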

peppe77 commented 3 years ago

@jdolitsky it was a bad example. The logic looks for 1.61.0-dev699051, 1.61.0, 1.161.0-dev and also release-1.61.0-dev699051 (that last one is the only case where a 404 should be returned, right?). So, I will reproduce it, then do a GET following this pattern [1.61.0-dev, 1.61.0, 1.161.0-dev] and report back.

peppe77 commented 3 years ago

@jdolitsky here it is

404
GET /api/ccdp/stable/charts/ccdp-best-match/1.62.0-dev699615 HTTP/1.1
HTTP/1.1 404 Not Found
Date: Wed, 28 Apr 2021 00:22:46 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 71
Connection: keep-alive
X-Request-Id: 2ad82bc1b3ca4e1c5ef1e2147240e08d
Strict-Transport-Security: max-age=15724800; includeSubDomains
{"error":"no chart version found for ccdp-best-match-1.62.0-dev699615"}

However, when we checked the S3 bucket, the chart is there: ccdp-best-match-1.62.0-dev699615.tgz | tgz | April 27, 2021, 10:54:45 (UTC-07:00) | 8.5 KB | Standard

Any suggestions/ideas on how to narrow it down? This was 6+ hours ago and the index still does not have it. The puzzling part is that the problem is random, not specific to any given set of charts. We have seen it hit quite a few charts already since we upgraded to v0.13.1.

peppe77 commented 3 years ago

Just wanted to add that when a helm chart is deleted (also via the API), sometimes the index is not regenerated either (so the chart still exists in the index but no longer in the S3 bucket). Same pattern as originally described above for pushing charts to the repo.

peppe77 commented 3 years ago

@jdolitsky this has been causing a lot of problems for us, therefore we will downgrade to v0.12.0 (we used to be on v0.8.3). It is not a great move, because the performance bug that also used to impact us was fixed in v0.13.0, but it is the only way we see to get the index regenerated when uploading or deleting charts (we only use the API). If more logs, info and/or data are needed - please let us know ASAP. Thanks.

scbizu commented 3 years ago

@peppe77 The index generator should run immediately after you push your chart. Can you turn off the Redis external cache and see if that works for you?

peppe77 commented 3 years ago

@scbizu we have rolled back to v0.8.3 and have not seen a single issue (this is the version we were on before the upgrade to v0.13.1). On v0.13.1 we had a lot of problems during nightly builds (high concurrency towards ChartMuseum), where charts get uploaded to S3 but some never make it into the index. We saw the very same problem on deletion (always via the API). It seems the bug shows up when high concurrency is a factor (like nightly builds). Moreover, we had a script that was deleting old charts via the API and it hit the same problem - we added a sleep between calls and things got better. As v0.8.3 worked well, handled the concurrency, and never showed any sign of this problem, we had to roll back. If you want additional logs and/or data, we can plan a maintenance window to move back to v0.13.1 and collect whatever is needed to help isolate the problem. (Note: the main reason for moving to v0.13.1 was a fix for a perf issue that was merged in 0.13.0; if we keep the helm repos clean - multi-tenant - that perf issue, while still there, is not visible.) Please let us know.

Knifa commented 3 years ago

Hey folks. I've been experiencing this same issue since v0.12.0 but thought it was just some weird quirk of my machine. The setup and symptoms are the same as described above.

The only way I've been able to get around this is by enabling --disable-statefiles and then restarting ChartMuseum every time I upload a chart, which is obviously not ideal, haha. 😄

What is more puzzling is that my colleague does not experience this issue on v0.12.0, which is why I thought it was something weird on my machine; I have not had much time to investigate. It is good to hear it's not just me, though.

rseedorff commented 3 years ago

Hi folks,

We are experiencing the same symptoms randomly :( The only thing that helps is to delete the cache file and restart ChartMuseum. Any new ideas on how to solve or fix this?

peppe77 commented 3 years ago

@rseedorff we ended up going back to the (fairly old) version we were on before the upgrade to v0.13.1, as this bug created a lot of other problems for our pipelines. Unfortunately we did not find any acceptable workaround, but we noticed that the bug occurs when there is high concurrency towards ChartMuseum (the very same scenario works fairly well on the old version we went back to). The drawback of going back to this old version: there is a (performance) bug when a helm repo has too many charts (which was fixed in v0.13.0 or .1, I do not recall which).

jabdoa2 commented 3 years ago

Same issue here. Setup is similar (S3 and Redis).

MaxRink commented 2 years ago

We are running into the same issue, using ChartMuseum as a bundled component in Harbor.

cbuto commented 2 years ago

I've been working to reproduce this via our locust tests. If anyone has debug logs from when this issue was happening, that could be useful 👀

cbuto commented 2 years ago

Good news: I believe I found the problem and have somewhat of an idea of how to solve it (theoretically).

It looks like we need a solution for locking Redis cache entries. For non-Redis-based caching (using an internal map), we are able to leverage mutexes to handle the locking. But with Redis-based caching, we need to make sure cache entries are locked while being operated on.
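Just to illustrate the idea (this is not the actual fix and not existing ChartMuseum code), a per-entry Redis lock could look roughly like the sketch below using go-redis, with SET NX plus a TTL for acquisition and an atomic compare-and-delete for release; the lock: key prefix and helper names are made up.

package main

import (
	"context"
	"errors"
	"time"

	"github.com/go-redis/redis/v8"
)

// acquireLock tries to take a lock on one cache entry. The token lets the
// holder release only a lock it still owns; the TTL guards against a crashed
// holder keeping the entry locked forever.
func acquireLock(ctx context.Context, rdb *redis.Client, key, token string, ttl time.Duration) (bool, error) {
	return rdb.SetNX(ctx, "lock:"+key, token, ttl).Result()
}

// unlockScript deletes the lock only if it still holds our token, done
// atomically server-side so another writer's lock is never removed.
var unlockScript = redis.NewScript(`
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0
`)

func releaseLock(ctx context.Context, rdb *redis.Client, key, token string) error {
	res, err := unlockScript.Run(ctx, rdb, []string{"lock:" + key}, token).Int()
	if err != nil {
		return err
	}
	if res == 0 {
		return errors.New("lock no longer held by this token")
	}
	return nil
}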

The implementation needs some thought and plenty of load testing to make sure we maximize throughput while retaining data consistency.