helm / chartmuseum

helm chart repository server
https://chartmuseum.com
Apache License 2.0

Failure to update cache with 10k charts when using Azure/Microsoft blob storage. #658

Closed jgreat closed 11 months ago

jgreat commented 1 year ago

We ran into an issue where chartmuseum, running as part of our Harbor install, segfaults and/or reports Panic runtime error: invalid memory address or nil pointer dereference when trying to regenerate the chart cache for a large-ish repo.

TL;DR: The index-cache.yaml was too big for a single upload.

I believe we've narrowed it down to a sequence of events.

First error: Panic runtime error: invalid memory address or nil pointer dereference

I first thought this was caused by a full or unavailable redis cache, as in issue #558. But the redis cache was healthy: it was using only 100MB or so of memory, it was populated for the 3 other smaller repos, and there were no logs showing "unable to connect to redis" messages.

This was a brand new install importing existing data, but our first troubleshooting step was to delete the redis cache and let it rebuild.

I now think the nil pointer dereference was chartmuseum trying to read a zero-byte index-cache.yaml file that had been saved in the Azure Blob storage. There didn't appear to be any debug logging showing this, so this is just a guess.
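
If that guess is right, the crash is the kind a size check before parsing would avoid. Here is a minimal sketch of the idea in Python (chartmuseum itself is written in Go, and this is not its code). The names load_index_cache and parse are hypothetical, made up for illustration.

```python
from typing import Callable, Optional


def load_index_cache(
    raw: Optional[bytes],
    parse: Callable[[bytes], dict],
) -> Optional[dict]:
    """Return the parsed cache, or None if the stored object is missing or empty.

    Hypothetical helper: `parse` stands in for whatever YAML loader is in use.
    """
    if raw is None or not raw.strip():
        # Nothing usable was stored; the caller should rebuild the index
        # from the chart files instead of using a half-initialised object.
        return None
    return parse(raw)
```

A caller would treat None as "regenerate the index" rather than as an error.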

Error 2: Looking for corrupt files.

After fighting with the redis cache for a while, we manually altered the Harbor helm chart to disable the redis cache. This led to the same nil pointer dereference error as above, so we started looking for issues with the chart files in the blob storage. We found an index-cache.yaml file that was 0 bytes. We took a chance and deleted the file, and chartmuseum finally started to load charts from the blob storage.
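
For anyone repeating this search, the check boils down to scanning the storage listing for objects with a size of zero. A rough sketch in Python, assuming you already have (name, size) pairs from your storage tooling. With the azure-storage-blob package these could come from the name and size fields of the items returned by ContainerClient.list_blobs(), but that wiring is left out here. find_empty_objects and the sample listing are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_empty_objects(listing: Iterable[Tuple[str, int]]) -> List[str]:
    """Return the names of objects whose reported size is zero bytes."""
    return [name for name, size in listing if size == 0]


# Made-up listing for illustration only.
sample_listing = [
    ("index-cache.yaml", 0),
    ("mychart-0.1.0.tgz", 4096),
]
print(find_empty_objects(sample_listing))  # prints ['index-cache.yaml']
```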

Why was index-cache.yaml 0 bytes?

It looks like the Azure storage driver needs to be configured to do multi-part uploads when handling files over 4MB. Our repo has 10K+ charts in the directory, and the generated index.yaml is 6.2MB, so the current single-request upload process isn't working.

https://learn.microsoft.com/en-us/troubleshoot/azure/general/request-body-large

{"L":"WARN","T":"2023-01-12T04:50:49.448Z","M":"Error saving index-cache.yaml","repo":"mobilecoinfoundation-public","error":"storage: service returned error: StatusCode=413, ErrorCode=RequestBodyTooLarge, ErrorMessage=The request body is too large and exceeds the maximum permissible limit.\nRequestId:78b57893-501e-0026-1541-2682a9000000\nTime:2023-01-12T04:50:49.3975797Z, RequestInitiated=Thu, 12 Jan 2023 04:50:49 GMT, RequestId=78b57893-501e-0026-1541-2682a9000000, API Version=2018-03-28, QueryParameterName=, QueryParameterValue="}

{"L":"DEBUG","T":"2023-01-12T04:50:49.448Z","M":"index-cache.yaml saved in storage","repo":"mobilecoinfoundation-public"}
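
To illustrate what a multi-part upload would involve, here is a minimal sketch in Python of splitting a payload into blocks that each fit under a per-request limit. It is not the chartmuseum storage driver and not the fix that was later merged. BLOCK_SIZE reuses the 4MB figure above as an assumption, and split_into_blocks is a hypothetical name. With Azure block blobs, each chunk would then be sent with Put Block and the IDs committed with Put Block List. Block IDs have to be base64 strings of equal length, which is why the counter is zero-padded.

```python
import base64
from typing import Iterator, Tuple

# Assumed per-request limit, taken from the 4MB figure mentioned above.
BLOCK_SIZE = 4 * 1024 * 1024


def split_into_blocks(
    data: bytes, block_size: int = BLOCK_SIZE
) -> Iterator[Tuple[str, bytes]]:
    """Yield (block_id, chunk) pairs with each chunk at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for index, offset in enumerate(range(0, len(data), block_size)):
        # Zero-padded counter so every base64-encoded ID has the same length.
        block_id = base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")
        yield block_id, data[offset:offset + block_size]


# A payload about the size of the 6.2MB index splits into two blocks.
payload = bytes(int(6.2 * 1024 * 1024))
blocks = list(split_into_blocks(payload))
print(len(blocks))  # prints 2
```

An empty payload yields no blocks, so a real uploader would need to decide separately how to handle that case.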

scbizu commented 1 year ago

@jgreat Hi, I am a little familiar with this issue. Is it #500?

cbuto commented 1 year ago

@scbizu I also see https://github.com/chartmuseum/storage/pull/667 which is likely related

scbizu commented 1 year ago

@cbuto Yes, seems like the same issue

cbuto commented 1 year ago

@jgreat the main build (ghcr.io/helm/chartmuseum:canary) contains a fix for this. Do you want to give that a try if you are still running into this issue?

white-eagle-83 commented 1 year ago

Using the canary image tag fixed the issue, and we then reverted to the tag we used before. We did not need to stay on canary for the fix to remain in place.

scbizu commented 11 months ago

Closing this issue since the fix is merged into main. Feel free to reopen if the issue still exists.