Open Trenly opened 1 month ago
This seems to cover the pipelines, not the CLI application - should it be moved to winget-pkgs?
I'll leave that up to @denelon, but since index creation is part of the CLI implementation, I opted to put it here, mostly for planning purposes within the team, especially since the rebuild pipeline isn't typically run as a regular part of verification/publishing.
I'll let the engineering team take a look to see if this is beneficial, and if it should be here or at winget-pkgs. 😊
Description of the new feature / enhancement
Rebuilding the entire index takes a long time because each manifest must be fully parsed and rebuilt. However, many of these manifests may not have changed since the last rebuild was run. With nearly 60,000 manifests, it would be beneficial to have some method of doing a partial rebuild.
Proposed technical implementation details
When a rebuild is performed, a copy of the manifests and the indexes could be saved to blob storage as a gzip archive. When the next rebuild is performed, this archive could be downloaded and expanded, and the indexes loaded into memory as if it were the publishing pipeline. Then, instead of rebuilding the index from scratch, each manifest could be compared against its cached copy. If the manifest has changed, the index is updated based on the diff between the old manifest file and the new one. If the manifest has not changed, the index does not need to be updated. Once all the manifests have been processed, the new indexes can be published, and a copy of the manifests and indexes can be saved as the cache for the next rebuild.
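To make the comparison step concrete, here is a rough sketch in Python. It is only an illustration, not how the CLI or the pipelines work today: the function names `hash_manifests` and `plan_partial_rebuild` are hypothetical, the `*.yaml` glob is an assumption about how manifests are laid out on disk, and comparing SHA-256 content hashes is just one possible way to decide whether a manifest "has changed".

```python
import hashlib
from pathlib import Path


def hash_manifests(root: Path) -> dict[str, str]:
    """Map each manifest path (relative to root) to the SHA-256 of its bytes.

    Assumption: manifests are the *.yaml files under root.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.yaml"))
        if p.is_file()
    }


def plan_partial_rebuild(
    cached: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Classify manifests by comparing cached hashes against current hashes.

    Manifests present in both maps with identical hashes are omitted
    entirely: the index entries restored from the cache stay as they are.
    """
    added = sorted(current.keys() - cached.keys())
    removed = sorted(cached.keys() - current.keys())
    changed = sorted(
        path
        for path in current.keys() & cached.keys()
        if current[path] != cached[path]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

A hypothetical pipeline step would call `hash_manifests` on the expanded cache and on the fresh checkout, pass both maps to `plan_partial_rebuild`, and then apply index updates only for the paths it returns. Anything not listed is untouched, which is where the time savings would come from.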
Of course, the pipelines will still need an option to perform a full rebuild if necessary, but adding a caching layer could significantly reduce the time a rebuild takes by starting from the last known-good index.
With this caching strategy, it could also be beneficial to perform a full rebuild on a regular cadence (every 3 months?) to help ensure a well-maintained cache.