open-sauced / pizza

This is an engine that sources git commits and turns them to insights
Apache License 2.0
32 stars 13 forks source link

feat: implements the never evict repository policy in the LRU cache given a YAML configuration file #28

Closed k1nho closed 1 year ago

k1nho commented 1 year ago

Description

This PR adds a flag config that will parse a string path during server startup. It defaults to an empty string, if no config flag is provided

What type of PR is this? (check all applicable)

Related Tickets & Documents

26 Flag to consume yaml file at path to prevent eviction from cache of configured repos

Mobile & Desktop Screenshots/Recordings

Added tests?

Added to documentation?

[optional] Are there any post-deployment tasks we need to perform?

[optional] What gif best describes this PR or how it makes you feel?

senkuflag

jpmcb commented 1 year ago

@k1nho - are you interested in tackling the integration of the "never evict" map into the LRU cache itself? We shouldn't merge code that is half-baked (pun intended) and I'd like to avoid the main branch having commits that are not feature complete. Although, this isn't introducing breaking changes, it would just be confusing for anyone looking at the main branch commits.

k1nho commented 1 year ago

@jpmcb sure, I can give it a try! I was waiting based on the scope of the issue, but if it is ok to do it in this one I can try. I would like to know more about certain cases, right now I think of two in particular, would like to get more info on this.

jpmcb commented 1 year ago

I think both those cases are misconfigurations: ideally someone shouldn't configure the cache to be smaller than their list of "never evict" repos nor should they have

We should provide feedback that something's not right. Let's start with a warning message and go from there.

k1nho commented 1 year ago

image

Ok, trying to understand better the implementation. This case, for example, where we have repo3 on the LRU spot, but since repo3 is on neverEvictRepos map, we cannot evict it instead we should evict nearest LRU which happens to be 'a'. After, the swap we have a full cache with all repos being neverEvictRepos, so any subsequent Put() call can never evict a repo? or is the setup of neverEvictRepos like a constant, meaning we do not put them on the DLL but they are just placed in hm?

jpmcb commented 1 year ago

or is the setup of neverEvictRepos like a constant, meaning we do not put them on the DLL but they are just placed in hm?

Right. neverEvictRepos is more of a receipt to "check" when eviction is going to take place. neverEvictRepos shouldn't be a "pre-load" of repos since that would likely trigger massive rate limiting from git service providers.

ince repo3 is on neverEvictRepos map, we cannot evict it instead we should evict nearest LRU which happens to be 'a'

Correct.

After, the swap we have a full cache with all repos being neverEvictRepos, so any subsequent Put() call can never evict a repo?

That's correct. Typically that would be a cache "drop". But again, I would say that's a misconfiguration on the part of the user deploying this service. There should always be some amount of disk space free and the size of the repos in neverEvictRepos shouldn't be equal to (or greater than) the size of the attached disk.

In that scenario, we should log out a warning message that can be surfaced by administrators. And in the future, we may add some metrics that denote cache "drops" so they can see a problem has started with the service.

But for now, start with something basic. Something like a check in the eviction algorithm that sees if the node up for eviction is in the hashmap. If it is, skip and go to the next node in the list.

k1nho commented 1 year ago

Awesome, I understand better now! I did the implementation, but need some help with the test setup. Right now, we have a test TestTryEvict and this line will evict all the repos.

https://github.com/open-sauced/pizza/blob/e99d988bbc30d3fc9fa4e4a6cbb0d789a5298d76/pkg/cache/lrucache_test.go#L194C8-L194C8

The way I have it now is that when nil is reached on the DLL, it means that all the repositories in the cache are neverEvictRepos and thus we error because of the misconfiguration. I had a test for tryEvict, but it will fail because of trying to evict everything (which it should since it represents a misconfiguration).