**pranavgaikwad** opened this issue 1 year ago
~~This is a good one to start with~~ ~~`GetDependencies()` is called every time a condition is evaluated when a dep selector is present.~~
Deps cache is now added https://github.com/konveyor/analyzer-lsp/blob/b38354f5b56e6417cdcabb581f292bd5b1a81050/provider/internal/java/service_client.go#L31
Hey @pranavgaikwad,
I am thinking of adding a new field `depsCache` to the `javaServiceClient` struct to store the cached dependencies:

```go
type javaServiceClient struct {
	// Other fields...
	depsCache map[uri.URI][]*provider.Dep
}
```
We also have to modify `GetDependencies()`:

```go
func (p *javaServiceClient) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	if p.depsCache != nil {
		return p.depsCache, nil
	}
	// Rest of the existing code remains the same...
	p.depsCache = m
	return m, nil
}
```
With this, the dependencies are stored in `depsCache` the first time `GetDependencies()` is called, and every later call returns the cached value.
What are your thoughts on this? Should I raise a PR with these changes?
@RipulHandoo I should have updated this: I ended up adding the deps cache already in my PR for dep labels. If you could come up with a generic way of caching condition responses and propose a solution, that'd be awesome!
Hey @pranavgaikwad!
I am thinking of defining a new struct `Cache` that will act as a storage box:

```go
type Cache struct {
	// data can store a response of any type
	data map[string]interface{}
}
```

And a constructor to initialize and return the cache:

```go
func NewCache() *Cache {
	return &Cache{
		data: make(map[string]interface{}),
	}
}
```
Methods to set and get cache entries:

```go
func (c *Cache) Set(key string, value interface{}) {
	c.data[key] = value
}

func (c *Cache) Get(key string) (interface{}, bool) {
	value, found := c.data[key]
	return value, found
}
```
Then any function can use it. (The cache has to outlive the call, e.g. as a field on the client; a cache created inside the function itself would start empty on every call.)

```go
func (p *javaServiceClient) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	// p.cache is a *caching.Cache created once with caching.NewCache(),
	// assuming our Cache struct lives in a package named "caching"
	if deps, found := p.cache.Get("dependencies"); found {
		return deps.(map[uri.URI][]*provider.Dep), nil
	}
	// ... calculate the dependencies into m ...
	// Once dependencies are obtained, store them in the cache
	p.cache.Set("dependencies", m)
	return m, nil
}
```

What are your thoughts on this approach?
I would love for this to be at a layer between the provider condition evaluate calls, and the provider clients themselves.
I wonder if we can create a handler-like approach, and one of the handlers can be a cache.
+1 to what Shawn said above
@shawn-hurley @pranavgaikwad Yes, that would be much better. This makes a cache handler:
```go
type Cache struct {
	data map[string]interface{}
}

type CacheHandler struct {
	cache *Cache
}

// NewCache creates a new Cache instance
func NewCache() *Cache {
	return &Cache{
		data: make(map[string]interface{}),
	}
}

// NewCacheHandler creates a new CacheHandler instance
func NewCacheHandler() *CacheHandler {
	return &CacheHandler{
		cache: NewCache(),
	}
}

// GetDependencies retrieves dependencies from the cache
func (ch *CacheHandler) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	if deps, found := ch.cache.Get("dependencies"); found {
		return deps.(map[uri.URI][]*provider.Dep), nil
	}
	return nil, errors.New("dependencies not found in cache")
}

// SetDependencies stores dependencies in the cache
func (ch *CacheHandler) SetDependencies(deps map[uri.URI][]*provider.Dep) {
	ch.cache.Set("dependencies", deps)
}
```
When `GetDependencies()` is called, it will first look for the dependencies in the cache; if they are not there, it will calculate them and store them in the cache:

```go
// GetDependencies retrieves the dependencies from the provider client
func (p *javaServiceClient) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	// Check if dependencies are already cached; p.cache is a *CacheHandler,
	// and it returns an error when the cache is empty
	if deps, err := p.cache.GetDependencies(); err == nil {
		return deps, nil
	}
	// ... Calculate the dependencies ...
	m := make(map[uri.URI][]*provider.Dep)
	// Simulating dependency calculation
	m[uri.URI("example.com/dependency1")] = []*provider.Dep{{}}
	// Once dependencies are obtained, store them in the cache
	p.cache.SetDependencies(m)
	return m, nil
}
```
@RipulHandoo I think a provider should not worry about setting the cache at all. @shawn-hurley, right?
I hoped it would be something providers never have to know about.
But if we are making the changes in the provider code that we own that encapsulates the external providers (such as GRPC) I think that would be fine, as it would happen outside the scope of the actual provider implementation.
Can you try to wrap the calls to a Handler in the CacheHandler to cache the results?
Basically, add a Handler interface to the Cache struct, and when we create a Provider, wrap it in the cache handler.
Does that make sense @RipulHandoo ? If a quick phone call would help we can schedule some time!
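For illustration, a minimal sketch of that wrapping idea (all names here are hypothetical stand-ins, not the actual analyzer-lsp interfaces): a small `Handler` interface for condition evaluation, and a caching handler that wraps any other `Handler`, so the wrapped provider never knows the cache exists.

```go
package main

import "fmt"

// ConditionResponse is a hypothetical stand-in for the engine's response type.
type ConditionResponse struct {
	Matched bool
}

// Handler is a hypothetical evaluation interface.
type Handler interface {
	Evaluate(condKey string) (ConditionResponse, error)
}

// cachingHandler wraps another Handler and caches its responses,
// so the wrapped handler never knows caching is happening.
type cachingHandler struct {
	inner Handler
	cache map[string]ConditionResponse
}

// WithCache wraps any Handler in a caching layer.
func WithCache(h Handler) Handler {
	return &cachingHandler{inner: h, cache: map[string]ConditionResponse{}}
}

func (c *cachingHandler) Evaluate(condKey string) (ConditionResponse, error) {
	if resp, ok := c.cache[condKey]; ok {
		return resp, nil
	}
	resp, err := c.inner.Evaluate(condKey)
	if err == nil {
		c.cache[condKey] = resp
	}
	return resp, err
}

// slowHandler simulates an expensive provider call and counts invocations.
type slowHandler struct{ calls int }

func (s *slowHandler) Evaluate(condKey string) (ConditionResponse, error) {
	s.calls++
	return ConditionResponse{Matched: true}, nil
}

func main() {
	inner := &slowHandler{}
	h := WithCache(inner)
	h.Evaluate("java.referenced: foo")
	h.Evaluate("java.referenced: foo") // served from cache
	fmt.Println("provider calls:", inner.calls) // prints 1
}
```

The provider implementation (here `slowHandler`) stays cache-free; only the construction site changes.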
@pranavgaikwad @shawn-hurley I'll try it and let you know; if I can't fix it, a phone call will help a lot. But upon reviewing the codebase again, I noticed that the `javaServiceClient` struct already includes a `depsCache` field of type `map[uri.URI][]*provider.Dep`. This field serves as a cache for the client's fetched dependencies.
Caching in the `GetDependencies` function is effectively implemented using this `depsCache` field. When `GetDependencies` is called, it first checks whether `depsCache` is non-empty. If the cache is populated with previously fetched dependencies, the function directly returns the cached result, avoiding redundant computation.
Given this existing caching mechanism, introducing an additional cache struct or a handler chain with a cache handler might be redundant in this context. Correct me if I'm wrong.
@RipulHandoo `depsCache` is an interim solution specific to the Java provider. For the condition caches, we are looking for a generic solution that handles responses from all provider conditions, not just dependency ones, and for all providers.
I could be totally off-base here, but I wanted to throw my hat into the ring on this issue. I went ahead and forked the repo here and did some preliminary work on this. I can create a PR so we can discuss it there as well, if you folks want.
I created a struct, `ConditionalCache`, in `engine/conditions.go` that simply has two maps, mapping a string to the `ConditionResponse` and `error` that can result from a Conditional calling `Evaluate`. Then, in each of the `Evaluate` methods, I marshal the Conditional's struct into a string and use that string as the cache key, caching the result of the function if it wasn't found. `engine/conditions.go:Rule.When` already uses `yaml.Marshal` on Conditionals, so this shouldn't be a problem if that was working correctly.
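From that description, the shape is roughly as follows. This is a self-contained sketch, not the code from the fork: the type names are guesses, and `encoding/json` stands in for `yaml.Marshal` only so the sketch runs without external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConditionResponse is a stand-in for engine.ConditionResponse.
type ConditionResponse struct {
	Matched bool
}

// ConditionalCache keys on the marshaled form of a Conditional and
// stores both the response and the error produced by Evaluate.
type ConditionalCache struct {
	responses map[string]ConditionResponse
	errors    map[string]error
}

func NewConditionalCache() *ConditionalCache {
	return &ConditionalCache{
		responses: map[string]ConditionResponse{},
		errors:    map[string]error{},
	}
}

// Key marshals the conditional into its cache key. The fork uses
// yaml.Marshal; json is used here to keep the sketch dependency-free.
func Key(conditional interface{}) (string, error) {
	b, err := json.Marshal(conditional)
	return string(b), err
}

// Get returns the cached response, its error, and whether it was found.
func (c *ConditionalCache) Get(key string) (ConditionResponse, error, bool) {
	resp, ok := c.responses[key]
	if !ok {
		return ConditionResponse{}, nil, false
	}
	return resp, c.errors[key], true
}

// Set stores the result of one Evaluate call.
func (c *ConditionalCache) Set(key string, resp ConditionResponse, err error) {
	c.responses[key] = resp
	c.errors[key] = err
}

func main() {
	cache := NewConditionalCache()
	key, _ := Key(map[string]string{"java.referenced": "javax.*"})
	if _, _, ok := cache.Get(key); !ok {
		cache.Set(key, ConditionResponse{Matched: true}, nil)
	}
	resp, _, _ := cache.Get(key)
	fmt.Println(resp.Matched) // prints true
}
```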
In terms of performance benefits, I ran the containerized demo with `time sudo podman run -v $(pwd)/demo-output.yaml:/analyzer-lsp/output.yaml:Z test-analyzer-engine`, and these were the results:

Without cache:

```
real	1m35.181s
user	0m0.214s
sys	0m0.202s
```

With cache:

```
real	1m31.411s
user	0m0.219s
sys	0m0.202s
```
A small performance increase, but this could just be due to system variability more than anything. I could definitely see that if conditions are reused, this could result in better performance (assuming my implementation doesn't break anything).
There is a small hitch with this implementation: it requires all the fields necessary to uniquely identify a Conditional to be public. I had to modify `testConditional` in `engine_test.go` in order for the tests to pass; unsure if this is a problem or not. `ProviderCondition` contains the field `ConditionInfo interface{}`, which also may or may not be an issue. I could also see each Conditional providing its own hash function to sidestep any potential issues with YAML, but that may require more effort than it's worth.
Even though this requires each `Evaluate` method to be modified, it also allows each Conditional to opt in to the caching.
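The per-Conditional hash idea could look something like this (a hypothetical interface, not anything in the codebase): a Conditional may optionally implement its own key method, with marshaling as the fallback.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CacheKeyer is a hypothetical opt-in interface: a Conditional that
// implements it controls its own cache key and sidesteps marshaling
// quirks (unexported fields, interface{} members, etc.).
type CacheKeyer interface {
	CacheKey() string
}

// cacheKeyFor uses the Conditional's own key when available, and
// falls back to marshaling otherwise (json here for a dependency-free
// sketch; the actual proposal uses yaml.Marshal).
func cacheKeyFor(conditional interface{}) (string, error) {
	if k, ok := conditional.(CacheKeyer); ok {
		return k.CacheKey(), nil
	}
	b, err := json.Marshal(conditional)
	return string(b), err
}

// providerCondition supplies its own stable key.
type providerCondition struct {
	Client string
	Query  string
}

func (p providerCondition) CacheKey() string {
	return p.Client + "|" + p.Query
}

// plainCondition relies on the marshal fallback.
type plainCondition struct {
	Pattern string
}

func main() {
	k1, _ := cacheKeyFor(providerCondition{Client: "java", Query: "javax.*"})
	k2, _ := cacheKeyFor(plainCondition{Pattern: "foo"})
	fmt.Println(k1) // java|javax.*
	fmt.Println(k2) // {"Pattern":"foo"}
}
```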
Would love to get your thoughts on this!
@JonahSussman I think rule-example.yaml is not a good representation of the performance benefits. The benefits will be more apparent when there are multiple rules, with a good portion of them making the exact same queries multiple times, etc.
@JonahSussman, this implementation looks good to me. I would probably add some way for a condition to set its own cache key rather than YAML-marshaling the whole thing.
I agree with @pranavgaikwad that it will take a better example; honestly, I would be interested in using this to test some of the bigger test cases that come from the windup shim, to see if there is an increase.
@RipulHandoo does the code that @JonahSussman sent make sense? Do you want to take that commit and make some of the changes?
@shawn-hurley This implementation makes a lot of sense.
I'll be more than happy to work on it.
As for the `yaml.Marshal()` method, I am thinking of caching each condition like this:
```go
// Get the sorted conditions
conditions := sortConditionEntries(a.Conditions)

// Generate a cache key for the sorted conditions
cacheKey := generateCacheKey(conditions)

// Check the cache
cacheResponse, cacheError, cacheOk := GlobalConditionalCache.Get(cacheKey)
if cacheOk {
	return cacheResponse, cacheError
}

// Evaluate conditions
for _, c := range conditions {
	// ... existing evaluation logic ...
}

// Store the full response in the cache at the end
GlobalConditionalCache.Set(cacheKey, fullResponse, nil)
return fullResponse, nil
```
We can generate `cacheKey` using some hash function or another approach. Would love to hear your thoughts on this.
I think that a hash of the conditions could work; I truly have not thought much about it.
@shawn-hurley @pranavgaikwad I am thinking of generating the cache key by converting the whole condition slice to a string and then hashing it:
```go
import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func generateCacheKey(conditions []ConditionEntry) string {
	// Use a hash function to generate a unique cache key
	hash := sha256.New()
	hash.Write([]byte(fmt.Sprintf("%v", conditions)))
	return hex.EncodeToString(hash.Sum(nil))
}
```
What do you think?
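For what it's worth, a self-contained version of that function is deterministic for equal slices (`ConditionEntry` is stubbed here; note that `fmt.Sprintf("%v", ...)` only prints exported, printable state, so two conditions differing only in unexported fields would collide):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ConditionEntry is a minimal stand-in for engine.ConditionEntry.
type ConditionEntry struct {
	From string
	As   string
}

func generateCacheKey(conditions []ConditionEntry) string {
	// Hash the printed form of the slice; equal slices give equal keys.
	hash := sha256.New()
	hash.Write([]byte(fmt.Sprintf("%v", conditions)))
	return hex.EncodeToString(hash.Sum(nil))
}

func main() {
	a := []ConditionEntry{{From: "x", As: "y"}}
	b := []ConditionEntry{{From: "x", As: "y"}}
	fmt.Println(generateCacheKey(a) == generateCacheKey(b)) // true
	fmt.Println(len(generateCacheKey(a)))                   // 64
}
```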
When we call the underlying provider, we should have a stringified form of the actual conditions (it does the replacement) IIRC. I think a sha sum of that string is fine.
I do wonder if the key could just be that string vs a hash.
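The tradeoff between the two options in a nutshell (a sketch; the templated condition string is simulated here): the raw string is collision-free but unbounded in size, while the hash is fixed-width.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashKey returns the sha256 of s as a fixed-width hex string.
func hashKey(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Simulate a large templated condition string.
	raw := strings.Repeat("java.referenced: javax.management.*; ", 50)

	// Option 1: the string itself is the key. No collision risk,
	// but map keys grow with the size of the conditions.
	fmt.Println("raw key bytes:", len(raw))

	// Option 2: hash the string. Fixed 64-character keys, at the cost
	// of a (negligible for sha256) collision possibility.
	fmt.Println("hashed key bytes:", len(hashKey(raw))) // 64
}
```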
@shawn-hurley @pranavgaikwad
I think, if I am not wrong, the `templateCondition` function
https://github.com/konveyor/analyzer-lsp/blob/main/provider/provider.go#L479
returns a `[]byte` slice that we can use as a key, since it will be unique.
Another doubt I have: the `Evaluate()` function has 3 parameters, but when we call it from `provider.go`
https://github.com/konveyor/analyzer-lsp/blob/main/provider/provider.go#L388
we only pass 2 parameters. Why is that?
@RipulHandoo are you still working on it? @pranavgaikwad is this issue still open?
No @Parthiba-Hazra. You can work on it if you want.
We should introduce a cache for condition responses. To start with, this could be as simple as a map that stores the response for a particular condition. We need to evaluate the added memory footprint.
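To make that concrete, a minimal sketch of "a map that stores the response for a particular condition", with a crude byte counter so the added memory footprint can be watched (all names here are hypothetical):

```go
package main

import "fmt"

// ConditionResponse is a stand-in for the engine's response type.
type ConditionResponse struct {
	Matched bool
	// Incidents, template context, etc. would live here in the real type.
}

// responseCache is the simplest possible condition-response cache:
// one map, keyed by the condition's string form.
type responseCache struct {
	entries  map[string]ConditionResponse
	keyBytes int // rough footprint of the stored keys
}

func newResponseCache() *responseCache {
	return &responseCache{entries: map[string]ConditionResponse{}}
}

func (c *responseCache) Get(cond string) (ConditionResponse, bool) {
	resp, ok := c.entries[cond]
	return resp, ok
}

func (c *responseCache) Set(cond string, resp ConditionResponse) {
	if _, exists := c.entries[cond]; !exists {
		c.keyBytes += len(cond)
	}
	c.entries[cond] = resp
}

func main() {
	cache := newResponseCache()
	cache.Set("java.referenced: javax.*", ConditionResponse{Matched: true})
	cache.Set("java.referenced: javax.*", ConditionResponse{Matched: true}) // no growth
	resp, ok := cache.Get("java.referenced: javax.*")
	fmt.Println(ok, resp.Matched) // true true
	fmt.Println("key bytes:", cache.keyBytes)
}
```

Tracking key bytes is only a lower bound on the footprint; the stored responses themselves (incidents in particular) would dominate in practice.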