**pranavgaikwad** opened this issue 1 year ago
~~This is a good one to start with~~ ~~`GetDependencies()` is called every time a condition is evaluated when a dep selector is present.~~
Deps cache is now added https://github.com/konveyor/analyzer-lsp/blob/b38354f5b56e6417cdcabb581f292bd5b1a81050/provider/internal/java/service_client.go#L31
Hey @pranavgaikwad,
I am thinking of adding a new field `depsCache` to the `javaServiceClient` struct to store the cached dependencies:

```go
type javaServiceClient struct {
	// Other fields...
	depsCache map[uri.URI][]*provider.Dep
}
```
We also have to modify `GetDependencies()`:

```go
func (p *javaServiceClient) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	if p.depsCache != nil {
		return p.depsCache, nil
	}
	// Rest of the existing code remains the same...
	p.depsCache = m
	return m, nil
}
```
With this, the dependencies are stored in `depsCache` the first time `GetDependencies()` is called, and every later call returns the cached value.
What are your thoughts on this? Should I raise a PR with these changes?
@RipulHandoo I should have updated this: I ended up adding the deps cache already in my PR for dep labels. If you could come up with a generic way of caching condition responses and propose a solution, that'd be awesome!
Hey @pranavgaikwad!
I am thinking of defining a new struct `Cache` that will act as a storage box:

```go
type Cache struct {
	// data can store a response of any type
	data map[string]interface{}
}
```

And a constructor to initialize and return the cache:

```go
func NewCache() *Cache {
	return &Cache{
		data: make(map[string]interface{}),
	}
}
```
Methods to set and get cache entries:

```go
func (c *Cache) Set(key string, value interface{}) {
	c.data[key] = value
}

func (c *Cache) Get(key string) (interface{}, bool) {
	value, found := c.data[key]
	return value, found
}
```
Then any function can use it. (The cache has to outlive the call, e.g. as a field on the client; a cache created inside the function itself would start empty on every call.)

```go
func (p *javaServiceClient) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	// p.cache is a *caching.Cache created once with caching.NewCache(),
	// assuming our Cache struct lives in a package named "caching"
	if deps, found := p.cache.Get("dependencies"); found {
		return deps.(map[uri.URI][]*provider.Dep), nil
	}
	// ... calculate the dependencies into m ...
	// Once dependencies are obtained, store them in the cache
	p.cache.Set("dependencies", m)
	return m, nil
}
```

What are your thoughts on this approach?
I would love for this to be at a layer between the provider condition evaluate calls, and the provider clients themselves.
I wonder if we can create a handler-like approach, and one of the handlers can be a cache.
+1 to what Shawn said above
@shawn-hurley @pranavgaikwad Yes, that would be much better. This makes a cache handler:
```go
type Cache struct {
	data map[string]interface{}
}

type CacheHandler struct {
	cache *Cache
}

// NewCache creates a new Cache instance
func NewCache() *Cache {
	return &Cache{
		data: make(map[string]interface{}),
	}
}

// NewCacheHandler creates a new CacheHandler instance
func NewCacheHandler() *CacheHandler {
	return &CacheHandler{
		cache: NewCache(),
	}
}

// GetDependencies retrieves dependencies from the cache
func (ch *CacheHandler) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	if deps, found := ch.cache.Get("dependencies"); found {
		return deps.(map[uri.URI][]*provider.Dep), nil
	}
	return nil, errors.New("dependencies not found in cache")
}

// SetDependencies stores dependencies in the cache
func (ch *CacheHandler) SetDependencies(deps map[uri.URI][]*provider.Dep) {
	ch.cache.Set("dependencies", deps)
}
```
When `GetDependencies()` is called, it will first look for the dependencies in the cache; if they are not there, it will calculate them and store them in the cache:

```go
// GetDependencies retrieves the dependencies from the provider client
func (p *javaServiceClient) GetDependencies() (map[uri.URI][]*provider.Dep, error) {
	// Check if dependencies are already cached; p.cache is a *CacheHandler,
	// and it returns an error when the cache is empty
	if deps, err := p.cache.GetDependencies(); err == nil {
		return deps, nil
	}
	// ... Calculate the dependencies ...
	m := make(map[uri.URI][]*provider.Dep)
	// Simulating dependency calculation
	m[uri.URI("example.com/dependency1")] = []*provider.Dep{{}}
	// Once dependencies are obtained, store them in the cache
	p.cache.SetDependencies(m)
	return m, nil
}
```
@RipulHandoo I think a provider should not worry about setting the cache at all. @shawn-hurley, right?
I hoped it would be something providers never have to know about.
But if we are making the changes in the provider code that we own that encapsulates the external providers (such as GRPC) I think that would be fine, as it would happen outside the scope of the actual provider implementation.
Can you try to wrap the calls to a Handler in the CacheHandler to cache the results?
Basically, add a Handler interface to the Cache struct, and when we create a Provider, wrap it in the cache handler.
Does that make sense @RipulHandoo ? If a quick phone call would help we can schedule some time!
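For illustration, a minimal sketch of that wrapping idea (all names here are hypothetical stand-ins, not the actual analyzer-lsp interfaces): a small `Handler` interface for condition evaluation, and a caching handler that wraps any other `Handler`, so the wrapped provider never knows the cache exists.

```go
package main

import "fmt"

// ConditionResponse is a hypothetical stand-in for the engine's response type.
type ConditionResponse struct {
	Matched bool
}

// Handler is a hypothetical evaluation interface.
type Handler interface {
	Evaluate(condKey string) (ConditionResponse, error)
}

// cachingHandler wraps another Handler and caches its responses,
// so the wrapped handler never knows caching is happening.
type cachingHandler struct {
	inner Handler
	cache map[string]ConditionResponse
}

// WithCache wraps any Handler in a caching layer.
func WithCache(h Handler) Handler {
	return &cachingHandler{inner: h, cache: map[string]ConditionResponse{}}
}

func (c *cachingHandler) Evaluate(condKey string) (ConditionResponse, error) {
	if resp, ok := c.cache[condKey]; ok {
		return resp, nil
	}
	resp, err := c.inner.Evaluate(condKey)
	if err == nil {
		c.cache[condKey] = resp
	}
	return resp, err
}

// slowHandler simulates an expensive provider call and counts invocations.
type slowHandler struct{ calls int }

func (s *slowHandler) Evaluate(condKey string) (ConditionResponse, error) {
	s.calls++
	return ConditionResponse{Matched: true}, nil
}

func main() {
	inner := &slowHandler{}
	h := WithCache(inner)
	h.Evaluate("java.referenced: foo")
	h.Evaluate("java.referenced: foo") // served from cache
	fmt.Println("provider calls:", inner.calls) // prints 1
}
```

The provider implementation (here `slowHandler`) stays cache-free; only the construction site changes.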
@pranavgaikwad @shawn-hurley I'll try it and let you know; if I can't fix it, a phone call will help a lot. But upon reviewing the codebase again, I noticed that the `javaServiceClient` struct already includes a `depsCache` field of type `map[uri.URI][]*provider.Dep`. This field serves as a cache for the client's fetched dependencies.
Caching in the `GetDependencies` function is effectively implemented using this `depsCache` field. When `GetDependencies` is called, it first checks whether `depsCache` is non-empty. If the cache is populated with previously fetched dependencies, the function directly returns the cached result, avoiding redundant computation.
Given this existing caching mechanism, introducing an additional cache struct or a handler chain with a cache handler might be redundant in this context. Correct me if I'm wrong.
@RipulHandoo `depsCache` is an interim solution specific to the Java provider. For the condition caches, we are looking for a generic solution that handles responses from all provider conditions, not just dependency ones, and for all providers.
I could be totally off-base here, but I wanted to throw my hat into the ring on this issue. I went ahead and forked the repo here and did some preliminary work on this. I can create a PR so we can discuss it there as well, if you folks want.
I created a struct, `ConditionalCache`, in `engine/conditions.go` that simply has two maps, mapping a string to the `ConditionResponse` and `error` that can result from a Conditional calling `Evaluate`. Then, in each of the `Evaluate` methods, I marshal the Conditional's struct into a string and use that string as the cache key, caching the result of the function if it wasn't found. `engine/conditions.go:Rule.When` already uses `yaml.Marshal` on Conditionals, so this shouldn't be a problem if that was working correctly.
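From that description, the shape is roughly as follows. This is a self-contained sketch, not the code from the fork: the type names are guesses, and `encoding/json` stands in for `yaml.Marshal` only so the sketch runs without external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConditionResponse is a stand-in for engine.ConditionResponse.
type ConditionResponse struct {
	Matched bool
}

// ConditionalCache keys on the marshaled form of a Conditional and
// stores both the response and the error produced by Evaluate.
type ConditionalCache struct {
	responses map[string]ConditionResponse
	errors    map[string]error
}

func NewConditionalCache() *ConditionalCache {
	return &ConditionalCache{
		responses: map[string]ConditionResponse{},
		errors:    map[string]error{},
	}
}

// Key marshals the conditional into its cache key. The fork uses
// yaml.Marshal; json is used here to keep the sketch dependency-free.
func Key(conditional interface{}) (string, error) {
	b, err := json.Marshal(conditional)
	return string(b), err
}

// Get returns the cached response, its error, and whether it was found.
func (c *ConditionalCache) Get(key string) (ConditionResponse, error, bool) {
	resp, ok := c.responses[key]
	if !ok {
		return ConditionResponse{}, nil, false
	}
	return resp, c.errors[key], true
}

// Set stores the result of one Evaluate call.
func (c *ConditionalCache) Set(key string, resp ConditionResponse, err error) {
	c.responses[key] = resp
	c.errors[key] = err
}

func main() {
	cache := NewConditionalCache()
	key, _ := Key(map[string]string{"java.referenced": "javax.*"})
	if _, _, ok := cache.Get(key); !ok {
		cache.Set(key, ConditionResponse{Matched: true}, nil)
	}
	resp, _, _ := cache.Get(key)
	fmt.Println(resp.Matched) // prints true
}
```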
In terms of performance benefits, I ran the containerized demo with `time sudo podman run -v $(pwd)/demo-output.yaml:/analyzer-lsp/output.yaml:Z test-analyzer-engine`, and these were the results:

Without cache:

```
real	1m35.181s
user	0m0.214s
sys	0m0.202s
```

With cache:

```
real	1m31.411s
user	0m0.219s
sys	0m0.202s
```
A small performance increase, but this could just be due to system variability more than anything. I could definitely see that if conditions are reused, this could result in better performance (assuming my implementation doesn't break anything).
There is a small hitch with this implementation: it requires all the fields necessary to uniquely identify a Conditional to be public. I had to modify `testConditional` in `engine_test.go` in order for the tests to pass; unsure if this is a problem or not. `ProviderCondition` contains the field `ConditionInfo interface{}`, which also may or may not be an issue. I could also see each Conditional providing its own hash function to sidestep any potential issues with YAML, but that may require more effort than it's worth.
Even though this requires each `Evaluate` method to be modified, it also allows each Conditional to opt in to the caching.
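The per-Conditional hash idea could look something like this (a hypothetical interface, not anything in the codebase): a Conditional may optionally implement its own key method, with marshaling as the fallback.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CacheKeyer is a hypothetical opt-in interface: a Conditional that
// implements it controls its own cache key and sidesteps marshaling
// quirks (unexported fields, interface{} members, etc.).
type CacheKeyer interface {
	CacheKey() string
}

// cacheKeyFor uses the Conditional's own key when available, and
// falls back to marshaling otherwise (json here for a dependency-free
// sketch; the actual proposal uses yaml.Marshal).
func cacheKeyFor(conditional interface{}) (string, error) {
	if k, ok := conditional.(CacheKeyer); ok {
		return k.CacheKey(), nil
	}
	b, err := json.Marshal(conditional)
	return string(b), err
}

// providerCondition supplies its own stable key.
type providerCondition struct {
	Client string
	Query  string
}

func (p providerCondition) CacheKey() string {
	return p.Client + "|" + p.Query
}

// plainCondition relies on the marshal fallback.
type plainCondition struct {
	Pattern string
}

func main() {
	k1, _ := cacheKeyFor(providerCondition{Client: "java", Query: "javax.*"})
	k2, _ := cacheKeyFor(plainCondition{Pattern: "foo"})
	fmt.Println(k1) // java|javax.*
	fmt.Println(k2) // {"Pattern":"foo"}
}
```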
Would love to get your thoughts on this!
@JonahSussman I think rule-example.yaml is not a good representation of the performance benefits. The benefits will be more apparent when there are multiple rules, with a good portion of them making the exact same queries multiple times, etc.
@JonahSussman, this implementation looks good to me. I would probably add some way for a condition to set its own cache key rather than YAML-marshaling the whole thing.
I agree with @pranavgaikwad that it will take a better example; honestly, I would be interested in using this to test some of the bigger test cases that come from the windup shim, to see if there is an increase.
@RipulHandoo does the code that @JonahSussman sent make sense? Do you want to take that commit and make some of the changes?
@shawn-hurley This implementation makes a lot of sense.
I'll be more than happy to work on it.
As for the `yaml.Marshal()` method, I am thinking of caching each condition like this:
```go
// Get the sorted conditions
conditions := sortConditionEntries(a.Conditions)

// Generate a cache key for the sorted conditions
cacheKey := generateCacheKey(conditions)

// Check the cache
cacheResponse, cacheError, cacheOk := GlobalConditionalCache.Get(cacheKey)
if cacheOk {
	return cacheResponse, cacheError
}

// Evaluate conditions
for _, c := range conditions {
	// ... existing evaluation logic ...
}

// Store the full response in the cache at the end
GlobalConditionalCache.Set(cacheKey, fullResponse, nil)
return fullResponse, nil
```
We can generate `cacheKey` using some hash function or another approach. Would love to hear your thoughts on this.
I think that a hash of the conditions could work; I truly have not thought much about it.
@shawn-hurley @pranavgaikwad I am thinking of generating the cache key by converting the whole condition slice to a string and then hashing it:
```go
import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func generateCacheKey(conditions []ConditionEntry) string {
	// Use a hash function to generate a unique cache key
	hash := sha256.New()
	hash.Write([]byte(fmt.Sprintf("%v", conditions)))
	return hex.EncodeToString(hash.Sum(nil))
}
```
What do you think?
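For what it's worth, a self-contained version of that function is deterministic for equal slices (`ConditionEntry` is stubbed here; note that `fmt.Sprintf("%v", ...)` only prints exported, printable state, so two conditions differing only in unexported fields would collide):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ConditionEntry is a minimal stand-in for engine.ConditionEntry.
type ConditionEntry struct {
	From string
	As   string
}

func generateCacheKey(conditions []ConditionEntry) string {
	// Hash the printed form of the slice; equal slices give equal keys.
	hash := sha256.New()
	hash.Write([]byte(fmt.Sprintf("%v", conditions)))
	return hex.EncodeToString(hash.Sum(nil))
}

func main() {
	a := []ConditionEntry{{From: "x", As: "y"}}
	b := []ConditionEntry{{From: "x", As: "y"}}
	fmt.Println(generateCacheKey(a) == generateCacheKey(b)) // true
	fmt.Println(len(generateCacheKey(a)))                   // 64
}
```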
When we call the underlying provider, we should have a stringified form of the actual conditions (it does the replacement) IIRC. I think a sha sum of that string is fine.
I do wonder if the key could just be that string vs a hash.
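The tradeoff between the two options in a nutshell (a sketch; the templated condition string is simulated here): the raw string is collision-free but unbounded in size, while the hash is fixed-width.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashKey returns the sha256 of s as a fixed-width hex string.
func hashKey(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Simulate a large templated condition string.
	raw := strings.Repeat("java.referenced: javax.management.*; ", 50)

	// Option 1: the string itself is the key. No collision risk,
	// but map keys grow with the size of the conditions.
	fmt.Println("raw key bytes:", len(raw))

	// Option 2: hash the string. Fixed 64-character keys, at the cost
	// of a (negligible for sha256) collision possibility.
	fmt.Println("hashed key bytes:", len(hashKey(raw))) // 64
}
```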
@shawn-hurley @pranavgaikwad
I think, if I am not wrong, the `templateCondition` function
https://github.com/konveyor/analyzer-lsp/blob/main/provider/provider.go#L479
returns a `[]byte` slice that we can use as a key, since it will be unique.
Another doubt I have: the `Evaluate()` function has 3 parameters, but when we call it from `provider.go`
https://github.com/konveyor/analyzer-lsp/blob/main/provider/provider.go#L388
we only pass 2 parameters. Why is that?
@RipulHandoo are you still working on it? @pranavgaikwad is this issue still open?
No @Parthiba-Hazra. You can work on it if you want.
We should introduce a cache for condition responses. To start with, this could be as simple as a map that stores the response for a particular condition. We need to evaluate the added memory footprint.
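To make that concrete, a minimal sketch of "a map that stores the response for a particular condition", with a crude byte counter so the added memory footprint can be watched (all names here are hypothetical):

```go
package main

import "fmt"

// ConditionResponse is a stand-in for the engine's response type.
type ConditionResponse struct {
	Matched bool
	// Incidents, template context, etc. would live here in the real type.
}

// responseCache is the simplest possible condition-response cache:
// one map, keyed by the condition's string form.
type responseCache struct {
	entries  map[string]ConditionResponse
	keyBytes int // rough footprint of the stored keys
}

func newResponseCache() *responseCache {
	return &responseCache{entries: map[string]ConditionResponse{}}
}

func (c *responseCache) Get(cond string) (ConditionResponse, bool) {
	resp, ok := c.entries[cond]
	return resp, ok
}

func (c *responseCache) Set(cond string, resp ConditionResponse) {
	if _, exists := c.entries[cond]; !exists {
		c.keyBytes += len(cond)
	}
	c.entries[cond] = resp
}

func main() {
	cache := newResponseCache()
	cache.Set("java.referenced: javax.*", ConditionResponse{Matched: true})
	cache.Set("java.referenced: javax.*", ConditionResponse{Matched: true}) // no growth
	resp, ok := cache.Get("java.referenced: javax.*")
	fmt.Println(ok, resp.Matched) // true true
	fmt.Println("key bytes:", cache.keyBytes)
}
```

Tracking key bytes is only a lower bound on the footprint; the stored responses themselves (incidents in particular) would dominate in practice.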