j3t / jenkins-pipeline-cache-plugin

A cloud native file cache for Jenkins build pipelines which uses an S3-Bucket as storage provider.

Join forces with "jobcacher" plugin? #10

Open darxriggs opened 2 years ago

darxriggs commented 2 years ago

There is another caching plugin at https://github.com/jenkinsci/jobcacher-plugin which was created some years ago and got a new maintainer. It currently performs caching differently from this plugin, but it also supports, for example, Freestyle jobs and storage on the Jenkins controller.

Maybe both plugins and their functionality could be merged.

What do you think @j3t and @repolevedavaj?

darxriggs commented 2 years ago

I think the features in jenkins-pipeline-cache-plugin are more flexible than the ones in jobcacher-plugin.

But jobcacher-plugin already has extension points that could be used to implement different caches. The question here is whether a user should be presented with so many options to choose from.
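For context, a cache declared with the current jobcacher-plugin looks roughly like the following sketch (the exact parameter names and the ArbitraryFileCache signature should be treated as approximate):

node {
    git (url: 'https://github.com/spring-projects/spring-petclinic', branch: 'main')
    // jobcacher-plugin style (approximate): a wrapper step with an overall size limit
    // and a list of cache definitions such as ArbitraryFileCache
    cache (maxCacheSize: 250, caches: [[$class: 'ArbitraryFileCache', path: "$HOME/.m2/repository"]]) {
        sh './mvnw package'
    }
}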

repolevedavaj commented 2 years ago

@darxriggs thanks for initiating this discussion. I am definitely interested in this. @j3t is there a reason why you created your own plugin instead of using the jobcacher plugin?

j3t commented 2 years ago

If I remember correctly, the jobcacher plugin was not maintained actively and it was not solving our problems (no MinIO support, no fallback cache, no support for Maven multi-module projects). It was also unintuitive, complicated, and less flexible.

repolevedavaj commented 2 years ago

I see your points. We had some similar issues; that's the reason why I adopted the plugin.

Are you interested in merging the plugins and joining the Jenkins community? If yes, I can try to make a proposal for how the merged plugin could look.

j3t commented 2 years ago

In general I think it's a good idea, but I'm also a little bit sceptical about how this could look. Let's give it a try. Maybe we should sharpen the expectations/requirements first.

How would the following example look with the jobcacher plugin after the migration?

node {
    git (url: 'https://github.com/spring-projects/spring-petclinic', branch: 'main')
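    // key is an exact cache identifier derived from the pom files; restoreKeys are fallback prefixes tried when no exact match exists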
    cache (path: "$HOME/.m2/repository", key: "petclinic-${hashFiles('**/pom.xml')}", restoreKeys: ['petclinic-', 'maven-']) {
        sh './mvnw package'
    }
}

I would expect only one of the following outcomes:

The test coverage should be at least on the same level as it is now.

repolevedavaj commented 2 years ago

I could imagine that we introduce your way of handling caching first, as an alternative for the plugin users, without changing the actual behavior of your cache step implementation. I think we should refactor your step so it uses the same item storage abstraction, as not all users have access to MinIO or AWS.

In the long run, I think having a behavior and API similar to GitHub Actions is better, as a lot of users are already familiar with it. As Jenkins plugins are normally backward-compatible, we need to think about that. Maybe we can keep the old step API but redirect internally to the new implementation.

These are just some thoughts, so there is probably more. If you want, we can have a chat about this and discuss it in a little more detail. What do you think (I am located in Switzerland, btw)?

j3t commented 2 years ago

I think we should refactor your step so it uses the same item storage abstraction, as not all users have access to MinIO or AWS.

Agreed, but the item storage is very generic and strict. As far as I know, the archive must be created first and can be uploaded to S3 afterwards. The jenkins-pipeline-cache-plugin creates and uploads the archive in one single step (see BackupCallable.java). Also, when a cache gets stored or restored, the last access timestamp gets updated, which is important when it comes to eviction (see RestoreCallable.java).

As Jenkins plugins are normally backward-compatible, we need to think about that. Maybe we can keep the old step API but redirect internally to the new implementation.

I guess we could just extend the existing CacheStep.java so that it supports both Step APIs.

darxriggs commented 2 years ago

From the usage perspective the new cache implementation could look like this:

node {
    git (url: 'https://github.com/spring-projects/spring-petclinic', branch: 'main')
    cache (maxCacheSize: 250, caches: [[$class: 'FooCache', path: "$HOME/.m2/repository", key: "petclinic-${hashFiles('**/pom.xml')}", restoreKeys: ['petclinic-', 'maven-']]]) {
        sh './mvnw package'
    }
}

So the already existing ArbitraryFileCache could be kept for backward compatibility.

It has to be decided how the defaultBranch option should be treated. It is optional, so not a very big problem for backward compatibility. But it might be better to convert it to an ArbitraryFileCache option rather than an overall option. If it is kept as is and only FooCache is used as cache, a validation could be implemented to tell the user that the defaultBranch option is not supported for this cache. This is just a rough sketch; the final API should be further improved.
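As a rough sketch (reusing the FooCache placeholder from above; the placement is hypothetical), defaultBranch moved into the cache definition could look like this:

node {
    cache (maxCacheSize: 250, caches: [
        // defaultBranch as a per-cache option instead of an overall option (hypothetical)
        [$class: 'ArbitraryFileCache', path: "$HOME/.m2/repository", defaultBranch: 'main'],
        [$class: 'FooCache', path: "$HOME/.npm", key: "npm-${hashFiles('**/package-lock.json')}", restoreKeys: ['npm-']]
    ]) {
        sh './mvnw package'
    }
}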

With the possibility to have different caches, #9 (specific maven cache) could also be implemented.

j3t commented 2 years ago

When the assumption is that an API similar to GitHub Actions is the way to go anyway, then I would argue that maxCacheSize and caches should be optional as well. The plugin could then decide, depending on the provided configuration, which implementation is used, and it could also throw an error if the configuration is not valid. This way, we are still backward compatible and we also have a clean and user-friendly API.
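To illustrate the idea with a sketch (assuming the merged step keeps the name cache and simply dispatches on the parameters that are present):

node {
    // GitHub-Actions-like form: path/key/restoreKeys given -> new implementation is used
    cache (path: "$HOME/.m2/repository", key: "petclinic-${hashFiles('**/pom.xml')}", restoreKeys: ['petclinic-']) {
        sh './mvnw package'
    }

    // classic jobcacher form: maxCacheSize/caches given -> existing implementation is used
    cache (maxCacheSize: 250, caches: [[$class: 'ArbitraryFileCache', path: "$HOME/.m2/repository"]]) {
        sh './mvnw package'
    }

    // mixing both parameter sets (or providing neither) would be rejected with a configuration error
}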

repolevedavaj commented 2 years ago

When the assumption is that an API similar to GitHub Actions is the way to go anyway, then I would argue that maxCacheSize and caches should be optional as well. The plugin could then decide, depending on the provided configuration, which implementation is used, and it could also throw an error if the configuration is not valid. This way, we are still backward compatible and we also have a clean and user-friendly API.

Yes, I prefer this way too

darxriggs commented 2 years ago

I'd like to discuss the maximum cache size. We should decide if it should be configurable on the global Jenkins scope or per job.

It should take into consideration that Jenkins is used in small setups as well as big setups where individual persons might not have the permissions to configure all options.

I think there are two roles involved. An administrator that provides the cache and a developer that uses the cache. There is only a single cache storage available per Jenkins instance which is used for every job. The administrator has to provide the storage space (which might be "infinite" for certain cloud providers).

Therefore the administrator should be in control of the maximum available cache storage, which means it has to be configured at the Jenkins scope. If it were configurable per job, the administrator would have no control over how much space is used for the complete Jenkins instance, and it could run out of space.

What's your opinion on this?

darxriggs commented 2 years ago

Another question: From the discussion so far my impression is that we want to go for the jobcacher plugin and not the jenkins-pipeline-cache plugin and migrate some functionality / API over there. For me this looks like the way to go because this plugin is already released, has some users and supports more job types. The downside is that we have to deal with the current API.

Is this what you have in mind too?

repolevedavaj commented 2 years ago

Regarding the cache size: I agree that an administrator has some interest in controlling the space caching requires, but I think there is a reason to configure it at the job level too, because if you do not restrict the size and the cache grows with every build, the time needed to cache/restore will increase too.

And regarding the repo to use in the future, I would go with the jobcacher too.

j3t commented 2 years ago

I'd like to discuss the maximum cache size. We should decide if it should be configurable on the global Jenkins scope or per job.

We have to introduce a global configuration parameter for the jenkins-pipeline-cache anyway, so I guess this parameter could also be used by the existing job types to implement a global limit (which is defined by the admin). The limit at the job level must also be respected (backward compatibility), but it could be optional.
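From the pipeline side that could look roughly like this (a sketch only; whether a job-level value may exceed the global limit is an open question):

node {
    // no maxCacheSize given: the global limit configured by the administrator applies
    cache (caches: [[$class: 'ArbitraryFileCache', path: "$HOME/.m2/repository"]]) {
        sh './mvnw package'
    }

    // optional job-level limit, assumed to still be bounded by the global limit
    cache (maxCacheSize: 500, caches: [[$class: 'ArbitraryFileCache', path: "$HOME/.m2/repository"]]) {
        sh './mvnw package'
    }
}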

From the discussion so far my impression is that we want to go for the jobcacher plugin and not the jenkins-pipeline-cache plugin and migrate some functionality / API over there. For me this looks like the way to go because this plugin is already released, has some users and supports more job types. The downside is that we have to deal with the current API.

In my opinion we have to be backward-compatible so that both user groups are happy, which is the case when we make maxCacheSize and caches optional and let the plugin decide. What is your opinion about my proposal (see https://github.com/j3t/jenkins-pipeline-cache-plugin/issues/10#issuecomment-1171609106)?

j3t commented 2 years ago

And regarding the repo to use in the future, I would go with the jobcacher too.

Fully agree

repolevedavaj commented 2 years ago

@j3t Yes, we can try to go this way, but we have to see whether this can be done in a clean way. If not, we could introduce a completely new step/wrapper (e.g. arbitraryFileCache(...)) and reuse some parts of it in the old implementation. This would allow us to have a clean cut between the old and the new world, and I think if we want to introduce the Maven cache, we would want to use something like mavenCache(...) anyway.
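A sketch of what such a dedicated step could look like (mavenCache and everything it implies is hypothetical at this point):

node {
    git (url: 'https://github.com/spring-projects/spring-petclinic', branch: 'main')
    // hypothetical dedicated Maven cache step that derives the path and key conventions itself
    mavenCache {
        sh './mvnw package'
    }
}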