kiegroup / github-action-build-chain

85 stars 24 forks source link

investigate number of github API calls done by the tool in order to try to reduce them #402

Closed Ginxo closed 1 year ago

Ginxo commented 1 year ago

I've been checking number of github API calls and this is the result

[DEBUG] Github API Service Calls. Number of calls: 24
[DEBUG] Github API Service Calls. this.octokit.pulls.get
[DEBUG] Github API Service Calls. this.octokit.repos.get(droolsjbpm-build-bootstrap)
[DEBUG] Github API Service Calls. this.octokit.repos.get(kie-soup)
[DEBUG] Github API Service Calls. this.octokit.repos.get(droolsjbpm-knowledge)
[DEBUG] Github API Service Calls. this.octokit.repos.get(drools)
[DEBUG] Github API Service Calls. this.octokit.repos.get(optaplanner)
[DEBUG] Github API Service Calls. this.octokit.repos.get(jbpm)
[DEBUG] Github API Service Calls. this.octokit.repos.get(droolsjbpm-integration)
[DEBUG] Github API Service Calls. this.octokit.repos.get(process-migration-service)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/droolsjbpm-integration)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/optaplanner)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/drools)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/droolsjbpm-knowledge)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/droolsjbpm-build-bootstrap)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/kie-soup)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/jbpm)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/process-migration-service)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/optaplanner)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/droolsjbpm-integration)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/droolsjbpm-knowledge)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/droolsjbpm-build-bootstrap)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/drools)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/kie-soup)
[DEBUG] Github API Service Calls. this.octokit.pulls.list(kiegroup/jbpm)

some of them are repeated, like this.octokit.pulls.list but they are required in order to check whether the are PR either from source or from target... So I don't think they can be reduced... @shubhbapna @radtriste

radtriste commented 1 year ago

so IIUC, the problem is the parallel run of checks right ? I mean, from Drools8 repository, basically for kie-ci4 user, we would have 6 different parallel checks and each of them would make those calls (list would be different) If now we setup a new environment as default PR checks, that means 12 different parallel checks ...

One solution, instead of having parallel checks would be to do a fdb instead, which means only one check per environment. But the issue is that the build time would increase a lot...

btw, how is done the checkout ? @Ginxo @shubhbapna

Ginxo commented 1 year ago

checkout is done in parallel by default, but there is a chance to sequentially do it just by adding skipParallelCheckout flag either from CLI --skipParallelCheckout see CLI inputs or GHA workflow input

radtriste commented 1 year ago

my point is more:

shubhbapna commented 1 year ago

@radtriste all these github api calls are used to gather the checkout information. Once we have that, we simply use git to clone, merge etc. We don't use the token to clone

radtriste commented 1 year ago

ok but those still are GH calls made with the token or ?

Ginxo commented 1 year ago

ok but those still are GH calls made with the token or ?

all GH APIs calls are done through the same octokit instance where GH token is added in case it is present, so yes... they are

radtriste commented 1 year ago

so that makes more GH calls than the above ones ;)

shubhbapna commented 1 year ago

@Ginxo I think Tristan is asking about the actual cloning of the repository. We don't use octokit for that we use the actual command line git. I am not sure if the git cli internally makes GH call (it probably doesn't) but either way that is again something we cannot avoid

shubhbapna commented 1 year ago

I was thinking of a different approach to resolve this (I had briefly mentioned this to Tristan before):

Given most of our projects are public, I think we can use the github api without providing a token since the info we usually try to get is anyways public. (I tested a sample github api call to get list of pull requests without a token and it worked)

I was thinking of implementing a retry method for all our GH calls. We make the api call without a token. If it works great. If it doesn't and the error is 401 or 403 then we use the token and make the call again.

For our case all our GH calls should work without falling back to using token. This will greatly reduce the number of GH calls we make without having us to worry about the number of parallel checks running (at least for now). Wdyt? @Ginxo @radtriste

Ginxo commented 1 year ago

I was thinking of a different approach to resolve this (I had briefly mentioned this to Tristan before):

Given most of our projects are public, I think we can use the github api without providing a token since the info we usually try to get is anyways public. (I tested a sample github api call to get list of pull requests without a token and it worked)

I was thinking of implementing a retry method for all our GH calls. We make the api call without a token. If it works great. If it doesn't and the error is 401 or 403 then we use the token and make the call again.

For our case all our GH calls should work without falling back to using token. This will greatly reduce the number of GH calls we make without having us to worry about the number of parallel checks running (at least for now). Wdyt? @Ginxo @radtriste

sorry @shubhbapna but you are not considering the cases where repositories are private right? like https://github.com/kiegroup/droolsjbpm-build-bootstrap/blob/main/.ci/product-projects-dependencies.yaml Am I wrong?

radtriste commented 1 year ago

letting default to no token is dangerous as GH will based the rate limit on IP address, which may be the same for a lot of tools running on PSI ... I would keep anyway the token Could we use instead a token pool ? I think that would worth it. For example, for now, we have:

With the pool, the build-chain would have multiple possibilities of tokens and try them, one after the other it there is a rate limit on one. wdyt ? we could even add more tokens if needed.

Ginxo commented 1 year ago

letting default to no token is dangerous as GH will based the rate limit on IP address, which may be the same for a lot of tools running on PSI ... I would keep anyway the token Could we use instead a token pool ? I think that would worth it. For example, for now, we have:

* Drools8 using kie-ci4

* Kogito using kie-ci3

* OP8 using kie-ci5

With the pool, the build-chain would have multiple possibilities of tokens and try them, one after the other it there is a rate limit on one. wdyt ? we could even add more tokens if needed.

it makes sense... what about changing https://github.com/kiegroup/github-action-build-chain/blob/2a020aba5ed5ebd15199447e728318f801a39497/src/domain/inputs.ts#L39 to token?: string[], change OctokitService to multion pattern (one per token) and getToken will return back one of them. We would have to document it in the way all token should have the same rights. Additionally to it we can wrap any octokit call by a try catch block in order to retry up to X times (3?) wdyt @shubhbapna ?

shubhbapna commented 1 year ago

sorry @shubhbapna but you are not considering the cases where repositories are private right? like https://github.com/kiegroup/droolsjbpm-build-bootstrap/blob/main/.ci/product-projects-dependencies.yaml Am I wrong?

Hence the retry right? Either way we can pass an option to disable retry and use token directly for such cases

letting default to no token is dangerous as GH will based the rate limit on IP address, which may be the same for a lot of tools running on PSI

That's true...

With the pool, the build-chain would have multiple possibilities of tokens and try them, one after the other it there is a rate limit on one. wdyt ?

So if we had this before:

Drools8 using kie-ci4 Kogito using kie-ci3 OP8 using kie-ci5

After implementing the token pool will it be something like this:

Drools8 using kie-ci3, kie-ci4, kie-ci5 Kogito using kie-ci3, kie-ci4, kie-ci5 OP8 using kie-ci3, kie-ci4, kie-ci5

Or

Drools8 using kie-ci3, kie-ci4, kie-ci5 Kogito using kie-ci6, kie-ci7, kie-ci8 OP8 using kie-ci9, kie-ci10, kie-ci11

The first case will still result in rate limit issues IMO

radtriste commented 1 year ago

As they will share the pool, it may be that sometimes OP needs more or Drools needs more (or Kogito) We can start with solution 1, sharing same tokens. and if not enough, we can still expand the number of tokens and redistribute.