sourcebot-dev / sourcebot

Blazingly fast code search 🏎️ Deployed as a single Docker image 📦 Search million+ lines of code in your GitHub, GitLab, and Gitea repositories 🪄 MIT licensed ✅
https://sourcebot.dev
MIT License

Rate limit issue while specifying larger repo #64

Open Vinoth414 opened 2 weeks ago

Vinoth414 commented 2 weeks ago

While indexing a large GitHub org with more than 800 repos, we hit a problem at https://github.com/sourcebot-dev/sourcebot/blob/main/packages/backend/src/github.ts#L123. At this point the branches of each repo are fetched and their names are matched with micromatch. While fetching the branches we run into GitHub's secondary rate limit. It would be better to add a condition that checks the user-provided micromatch branch patterns.

If you are OK with this change, I am ready to contribute it, and I would also like to contribute other changes.

brendan-kellam commented 2 weeks ago

Heya - sorry, I don't quite understand what the issue is. Are you facing rate limits from the GitHub API? Could you also provide logs if they are relevant?

Vinoth414 commented 1 week ago

Let us consider an organization that has more than 1,400 repos, where I have given 2 branches to index in the config file. The process then calls the function that gets all the branches in each repo and matches them with micromatch. After a few API calls to the list branches endpoint it starts to throw rate limit errors, and the specified branches are not indexed properly.

brendan-kellam commented 1 week ago

Ah I see what you are saying - to confirm, have you specified a token in your config file? The GitHub docs specify that the rate limit for API requests is 5,000 per hour when a token is provided, but only 60 per hour when there is no token.
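For illustration, here is a minimal sketch of how a client might decide how long to pause after a rate-limited response. It follows the retry guidance in GitHub's REST API documentation (honor `retry-after` if present, else wait for `x-ratelimit-reset` when `x-ratelimit-remaining` is 0, else wait at least a minute). The helper name `secondsToWait` and the `RateLimitHeaders` type are hypothetical and are not part of sourcebot.

```typescript
// Hypothetical helper, not part of sourcebot: decides how many seconds to
// pause before retrying a GitHub REST API request that was rate limited.
type RateLimitHeaders = {
    "retry-after"?: string;            // seconds to wait, when present
    "x-ratelimit-remaining"?: string;  // requests left in the current window
    "x-ratelimit-reset"?: string;      // window reset time, UTC epoch seconds
};

function secondsToWait(headers: RateLimitHeaders, nowEpochSeconds: number): number {
    // If retry-after is present, do not retry before that many seconds.
    const retryAfter = Number(headers["retry-after"]);
    if (headers["retry-after"] !== undefined && Number.isFinite(retryAfter)) {
        return Math.max(0, retryAfter);
    }
    // If the quota is exhausted, wait until the reset time.
    if (headers["x-ratelimit-remaining"] === "0" && headers["x-ratelimit-reset"] !== undefined) {
        const reset = Number(headers["x-ratelimit-reset"]);
        if (Number.isFinite(reset)) {
            return Math.max(0, reset - nowEpochSeconds);
        }
    }
    // Otherwise fall back to waiting at least one minute.
    return 60;
}
```

A caller would pass the response headers of the failed request and the current time, for example `secondsToWait(headers, Math.floor(Date.now() / 1000))`.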

brendan-kellam commented 1 week ago

Regardless, I think the longer-term thing here is to source the set of branches & tags from the checked-out git repository (since the information will already be there). That way, we don't need to hit the list branches or list tags endpoints.
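A rough sketch of what reading branches and tags from the local clone could look like, using `git for-each-ref`. The function names and the `repoPath` parameter are hypothetical. It also assumes branches live under `refs/heads` in the clone (as in a bare or mirror clone); a regular clone keeps remote branches under `refs/remotes/origin` instead, so the ref prefixes would need adjusting.

```typescript
import { execFileSync } from "node:child_process";

// Splits full ref names into branch names and tag names.
function splitRefs(refNames: string[]): { branches: string[]; tags: string[] } {
    const branches: string[] = [];
    const tags: string[] = [];
    for (const ref of refNames) {
        if (ref.startsWith("refs/heads/")) {
            branches.push(ref.slice("refs/heads/".length));
        } else if (ref.startsWith("refs/tags/")) {
            tags.push(ref.slice("refs/tags/".length));
        }
    }
    return { branches, tags };
}

// Lists refs from a local clone, so no list branches / list tags API calls
// are needed. repoPath is a hypothetical parameter: the clone's directory.
function listLocalRefs(repoPath: string): { branches: string[]; tags: string[] } {
    const out = execFileSync(
        "git",
        ["for-each-ref", "--format=%(refname)", "refs/heads", "refs/tags"],
        { cwd: repoPath, encoding: "utf8" },
    );
    return splitRefs(out.split("\n").filter((line) => line.length > 0));
}
```

The resulting names could then be matched against the configured branch and tag patterns locally.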

Vinoth414 commented 1 week ago

Hi, I have found a new way to check whether the branch is present or not. Shall I share it, or commit the changes?

brendan-kellam commented 1 week ago

Sure - could you share your approach here?

Vinoth414 commented 1 week ago

Instead of checking branches here https://github.com/sourcebot-dev/sourcebot/blob/main/packages/backend/src/github.ts#L123 we will use the cloned repo and run the command git ls-remote --heads origin followed by the branch names, in the repo path. It returns the branches that are present, and it can also match wildcards like release/*
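The approach above could be sketched roughly as follows. The function names and the `repoPath` parameter are hypothetical, and the sketch assumes the clone has a remote named `origin`. Note that `git ls-remote` treats each pattern as a glob matched against the tail of the ref name, so a pattern like `main` would also match a branch such as `feature/main`.

```typescript
import { execFileSync } from "node:child_process";

// Parses `git ls-remote --heads` output, which has one
// "<sha>\trefs/heads/<name>" entry per line, into branch names.
function parseLsRemoteHeads(output: string): string[] {
    const prefix = "refs/heads/";
    return output
        .split("\n")
        .map((line) => line.split("\t")[1] ?? "")
        .filter((ref) => ref.startsWith(prefix))
        .map((ref) => ref.slice(prefix.length));
}

// Returns the remote branches matching the given patterns (e.g. "release/*").
// With an empty pattern list, git lists every branch on the remote.
function matchingRemoteBranches(repoPath: string, patterns: string[]): string[] {
    const out = execFileSync(
        "git",
        ["ls-remote", "--heads", "origin", ...patterns],
        { cwd: repoPath, encoding: "utf8" },
    );
    return parseLsRemoteHeads(out);
}
```

For example, `matchingRemoteBranches(repoPath, ["main", "release/*"])` would issue one git request per repo instead of paging through the list branches endpoint.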