sweepai / sweep

Sweep: open-source AI-powered Software Developer for small features and bug fixes.
https://sweep.dev

Sweep: File age as code search rank factor #690

Closed: wwzeng1 closed this issue 1 year ago

wwzeng1 commented 1 year ago

Description

The older a file is, the more likely it is to play a key role in the repo.

Relevant files

vector_db.py, scorer.py

sweep-nightly[bot] commented 1 year ago

Here's the PR! https://github.com/sweepai/sweep/pull/692.



Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at. If some file is missing from here, you can mention the path in the ticket description.

  • https://github.com/sweepai/sweep/blob/fd4f5516b81f7eb029a41c16d3a08219675e2613/sweepai/core/vector_db.py#L1-L374

  • https://github.com/sweepai/sweep/blob/fd4f5516b81f7eb029a41c16d3a08219675e2613/sweepai/utils/scorer.py#L1-L22

  • https://github.com/sweepai/sweep/blob/fd4f5516b81f7eb029a41c16d3a08219675e2613/docs/pages/blogs/building-code-search.mdx#L1-L103

  • https://github.com/sweepai/sweep/blob/fd4f5516b81f7eb029a41c16d3a08219675e2613/sweepai/handlers/on_comment.py#L210-L280

  • https://github.com/sweepai/sweep/blob/fd4f5516b81f7eb029a41c16d3a08219675e2613/sweepai/utils/github_utils.py#L155-L293

Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

| File Path | Proposed Changes |
|---|---|
| sweepai/utils/scorer.py | Modify the compute_score function to include a new factor for file age. The age factor can be calculated as age_factor = 1 / (file_age_in_days + 1). Multiply the final score by this age factor. |
| sweepai/utils/github_utils.py | Add a new function get_file_age that takes a repo and a file path as input and returns the age of the file in days. This can be done by getting the list of commits for the file and finding the date of the first commit. |
| sweepai/core/vector_db.py | Modify the get_deeplake_vs_from_repo function to calculate the age of each file using the get_file_age function and pass it to the compute_score function. |
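The age factor proposed above can be sketched in a few lines. Note that 1 / (file_age_in_days + 1) shrinks as a file gets older, so multiplying by it lowers the score of old files, which runs opposite to the goal stated in the issue. The sketch below shows the factor as written next to one possible increasing alternative. The function names, the `scale_days` parameter, and the 0.5 to 1.0 range are assumptions made for illustration; they are not taken from scorer.py.

```python
def age_factor_as_proposed(file_age_in_days: float) -> float:
    """Factor exactly as written in the plan: 1 / (age + 1).

    It equals 1.0 for a brand-new file and decays toward 0 with age.
    """
    return 1.0 / (file_age_in_days + 1.0)


def age_factor_increasing(file_age_in_days: float, scale_days: float = 365.0) -> float:
    """Hypothetical alternative that rewards older files.

    Starts at 0.5 for a brand-new file and approaches 1.0 as the file ages.
    scale_days is an assumed tuning knob: the age at which the factor is 0.75.
    """
    return 0.5 + 0.5 * file_age_in_days / (file_age_in_days + scale_days)


def apply_age_factor(base_score: float, file_age_in_days: float) -> float:
    """Multiply an existing score by the (hypothetical) increasing factor."""
    return base_score * age_factor_increasing(file_age_in_days)


if __name__ == "__main__":
    for age in (0, 30, 365, 3650):
        print(age, age_factor_as_proposed(age), age_factor_increasing(age))
```

Whichever form is used, a bounded factor keeps file age from swamping the other ranking signals.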

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working on my plan and coding the required changes to address this issue. Here is the planned pull request:

Add file age as code search rank factor (branch: sweep/file-age-factor)

Description

This PR adds a new factor for file age to the code search rank calculation. The older a file is, the more likely it is that the file plays a key role in the repo.

Changes Made

  • Modified the compute_score function in scorer.py to include a new factor for file age. The age factor is calculated as age_factor = 1 / (file_age_in_days + 1). The final score is multiplied by this age factor.

  • Added a new function get_file_age in github_utils.py to calculate the age of a file. This function takes a repo and a file path as input and returns the age of the file in days. It does this by finding the date of the first commit of the file.

  • Modified the get_deeplake_vs_from_repo function in vector_db.py to calculate the age of each file using the get_file_age function. The age is then passed to the compute_score function.
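As a rough illustration of the get_file_age step, the sketch below finds the earliest commit that touched a file and returns its age in whole days. It assumes `repo` is a PyGithub-style repository object whose `get_commits(path=...)` yields commits exposing `commit.commit.author.date`; the `now` parameter and the fallback of 0 for files with no history are assumptions added here, not details from the PR.

```python
from datetime import datetime, timezone
from typing import Optional


def get_file_age(repo, file_path: str, now: Optional[datetime] = None) -> int:
    """Return the age of file_path in days, based on its earliest commit.

    `repo` is assumed to behave like a PyGithub Repository.
    `now` is an assumed parameter that makes the function testable.
    """
    now = now or datetime.now(timezone.utc)
    first_commit_date = None
    # Scan every commit touching the file and keep the earliest date,
    # rather than relying on the order in which commits are returned.
    for commit in repo.get_commits(path=file_path):
        date = commit.commit.author.date
        if date.tzinfo is None:
            # Treat naive timestamps as UTC (assumption).
            date = date.replace(tzinfo=timezone.utc)
        if first_commit_date is None or date < first_commit_date:
            first_commit_date = date
    if first_commit_date is None:
        return 0  # No history found: treat the file as brand new (assumption).
    return max(0, (now - first_commit_date).days)
```

Walking the full commit list costs one or more API requests per file, so an indexing pass over a large repository would likely want to cache the result per path.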

Testing

  • Added unit tests for the new get_file_age function in github_utils.py to ensure it correctly calculates the age of a file.

  • Added unit tests for the modified compute_score function in scorer.py to verify that the age factor is correctly applied to the score calculation.

Related Issue

Closes #690


Step 4: ⌨️ Coding

I have finished coding the issue. I am now reviewing it for completeness.


Step 5: 🔁 Code Review

Success! 🚀


I'm a bot that handles simple bugs and feature requests, but I might make mistakes. Please be kind!