warpdotdev / Warp

Warp is the agentic development environment, built for coding with multiple AI agents.
https://warp.dev
Other
24.03k stars 488 forks

Add Support for indexing of large codebases #6586

Open MovGP0 opened 1 month ago

MovGP0 commented 1 month ago

Describe the solution you'd like?

The "Codebase Index" feature should be able to index large codebases.

Is your feature request related to a problem? Please describe.

Most of my codebases are very large and the agent can't handle them. Having a vector index would be helpful.

Additional context

Running a local encoding model for indexing is likely required.

Operating system (OS)

Windows

How important is this feature to you?

3

dannyneira commented 4 weeks ago

Thanks for this feature request!

To anyone else interested in this feature, please add a 👍 to the original post at the top to signal that you want this feature, and subscribe if you'd like to be notified.

sworley commented 4 weeks ago

I just created a new project for backing up my Warp dev environment, and the "Codebase Index" states that the codebase is too large.

I would be happy if it could be indexed. However, with a better understanding of the functionality, I might be better able to avoid this in future projects.

Operating system (OS) Linux; Ubuntu 24.04

How important is this feature to you? 3

dannyneira commented 3 weeks ago

@sworley @MovGP0 We would appreciate the following information about the larger codebases to better help us understand the use case:

Also note that large files that aren't used for coding, like virtual disk images or ISOs, may affect the ability to index the codebase as well, so we recommend keeping those separate.

MovGP0 commented 3 weeks ago

@sworley

I'll give you exact numbers next week. I think the most important part is that build artifacts and packages (which make up most of the folder size) need to be ignored, so I'd recommend respecting the .gitignore files in a given Git repository (including ignore files in subfolders).
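To illustrate what respecting nested ignore files could look like, here is a minimal Python sketch. It is not how Warp works internally, and it is a deliberately simplified matcher rather than a full .gitignore implementation: it only matches basename globs, treats a trailing `/` as "directories only", ignores negation (`!pattern`) and anchored or multi-segment patterns, and skips symlinks. The function names `load_patterns`, `is_ignored`, and `list_indexable_files` are hypothetical.

```python
import fnmatch
from pathlib import Path


def load_patterns(directory: Path) -> list[str]:
    """Read non-empty, non-comment lines from a .gitignore in `directory`."""
    ignore_file = directory / ".gitignore"
    if not ignore_file.is_file():
        return []
    lines = (ln.strip() for ln in ignore_file.read_text(encoding="utf-8").splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def is_ignored(name: str, is_dir: bool, patterns: list[str]) -> bool:
    """Simplified match on the basename only (assumption: no negation, no anchoring)."""
    for pattern in patterns:
        if pattern.endswith("/") and not is_dir:
            continue  # a trailing slash restricts the pattern to directories
        if fnmatch.fnmatch(name, pattern.strip("/")):
            return True
    return False


def list_indexable_files(root: Path, inherited: tuple[str, ...] = ()) -> list[Path]:
    """Walk `root`, applying ignore patterns from parent folders and each subfolder."""
    patterns = list(inherited) + load_patterns(root)
    result: list[Path] = []
    for entry in sorted(root.iterdir()):
        if entry.name == ".git" or entry.is_symlink():
            continue
        is_dir = entry.is_dir()
        if is_ignored(entry.name, is_dir, patterns):
            continue  # ignored directories are never descended into
        if is_dir:
            result.extend(list_indexable_files(entry, tuple(patterns)))
        else:
            result.append(entry)
    return result
```

With `bin/` and `obj/` listed in the top-level .gitignore, `len(list_indexable_files(Path(".")))` gives a rough count of the files that would remain once build output is pruned.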

Further, support for MCP and/or A2A might also mitigate the problem, since users would be able to provide their own indexes.

MovGP0 commented 3 weeks ago

Some statistics about the codebase I am working on currently:

Languages:
- C# (99.6 %)
- PowerShell (0.2 %)
- Visual Basic (0.2 %)
Lines of code: 1,706,418
Number of files: 23,612
Size: 1.274 GiB

Structure

Depth: up to 6 directory sublevels
Structure: one subfolder per project (i.e. one *.csproj file each)

File types

Most Common File Types:
-  .cs (C# source files)
-  .resx (Resource files)
-  .png (PNG images)
-  .xml (XML files)

Other Notable File Types:
-  .svg (SVG images)
-  .csproj (C# project files)
-  .bmp (Bitmap images)
-  .ps1 (PowerShell scripts)
-  .config (Configuration files)

Development & Build Files:
-  .nsi/.nsh (NSIS installer files)
-  .settings (Settings files)
-  .licx (License files)

Documents & Media:
-  .txt (Text files)
-  .ico (Icon files)
-  .jpg (JPEG images)
-  .xlsx (Excel files)
-  .docx (Word documents)

Other file types
- various CAD file formats (IGES, STP/STEP, DWG, etc.)

[!Note] The codebase is actually bigger, because it's distributed across multiple Git repositories; referencing other indices might be required for the agent to understand the full context.

[!Note] Build artifact directories (/obj and /bin) have been excluded from these statistics and size calculations.

psalvitti commented 2 weeks ago

I'm in the same boat as @MovGP0 except:

MovGP0 commented 2 weeks ago

My current workaround is to create a local index with the Repomix CLI and use Repomix's MCP API for LLM queries. Unfortunately, that is something Warp does not support yet.

smeeklai commented 2 weeks ago

I don't even have a large codebase (I think). Cursor says it's only 475 files after .gitignore and .cursorignore, but Warp cannot index my codebase. Is it actually large, or does Warp not respect my .gitignore? Also, having an additional ignore file like Cursor's would be great.

densy07 commented 2 weeks ago

Cursor can index my Unity project, but Warp can't.

densy07 commented 1 week ago

I can't use it in my Unity project at present. The total number of files in my Asset folder exceeds 20,000. Even the Turbo plan can't handle this situation. It contains a large number of meta files that should be ignored.
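For anyone who wants to check how much of a project's file count comes from a single file type such as `.meta`, a small Python sketch like the following could help. The helper name `count_by_extension` and the `skip_suffixes` parameter are made up for illustration and have nothing to do with Warp's own indexing.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root: Path, skip_suffixes: frozenset[str] = frozenset()) -> Counter:
    """Count files under `root` per lowercase suffix, leaving out `skip_suffixes`."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix not in skip_suffixes:
            counts[suffix or "(none)"] += 1  # files without a suffix get their own bucket
    return counts
```

Comparing `sum(count_by_extension(Path("Assets")).values())` with the same call using `frozenset({".meta"})` shows how many files would remain if the meta files were ignored.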

utapyngo commented 1 week ago

What's missing is the ability to ignore certain paths besides .gitignore.

Also the "Codebase index" settings page could display the total number of files it wants to index, not just "too large".
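As a rough sketch of the kind of message this suggestion has in mind, the Python snippet below formats a file count against a limit. The function `describe_index_size` is hypothetical, and the limit is passed in as a parameter because the thread does not state what Warp's actual limit is.

```python
def describe_index_size(file_count: int, file_limit: int) -> str:
    """Format a status line that shows the count instead of only 'too large'."""
    if file_count <= file_limit:
        return f"{file_count:,} of {file_limit:,} files: within the limit"
    over = file_count - file_limit
    return f"{file_count:,} of {file_limit:,} files: {over:,} over the limit"
```

A message in this shape would tell users how far over the limit they are, which makes it easier to decide what to exclude.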