mattlgroff / repo-inspector

AI Plugin to remotely Git Clone and inspect a Git repository

List files recursively #1

Open Regenhardt opened 1 year ago

Regenhardt commented 1 year ago

Currently, getting a repo's or folder's overview lists the files found there, but not the folders or the files inside those folders. This leads ChatGPT to assume those nested folders and files don't exist.

I'd like ChatGPT to get a full, recursive list of all files in the repository. Or, if that might be too many files for big repositories, you could add a gate that, above a certain size, only returns the folders on the currently queried level.

All this should then probably be in the system prompt to tell ChatGPT what's going on.

Is `inspect_folder` the method handling this? I'd gladly implement it.

mattlgroff commented 1 year ago

I don't think this would be too difficult to implement. My suggestion would be to add an argument that enables recursive results, and tell ChatGPT that it is not recursive by default. That way, if the user asks for a recursive search, it will try its best.

The free tier I'm hosting this on already runs out of memory constantly so I wouldn't want to further exacerbate the issue with recursive on by default.

Regenhardt commented 1 year ago

Doesn't `Repo.clone_from(repo_url, temp_path)` clone the whole repository, including all folders, already? I thought this was just a matter of removing the `if os.path.isfile(os.path.join(full_path, f))` filter, or somehow making `os.listdir(full_path)` work recursively. I haven't done much Python though; I'm mostly a C# developer.
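As a minimal sketch of what "making `os.listdir` work recursively" could look like: Python's standard-library `os.walk` descends into subdirectories for you, so the flat `os.listdir` + `os.path.isfile` filter can be replaced entirely. The helper name `list_files_recursive` is hypothetical, not part of the plugin's actual API.

```python
import os

def list_files_recursive(full_path):
    """Hypothetical helper: recursively list files under full_path.

    os.walk yields (directory, subdirectories, filenames) for every level,
    so no explicit isfile() check or manual recursion is needed.
    """
    files = []
    for root, _dirs, filenames in os.walk(full_path):
        for name in filenames:
            # Store paths relative to the repo root so output stays readable
            files.append(os.path.relpath(os.path.join(root, name), full_path))
    return sorted(files)
```

This returns repo-relative paths, which keeps the listing compact for the model's context window.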

Regenhardt commented 1 year ago

Oh wait, memory, not storage; I always confuse the two. If memory is what runs out, then I agree it shouldn't list the whole structure at once.

mattlgroff commented 1 year ago

It does clone the whole repo.

Sorry I conflated these a bit.

The entire repo gets cloned and held in memory. That's fine for a single call, but the machine running this has very little RAM, so when it gets hit by more than a few requests at once it doesn't scale.

The limitation on returning an entire recursive file list is going to be the token limit rather than memory. For example, listing every file in the Ruby repo recursively would definitely blow past the token limit, since we're talking thousands of filenames plus metadata.

`inspect_folder` could be made recursive, optionally.
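A sketch of that optional recursion, assuming `inspect_folder` takes the folder path; the `recursive` parameter and its default are the suggestion from this thread, not the plugin's current signature:

```python
import os

def inspect_folder(full_path, recursive=False):
    """Hypothetical signature: list a folder's files, recursing only on request.

    recursive=False keeps the current behavior (files on the queried level
    only), so the default stays cheap for the low-memory host.
    """
    if not recursive:
        # Current behavior: only files directly in this folder
        return sorted(
            f for f in os.listdir(full_path)
            if os.path.isfile(os.path.join(full_path, f))
        )
    # Opt-in behavior: walk every subdirectory and return relative paths
    entries = []
    for root, _dirs, filenames in os.walk(full_path):
        for name in filenames:
            entries.append(os.path.relpath(os.path.join(root, name), full_path))
    return sorted(entries)
```

Keeping the flag off by default also means the system prompt only needs one extra sentence telling ChatGPT that recursion is opt-in.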

Also, as a side note: I'm a JS/TS dev, so Python isn't my first choice either.

You have full permission to take this repo, run the plugin locally, and go nuts; it's MIT-licensed, so you don't even need to ask :)