smol-ai / developer

the first library to let you embed a developer agent in your own app!
https://twitter.com/SmolModels
MIT License

code2prompt wastes tokens on __pycache__ and other junk files when executing walk_directory #86

Open kcramp858 opened 1 year ago

kcramp858 commented 1 year ago

I don't know how to open a PR, but I noticed that walk_directory was trying to parse junk files, so I added the following to skip __pycache__. Skipping other folders might be a good idea too.

import os

def walk_directory(directory):
    code_contents = {}
    for root, dirs, files in os.walk(directory):
        # Prune in place so os.walk never descends into __pycache__
        if '__pycache__' in dirs:
            dirs.remove('__pycache__')
        for file in files:
            if not any(file.endswith(ext) for ext in EXTENSION_TO_SKIP):
                filepath = os.path.join(root, file)
                # Compute the key before the try so the except branch can always use it
                relative_filepath = os.path.relpath(filepath, directory)
                try:
                    code_contents[relative_filepath] = read_file(filepath)
                except Exception as e:
                    code_contents[relative_filepath] = f"Error reading file {file}: {str(e)}"
    return code_contents
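
If other folders should be skipped too, the same pruning idea extends to a set of directory names. The following is only a sketch: `DIRS_TO_SKIP` and `iter_source_files` are hypothetical names that do not exist in the repository, and the folder list is a guess at common junk directories rather than anything the project defines.

```python
import os

# Hypothetical skip list; not part of the repository.
DIRS_TO_SKIP = {"__pycache__", ".git", "node_modules", ".venv"}

def iter_source_files(directory, dirs_to_skip=DIRS_TO_SKIP):
    """Yield paths relative to `directory`, never descending into skipped folders."""
    for root, dirs, files in os.walk(directory):
        # Slice assignment mutates the list os.walk holds, which prunes the walk.
        dirs[:] = [d for d in dirs if d not in dirs_to_skip]
        for name in files:
            yield os.path.relpath(os.path.join(root, name), directory)
```

Pruning `dirs` in place only works with the default top-down walk, which is what `os.walk` uses unless `topdown=False` is passed. The extension filter from walk_directory could be applied to the yielded paths in the same way as before.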