lpelabs / reDocs.ai

Full codebase to developer standard documentation harnessing AI!
MIT License

reDocs.ai

Introduction

Without an automated way to turn a codebase into documentation, teams lose time, accuracy, and code comprehension. Developers often neglect documentation, especially on fast-moving teams, and the result is severe technical debt. Because writing technical documentation is hard and existing tools are limited or expensive, there is a need for comprehensive automatic documentation generation.

Description

Full Codebase to Developer Docs in One Step

Our prototype offers a seamless solution to transform a full codebase into comprehensive developer documentation in just one step. By uploading a zip file containing the codebase, you can let the magic happen. The resulting documentation includes function explanations, API specs, table schemas, and dependencies, all in Markdown format.
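As a rough illustration of the upload step, the sketch below reads source files out of a zip archive using only the Python standard library. The function name `read_codebase` and the extension filter are assumptions made for this example; the prototype's actual upload handling is not described here.

```python
import io
import zipfile
from typing import Dict, Tuple


def read_codebase(zip_bytes: bytes,
                  extensions: Tuple[str, ...] = (".py", ".js", ".ts")) -> Dict[str, str]:
    """Return {path inside the archive: file text} for matching source files.

    Hypothetical helper: the extension list is an illustrative default.
    """
    sources: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Sort by path so the traversal order is deterministic.
        for info in sorted(archive.infolist(), key=lambda i: i.filename):
            if info.is_dir() or not info.filename.endswith(extensions):
                continue
            # Replace undecodable bytes instead of failing on odd encodings.
            sources[info.filename] = archive.read(info).decode("utf-8", errors="replace")
    return sources
```

Calling `read_codebase(uploaded_bytes)` yields a dictionary that later stages could embed, cluster, and document file by file.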

Harnessing GPT-3.5 Capabilities

To power our documentation generation, we leverage the capabilities of GPT-3.5. This advanced language model enables us to produce accurate and contextually relevant documentation for the given codebase.

A Step-by-Step Approach

  1. Codebase Traversal: The process begins by traversing the codebase in a tree-wise fashion to access its contents.

  2. Code Embeddings with CodeBERT: To extract meaningful information from the code, we employ Microsoft's CodeBERT for code embeddings. However, we encountered an issue with large code files that CodeBERT cannot handle effectively.

  3. Handling Large Code Files: To overcome the limitations of CodeBERT for large code files, we devised our own algorithm that tokenizes a file in sliding windows. By specifying a window size and an overlap "region" shared by consecutive windows, we maintain essential context, and we obtain the file's embedding by averaging the embeddings produced for each window.

  4. Maintaining Context with Agglomerative Clustering: To ensure context preservation across the codebase, we use Agglomerative Clustering. This technique groups "similar" code files with shared semantic meanings and features, enhancing the quality of the generated documentation. We choose this type of clustering to exploit the hierarchical relations in the clusters formed.

  5. Efficient Documentation Generation: After clustering, we concatenate the code files belonging to the same cluster. The resulting concatenated code is then sent to GPT-3.5 using efficient prompt engineering techniques. The generated documentation provides comprehensive insights into the codebase.
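The windowing, averaging, clustering, and concatenation steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: `toy_embed` is a hypothetical stand-in for a CodeBERT call, the 512/64 window defaults, the cosine distance, the average linkage, and the distance threshold are all illustrative choices, and the naive clustering loop stands in for a library implementation such as scikit-learn's `AgglomerativeClustering`.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def sliding_windows(tokens: Sequence[str], window_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < window_size:
        raise ValueError("need 0 <= overlap < window_size")
    stride = window_size - overlap
    windows, start = [], 0
    while True:
        windows.append(list(tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            return windows
        start += stride


def embed_file(tokens: Sequence[str], embed_window: Callable[[Sequence[str]], Vector],
               window_size: int = 512, overlap: int = 64) -> Vector:
    """File-level embedding: element-wise mean of the per-window embeddings."""
    vectors = [embed_window(w) for w in sliding_windows(tokens, window_size, overlap)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Naive average-linkage agglomerative clustering (assumed linkage/metric).

    Starts with one cluster per file and keeps merging the two closest
    clusters until the closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters)) for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def concatenate_cluster(cluster: List[int], paths: List[str], sources: Dict[str, str]) -> str:
    """Join the files of one cluster, each labelled by its path, for the prompt."""
    return "\n\n".join(f"# File: {paths[i]}\n{sources[paths[i]]}" for i in cluster)


def toy_embed(window: Sequence[str]) -> Vector:
    """Hypothetical stand-in for CodeBERT: counts two keyword families."""
    return [float(sum(t == "def" for t in window)),
            float(sum(t == "SELECT" for t in window))]
```

In use, each file's tokens would go through `embed_file`, the resulting vectors through `agglomerate`, and each returned cluster through `concatenate_cluster` to build the text sent to the model. Averaging keeps one fixed-size vector per file regardless of file length, at the cost of blurring differences between distant parts of a very large file.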

Code Refactoring Ability

We also use the LLM to refactor code: a detailed prompt asks it to rewrite a given code block into cleaner, more efficient, and structurally sound code. We keep the prompt itself clean and take various code analytics into account to get the best output.

Adding Tests to Code

We also provide a way to generate tests for a specific code block. Testing is an integral part of the developer experience, and this removes much of the time otherwise spent thinking about test cases. Again, we rely on carefully designed prompts to produce thorough tests.

Tech Stack

List of technologies used to build the prototype:

Setup Guide

  1. Check out the Server Setup guide here.

  2. Client Frontend Setup:

    • Install Node.js dependencies:

      npm install
    • Run the development server for the frontend:

      npm run dev

Example Documentation Generated by reDocs.ai

1) Documentation of ComicifyAI:

  1. Documentation_1
  2. Documentation_2
  3. Documentation_3
  4. Documentation_4
  5. Documentation_5
  6. Documentation_6

2) Documentation of Cluboard:

  1. Documentation_1
  2. Documentation_2

Architecture Diagram

[Figure: reDocs.ai architecture diagram (redocs_arch)]

Start Contributing

MIT License