mouimet-infinisoft / ibrain-cli


Codebase In-Depth Analysis on 3 Perspectives using AST & Metadata #13

Open mouimet-infinisoft opened 1 month ago

mouimet-infinisoft commented 1 month ago

Specification for Enriching Code Analysis and Representation

Objective: Enhance the analysis and representation of code by including comprehensive metadata and detailed context from the Abstract Syntax Tree (AST). This will improve the ability to query and understand the codebase effectively.

Perspective 1: Module/File Overview

Perspective 2: Variables, Functions, and Classes

Perspective 3: Implementation Details

Integration with Supabase and pg_vector

  1. Set Up pg_vector on Supabase:
    • Enable pg_vector on the Supabase instance.
    • Create a table to store code vectors.
  2. Store Vectors:
    • Insert vectors into the Supabase table using the structure, components, and complexity metadata.
  3. Search Functionality:
    • Implement similarity queries using pg_vector to find relevant code snippets or modules based on vector representations.
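As a concrete target for the three steps above, each analyzed file ends up as one stored record per perspective. A minimal sketch of that record shape (the field names here are illustrative, not a fixed schema):

```javascript
// Illustrative shape of what gets stored per file: one record per perspective.
// `vectors` holds the three embeddings produced by the analysis steps.
const exampleRecords = (filePath, vectors) => [
  { file_path: filePath, perspective: 'module',     vector: vectors.module },
  { file_path: filePath, perspective: 'components', vector: vectors.components },
  { file_path: filePath, perspective: 'complexity', vector: vectors.complexity },
];
```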

By following this comprehensive approach, you ensure that the codebase analysis is rich in context and detail, improving the system's ability to provide accurate and helpful insights.

mouimet-infinisoft commented 1 month ago

Enhanced Plan and Implementation

Step-by-Step Plan

  1. Setting Up the Environment

    Ensure you have the necessary packages installed:

    npm install @babel/parser @babel/traverse @supabase/supabase-js @xenova/transformers typhonjs-escomplex
  2. Perspective 1: Module/File Overview

    This perspective involves analyzing the overall structure of the codebase, including files, folders, imports, and metadata.

    const fs = require('fs');
    const path = require('path');
    const babelParser = require('@babel/parser');
    const traverse = require('@babel/traverse').default;
    
    const analyzeModule = (filePath) => {
      const content = fs.readFileSync(filePath, 'utf-8');
      const ast = babelParser.parse(content, { sourceType: 'module', plugins: ['typescript'] });
      const metadata = {
        name: path.basename(filePath),
        description: '',
        responsibilities: [],
        dependencies: [],
        inputOutput: [],
        errorHandling: '',
        configuration: '',
        testing: '',
        versioning: '',
        security: '',
        performance: '',
        changelog: '',
        documentation: '',
        maintainers: '',
        license: ''
      };

      traverse(ast, {
        ImportDeclaration({ node }) {
          metadata.dependencies.push(node.source.value);
        }
      });

      return metadata;
    };
    
    const moduleToVector = async (moduleMetadata) => {
      const textRepresentation = JSON.stringify(moduleMetadata);
      const extractor = await loadModel();
      // Mean-pool token embeddings into one fixed-size vector (768 dims for BERT base)
      const output = await extractor(textRepresentation, { pooling: 'mean', normalize: true });
      return Array.from(output.data);
    };

    // @xenova/transformers runs Hugging Face models natively in Node.js,
    // so no Python process is needed for embedding.
    let extractorPromise = null;
    const loadModel = () => {
      // Cache the pipeline so the model is only loaded once
      if (!extractorPromise) {
        extractorPromise = import('@xenova/transformers')
          .then(({ pipeline }) => pipeline('feature-extraction', 'Xenova/bert-base-uncased'));
      }
      return extractorPromise;
    };
    
    const rootDir = 'path/to/codebase';
    const files = fs.readdirSync(rootDir).filter(file => file.endsWith('.js') || file.endsWith('.ts'));

    (async () => {
      for (const file of files) {
        const moduleMetadata = analyzeModule(path.join(rootDir, file));
        const moduleVector = await moduleToVector(moduleMetadata);
        // storeVector is defined in the Supabase integration section below
        await storeVector(file, 'module', moduleVector);
      }
    })();
  3. Perspective 2: Variables, Functions, and Classes

    This perspective focuses on the internal components of the code such as functions, classes, variables, and their relationships.

    const analyzeCodeComponents = (filePath) => {
      const content = fs.readFileSync(filePath, 'utf-8');
      const ast = babelParser.parse(content, { sourceType: 'module', plugins: ['typescript'] });
      const components = [];

      traverse(ast, {
        FunctionDeclaration({ node }) {
          components.push({
            type: 'Function',
            name: node.id.name,
            startLine: node.loc.start.line,
            endLine: node.loc.end.line,
            arguments: node.params.map(param => param.name),
            returnType: node.returnType ? node.returnType.typeAnnotation.type : 'void'
          });
        },
        ClassDeclaration({ node }) {
          components.push({
            type: 'Class',
            name: node.id.name,
            startLine: node.loc.start.line,
            endLine: node.loc.end.line,
            // @babel/parser emits ClassMethod nodes, not ESTree's MethodDefinition
            methods: node.body.body
              .filter(member => member.type === 'ClassMethod')
              .map(method => ({
                name: method.key.name,
                startLine: method.loc.start.line,
                endLine: method.loc.end.line,
                arguments: method.params.map(param => param.name),
                returnType: method.returnType ? method.returnType.typeAnnotation.type : 'void'
              }))
          });
        },
        VariableDeclarator({ node }) {
          components.push({
            type: 'Variable',
            name: node.id.name,
            startLine: node.loc.start.line,
            endLine: node.loc.end.line,
            dataType: node.init ? node.init.type : 'undefined'
          });
        }
      });

      return components;
    };
    
    const componentsToVector = async (components) => {
      const componentsMetadata = JSON.stringify(components);
      // Same feature-extraction model as Perspective 1
      const { pipeline } = await import('@xenova/transformers');
      const extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased');
      const output = await extractor(componentsMetadata, { pooling: 'mean', normalize: true });
      return Array.from(output.data);
    };
    
    const filePath = 'path/to/file.js';

    (async () => {
      const components = analyzeCodeComponents(filePath);
      const componentsVector = await componentsToVector(components);
      await storeVector(filePath, 'components', componentsVector);
    })();
  4. Perspective 3: Implementation Details

    This perspective analyzes control flow, cyclomatic complexity, and dependencies within the code.

    // The maintained typhonjs-escomplex package exposes analyzeModule directly
    const escomplex = require('typhonjs-escomplex');

    const analyzeComplexity = (filePath) => {
      const content = fs.readFileSync(filePath, 'utf-8');
      const analysis = escomplex.analyzeModule(content);
      const complexityMetrics = {
        startLine: 1,
        endLine: content.split('\n').length,
        complexity: analysis.aggregate.cyclomatic,
        maintainability: analysis.maintainability,
        dependencies: analysis.dependencies.map(dep => dep.path)
      };

      return complexityMetrics;
    };
    
    const complexityToVector = async (complexityMetrics) => {
      const complexityMetadata = JSON.stringify(complexityMetrics);
      const { pipeline } = await import('@xenova/transformers');
      const extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased');
      const output = await extractor(complexityMetadata, { pooling: 'mean', normalize: true });
      return Array.from(output.data);
    };

    (async () => {
      const complexityMetrics = analyzeComplexity(filePath);
      const complexityVector = await complexityToVector(complexityMetrics);
      await storeVector(filePath, 'complexity', complexityVector);
    })();

Integrating with Supabase and pg_vector

  1. Set Up pg_vector on Supabase

    Ensure pg_vector is enabled on your Supabase instance:

    CREATE EXTENSION IF NOT EXISTS vector;
  2. Create a Table for Vectors

    Create a table to store the code vectors:

    CREATE TABLE code_vectors (
       id SERIAL PRIMARY KEY,
       file_path TEXT,
       perspective TEXT,
       vector vector(768) -- Adjust the dimension based on the vector size from your model
    );
  3. Store Vectors in Database

    Insert the vectors into the database using Supabase:

    const { createClient } = require('@supabase/supabase-js');
    
    const supabaseUrl = 'https://your-supabase-url.supabase.co';
    const supabaseKey = 'your-supabase-key';
    const supabase = createClient(supabaseUrl, supabaseKey);
    
    const storeVector = async (filePath, perspective, vector) => {
      const { data, error } = await supabase
        .from('code_vectors')
        .insert([
          { file_path: filePath, perspective: perspective, vector: vector }
        ]);
      if (error) console.error(error);
      else console.log('Vector stored:', data);
    };

    // Store vectors for each perspective (called from an async context)
    await storeVector('path/to/file.js', 'module', moduleVector);
    await storeVector('path/to/file.js', 'components', componentsVector);
    await storeVector('path/to/file.js', 'complexity', complexityVector);
  4. Search Using Vectors

    Implement search using pg_vector's distance operators (<-> is Euclidean distance, <=> is cosine distance). For large tables, an approximate index such as IVFFlat or HNSW keeps these queries fast:

    SELECT * FROM code_vectors
    ORDER BY vector <-> '[your_vector_representation]'
    LIMIT 5;
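The same query can be issued from Node through Supabase by wrapping it in a Postgres function and calling it via RPC. A sketch, assuming a function named match_code_vectors has been created for this purpose (the function name and its parameters are illustrative, not part of the Supabase API):

```javascript
// pg_vector accepts vectors as a '[v1,v2,...]' text literal.
const toPgVector = (vector) => `[${Array.from(vector).join(',')}]`;

// Sketch: call an assumed Postgres function `match_code_vectors` through
// Supabase RPC to run the similarity query shown above from Node.
const searchSimilar = (supabase, queryVector, limit = 5) =>
  supabase.rpc('match_code_vectors', {
    query_vector: toPgVector(queryVector),
    match_count: limit,
  });
```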

Conclusion

Including detailed metadata from the AST for each perspective enhances the vector representations and captures more context about the codebase. This method makes the vectors more informative, improving the AI's ability to provide context-aware suggestions and collaborations.

mouimet-infinisoft commented 1 month ago

Specification for Enriching Code Analysis and Representation with Git History

Objective: Enhance the analysis and representation of code by incorporating comprehensive metadata and detailed context from the Abstract Syntax Tree (AST), along with the entire Git history, including commits, pull requests (PRs), change requests, and merge discussions, treated as time series data. This provides a richer, more contextual understanding of how the codebase evolves over time.

Perspective 1: Module/File Overview

Perspective 2: Variables, Functions, and Classes

Perspective 3: Implementation Details

Perspective 4: Git History and Time Series Data

Integration with Supabase and pg_vector

  1. Set Up pg_vector on Supabase:

    • Enable pg_vector on the Supabase instance.
    • Create a table to store code vectors and Git history data.
  2. Store Vectors and Git History:

    • Insert vectors into the Supabase table using the structure, components, complexity metadata, and Git history data.
    • Include time series data representing the evolution of the codebase.
  3. Search Functionality:

    • Implement similarity queries using pg_vector to find relevant code snippets, modules, or historical changes based on vector representations and time series analysis.