Expanding the scope to non-code files for repository understanding, localization, and change suggestion requires changes across several components of the system. After these changes, non-code files such as documentation, configuration files, deployment scripts, and notebooks are considered alongside code files. Below are the suggested modifications for each relevant file:

### 1. `file_analyzer.py`

- Update the `generate_semantic_description` function to handle different file types.
- Modify the prompt to generate semantic descriptions for non-code files.

````python
import os

# extract_code_block_content, call_llm_for_task and TaskName are the project's
# existing helpers and are assumed to be imported in file_analyzer.py already.


def prompt_generate_semantic_description(content, file_type):
    """ Prompt for generating semantic description of a file based on its type. """

    # Keys are lowercase file extensions, which is how
    # generate_semantic_description derives file_type below.
    file_description_map = {
        'py': 'Python file',
        'md': 'Markdown document',
        'yaml': 'YAML configuration',
        'yml': 'YAML configuration',
        'json': 'JSON configuration',
        'ipynb': 'Jupyter notebook',
        # Add more file types as needed
    }

    file_description = file_description_map.get(file_type, 'file')

    prompt = f"""
Understand the following {file_description} and generate a semantic description for it in markdown format.

{content}

Generated document should follow this structure:

```markdown
# Semantic Summary
A brief semantic summary of the entire file. This should not exceed 100 tokens.

# Structures
List of relevant structures, sections, or components in the file with a brief semantic summary for each. Individual summaries should not exceed 50 tokens.
```
"""
    return prompt


def generate_semantic_description(filepath):
    """
    Generate a semantic description for a file using LLM.
    Args:
        filepath (str): The path to the file.
    Returns:
        str: The generated semantic description.
    """
    if os.path.getsize(filepath) == 0:
        return None

    # Determine the file type from the extension, e.g. 'docs/Setup.MD' -> 'md'
    _, file_extension = os.path.splitext(filepath)
    file_type = file_extension.lstrip('.').lower()

    # errors='replace' keeps a stray undecodable byte from aborting the run
    with open(filepath, 'r', encoding='utf-8', errors='replace') as file:
        content = file.read()

    prompt = prompt_generate_semantic_description(content, file_type)
    return extract_code_block_content(
        call_llm_for_task(
            task_name=TaskName.GENERATE_FILE_SUMMARY,
            messages=[
                {"role": "system", "content": "You are an expert on generating semantic descriptions for various file types."},
                {"role": "user", "content": prompt}
            ]
        ).content
    )
````
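Raw `.ipynb` files are JSON documents in which outputs and metadata can outweigh the cells themselves, so sending them to the LLM verbatim wastes context. A minimal sketch, assuming nbformat v4 notebooks (a top-level `cells` list whose entries carry `cell_type` and `source`); `notebook_to_text` and `read_file_for_summary` are hypothetical helper names, not existing project code:

```python
import json
import os


def notebook_to_text(raw_json: str) -> str:
    """Flatten an nbformat v4 notebook into plain text, keeping only cell sources."""
    notebook = json.loads(raw_json)
    parts = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores "source" either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cell_type = cell.get("cell_type", "unknown")
        parts.append(f"[{cell_type} cell {index}]\n{source}")
    return "\n\n".join(parts)


def read_file_for_summary(filepath: str) -> str:
    """Hypothetical helper: read a file, flattening notebooks so outputs are dropped."""
    with open(filepath, "r", encoding="utf-8", errors="replace") as file:
        content = file.read()
    if os.path.splitext(filepath)[1].lower() == ".ipynb":
        return notebook_to_text(content)
    return content
```

In `generate_semantic_description`, the `open(...)`/`read()` pair could then become `content = read_file_for_summary(filepath)`, leaving the rest of the function unchanged.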

### 2. `change_suggester.py`

- Ensure that the change suggestion process considers non-code files.
- Update the prompt to include context from all types of files when suggesting changes.

```python
def prompt_generate_change_suggestions(issue_analysis, file_suggestions, files):
    """ Generates the prompt to localize issue to specific files and suggest changes. """

    messages = []

    system_message = {
        'role': 'system',
        'content': f"""You are an AI assistant that specializes in analyzing issues, understanding various types of files (code and non-code), and suggesting changes to address issues.

The following files have been suggested as relevant to the issue and discussion:

[FILE-SUGGESTIONS-START]
{file_suggestions}
[FILE-SUGGESTIONS-END]

Here are the corresponding files:
{files}

Based on the issue details and ensuing discussion, please suggest changes to these files (or any new code) along with your reasoning. Consider the context provided by all types of files."""
    }
    messages.append(system_message)

    # Replay the issue discussion, mapping every non-user role to 'assistant'
    conversation = issue_analysis.get('conversation', [])
    for message in conversation:
        role = 'user' if message['role'] == 'user' else 'assistant'
        messages.append({'role': role, 'content': message['content']})

    return messages
```

### 3. `issue_analyzer.py`

- Ensure that the issue analysis includes context from non-code files, if applicable.

No specific changes are needed unless particular file contents should be included in the issue conversation context.
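If file contents should be included, one option is to append them to the conversation as an extra user message before analysis. A minimal sketch; `add_file_context`, the message wording, and the `max_chars` cut-off are hypothetical, and it assumes `issue_analysis['conversation']` holds `{'role', 'content'}` dicts as the prompts in this proposal read them:

```python
def add_file_context(issue_analysis: dict, path: str, content: str, max_chars: int = 4000) -> dict:
    """Append a file's contents to the issue conversation as a user message.

    Long files are cut at max_chars so a single config dump cannot crowd out the discussion.
    """
    if len(content) > max_chars:
        content = content[:max_chars] + "\n[... truncated ...]"
    # Create the conversation list if the analysis does not have one yet
    conversation = issue_analysis.setdefault("conversation", [])
    conversation.append({
        "role": "user",
        "content": f"Contents of `{path}` for reference:\n\n{content}",
    })
    return issue_analysis
```

Since the prompt builders above replay `conversation` in order, a message added this way would reach both localization and change suggestion without further changes.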

### 4. `package_summary.py`

- Update the package summary generation to include non-code files.

````python
def prompt_generate_package_summary(package_name, documentation):
    """ Generates the prompt for summarizing a package, considering both code and non-code files. """

    prompt = f"""
Understand the following hierarchical documentation for package {package_name}, with semantic descriptions of the sub-packages, files, classes, functions, and other structures it contains.

```markdown
{documentation}
```

Now generate an abstractive package summary in markdown format with the following structure:

# <Package Name>

## Semantic Summary
A very crisp description of the full package semantics. This should not exceed 150 tokens.

## Contained structure names
Just a comma-separated listing of contained sub-package, file, class, function, structure, or document names. E.g.,
`package1`, `sub_package`, `file_name.py`, `ClassName`, `function_name`, `doc.md`, `config.yaml` ...

Note: Whole package summary should not exceed 512 tokens.
"""
    return prompt
````
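For the package summary to mention documents and configuration files, those files first have to be selected for summarization. A sketch of that selection step; the extension allow-list, the extensionless names, and the `is_summarizable`/`collect_package_files` helpers are assumptions, not the project's current behavior:

```python
from pathlib import Path, PurePath

# Assumed allow-list: code files plus common documentation, configuration,
# deployment, and notebook formats. Adjust to the repositories being indexed.
INCLUDED_EXTENSIONS = {
    ".py", ".md", ".rst", ".txt", ".yaml", ".yml", ".json",
    ".toml", ".cfg", ".ini", ".sh", ".ipynb",
}
# Well-known extensionless files that are worth summarizing.
INCLUDED_NAMES = {"Dockerfile", "Makefile"}


def is_summarizable(filename: str) -> bool:
    """Return True if the name matches the allow-lists (extension compared case-insensitively)."""
    path = PurePath(filename)
    return path.suffix.lower() in INCLUDED_EXTENSIONS or path.name in INCLUDED_NAMES


def collect_package_files(package_dir: str) -> list:
    """List files directly inside package_dir that should feed the package documentation."""
    return sorted(
        entry.name
        for entry in Path(package_dir).iterdir()
        # Sub-directories are skipped; sub-packages get their own summaries.
        if entry.is_file() and is_summarizable(entry.name)
    )
```

Each selected file's output from `generate_semantic_description` would then be appended to the `documentation` string passed to `prompt_generate_package_summary`, so names like `doc.md` or `config.yaml` can appear in the contained-structure listing.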


### 5. `localizer.py`
- Update the localization logic to include non-code files.
- Adjust the file localization suggestion logic to accommodate non-code files.

````python
from pydantic import BaseModel

# FILE_LOCALIZATION_SUGGESTIONS_FORMAT_INSTRUCTIONS is referenced by the
# existing localizer.py and is assumed to be defined there.


class FileLocalizationSuggestion(BaseModel):
    package: str
    file: str
    confidence: float
    reason: str


def prompt_localize_to_files(issue_analysis, package_details):
    """ Generates the prompt to localize issue to specific files, including non-code files. """

    messages = []

    system_message = {
        "role": "system",
        "content": f"""You are an AI assistant that specializes in localizing issues to related files based on semantic summaries of packages and files, including non-code files.

You return files that are most relevant to the issue in the following JSON format:

```json
{{
  "file_localization_suggestions": [
    {{
      "package": "<Fully qualified package name>",
      "file": "<Name of the file>",
      "confidence": <a floating point number between 0 and 1 with two decimal points indicating the confidence in the suggestion>,
      "reason": "<An explanation of the relevance of this file for the issue (not to exceed 50 tokens)>"
    }}
  ]
}}
```

Following are the semantic summaries of the files (and their containing packages) that you can refer to:

{package_details}

DO NOT TRY TO SOLVE THE ISSUE. JUST LOCALIZE IT TO THE MOST RELEVANT FILES AND RETURN THE file_localization_suggestions JSON OBJECT.
""" + FILE_LOCALIZATION_SUGGESTIONS_FORMAT_INSTRUCTIONS
    }
    messages.append(system_message)

    # Replay the issue discussion, mapping every non-user role to 'assistant'
    conversation = issue_analysis.get('conversation', [])
    for message in conversation:
        role = 'user' if message['role'] == 'user' else 'assistant'
        messages.append({'role': role, 'content': message['content']})

    return messages
````
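Once the model answers, the JSON object has to be pulled out of the reply and turned into suggestions. A dependency-free sketch: a stdlib dataclass named `Suggestion` mirrors the fields of `FileLocalizationSuggestion` (with pydantic available, the model above would be used instead), and `parse_localization_response` plus its `min_confidence` threshold are hypothetical:

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Suggestion:
    # Stand-in with the same fields as FileLocalizationSuggestion.
    package: str
    file: str
    confidence: float
    reason: str


def parse_localization_response(text: str, min_confidence: float = 0.5) -> list:
    """Extract the JSON object (fenced or bare) and return suggestions sorted by confidence."""
    # Prefer the content of a fenced ```json block; otherwise treat the whole reply as JSON.
    match = re.search(r"`{3}(?:json)?\s*(\{.*\})\s*`{3}", text, re.DOTALL)
    payload = json.loads(match.group(1) if match else text)
    suggestions = [
        Suggestion(**item)
        for item in payload.get("file_localization_suggestions", [])
    ]
    kept = [s for s in suggestions if s.confidence >= min_confidence]
    return sorted(kept, key=lambda s: s.confidence, reverse=True)
```

Nothing here is specific to code files: a suggestion for `values.yaml` or `README.md` is parsed and ranked the same way as one for a `.py` file.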



With these changes, the system considers documentation, configuration files, deployment scripts, and notebooks alongside code when analyzing issues, localizing them to files, and suggesting changes, giving it a fuller picture of the repository and of where a fix may need to land.

pdhoolia / se-agent

Expand Scope to Include Non-Code Files for Comprehensive Repository Understanding, localization, and change suggestion #15
