yamadashy / repomix

📦 Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, and Gemini.
MIT License

Docstrings in Python files are not removed #97

Closed Malakof closed 1 month ago

Malakof commented 1 month ago

Docstrings (the sections enclosed by """) can sometimes fall out of sync with the current code during debugging. When you feed them to the model, they can produce odd results. As a workaround, an IDE regex Find and Replace with """[\s\S]*?""" does the job.
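The IDE workaround above can be reproduced in a few lines of Python, assuming the regex is applied to raw source text. This is a minimal sketch; the helper name strip_triple_quoted is hypothetical. The pattern matches every """...""" block, not only docstrings, so it also deletes string literals assigned to variables, and it does not touch '''-quoted strings.

```python
import re

# Non-greedy match of a """...""" block; [\s\S] lets the match span newlines.
TRIPLE_DOUBLE_QUOTED = re.compile(r'"""[\s\S]*?"""')


def strip_triple_quoted(source: str) -> str:
    # Delete every triple-double-quoted block, mirroring the IDE Find and Replace.
    return TRIPLE_DOUBLE_QUOTED.sub("", source)


if __name__ == "__main__":
    sample = 'def f():\n    """Doc."""\n    return 1\n\nvar = """\nstring variable\n"""\n'
    print(strip_triple_quoted(sample))
```

Running it on the sample removes the docstring but also leaves the assignment as a bare `var =`, which is why a plain regex is a blunt tool for this job.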

yamadashy commented 1 month ago

@Malakof Thank you for reporting this issue! I'll check it out when I get back home.

yamadashy commented 1 month ago

@Malakof I'd like to clarify a few points about the output.removeComments functionality:

  1. Am I correct in understanding that the output.removeComments feature is not removing Python docstrings in your output?

  2. Our current implementation should remove simple docstrings like these:

    '''
    docstring
    '''
    
    """
    docstring
    """
  3. We intentionally do not remove string literals assigned to variables, such as:

    var = """
    string variable
    """
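The distinction in points 2 and 3 can be illustrated with Python's standard ast module. This is a minimal sketch under stated assumptions, not Repomix's actual implementation, and the helper names bare_string_spans and remove_bare_strings are hypothetical. It treats any statement consisting only of a string literal (which covers docstrings in either quote style) as removable, and leaves strings assigned to a name alone.

```python
import ast


def bare_string_spans(source: str) -> list[tuple[int, int]]:
    # Return 1-based, inclusive (start, end) line ranges of statements that are
    # just a string literal, e.g. docstrings. Assigned strings are ast.Assign
    # nodes, not ast.Expr, so they are skipped. Requires Python 3.9+.
    tree = ast.parse(source)
    spans = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            spans.append((node.lineno, node.end_lineno))
    return sorted(spans)


def remove_bare_strings(source: str) -> str:
    # Drop every line covered by a bare string statement; keep all others.
    drop = set()
    for start, end in bare_string_spans(source):
        drop.update(range(start, end + 1))
    lines = source.splitlines(keepends=True)
    return "".join(line for i, line in enumerate(lines, start=1) if i not in drop)


if __name__ == "__main__":
    sample = '''def f():
    """
    docstring
    """
    return 1

var = """
string variable
"""
'''
    print(remove_bare_strings(sample))
```

Because the sketch drops whole lines, a string statement that shares a line with other code (after a semicolon, say) would take that code with it, and a function whose only statement is its docstring is left without a body. That is acceptable for text fed to a model, but the result is not guaranteed to be valid Python.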

If you're encountering cases where docstrings are not being removed as expected, could you please provide specific examples? This would greatly help us identify and address any issues with our current implementation.

Thank you for your help in improving Repopack!

Malakof commented 1 month ago

Sure, here is a snippet (Repopack v0.1.36). # comments are removed, but docstrings are not.

def train_pipeline(data_path, model_path, encoder_path):
    """
    Orchestrates the machine learning pipeline from data loading to model saving.
    Returns evaluation metrics or raises an exception with a detailed error message.
    Args:
        data_path (str): Path to the dataset CSV file.
        model_path (str): Path where the trained model will be saved.
        encoder_path (str): Path where the encoder will be saved.
    Returns:
        dict: A dictionary containing various evaluation metrics if successful.
    Raises:
        Exception: An exception with a detailed error message if the pipeline fails.
    """
    try:
        df_base = _load_data(data_path)

yamadashy commented 1 month ago

@Malakof Thank you for providing the snippet. I appreciate your help in identifying this issue.

I'm glad to inform you that this was a known bug in version 0.1.36 and earlier, which has been fixed in version 0.1.37. You can find more details about the fix in our release notes: https://github.com/yamadashy/repopack/releases/tag/v0.1.37

To update, run:

npm update -g repopack

After updating, please try running Repopack again on your Python files. The docstrings should now be properly removed when output.removeComments is set to true.

Thank you for your patience and for bringing this to our attention!

yamadashy commented 1 month ago

As this issue has already been resolved, I'm closing it. If anyone encounters any related problems or has further questions, please feel free to open a new issue.

Thank you again for your contribution to improving Repopack!