possee-org / genai-numpy

Task: Generate examples for functions with an axis parameter (open idea) #116

Open bmwoodruff opened 3 weeks ago

bmwoodruff commented 3 weeks ago

Description:

At the triage meeting, we noted that one place where additional examples would be welcome is functions that have an axis parameter. The axis parameter can be tricky (and many of the AI-generated examples so far have gotten it completely wrong).

I bet we could design a better prompt that helps generate these types of examples, and includes an example to help understand the axis parameter.
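As a minimal sketch of the kind of example such a prompt might include (the array and the choice of np.sum are illustrative assumptions, not something decided at the triage meeting), the following shows how axis selects the dimension that is summed over:

```python
import numpy as np

# A 2x3 array: axis 0 runs down the 2 rows, axis 1 runs across the 3 columns.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 sums down the rows, leaving one value per column.
col_sums = np.sum(a, axis=0)   # array([5, 7, 9])

# axis=1 sums across the columns, leaving one value per row.
row_sums = np.sum(a, axis=1)   # array([ 6, 15])

# With no axis given, every element is summed into a single scalar.
total = np.sum(a)              # 21

print(col_sums, row_sums, total)
```

The output shape is the input shape with the chosen axis removed: (2, 3) becomes (3,) for axis=0 and (2,) for axis=1.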

Acceptance Criteria:

otieno-juma commented 2 weeks ago

I've created a script using Groq and incorporated some security measures, such as reading the GitHub token from an environment variable (via os.getenv) instead of hard-coding it. I have documented the script to explain what each part does. If anyone has suggestions for improvements or deletions, they would be greatly appreciated! I'm in the process of setting up my test environment. If you have any luck with it, please share your findings.

"""
This script creates pull requests on a GitHub repository to add examples for NumPy functions that have an 'axis' parameter.

The script first sets up a GitHub instance using an environment variable for the GitHub token. It then retrieves all the NumPy functions that have an 'axis' parameter, and creates a dictionary to store the examples and explanations for each function.

The script then creates a pull request template that includes the function name, three examples, and their descriptions. It then creates a pull request for each function, with the title 'Add examples for {function_name}' and the pull request body filled in with the examples and descriptions.

The script waits for 10 seconds between creating each pull request to avoid rate limiting.

Finally, the script prints the number of pull requests that were created.
"""
import inspect
import os
import time

import numpy as np
from github import Github

# Set up environment variable for GitHub token
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")

# Check if the environment variable is set; exit with a non-zero status if not
if GITHUB_TOKEN is None:
    raise SystemExit("Error: GITHUB_TOKEN environment variable is not set")

# Create a GitHub instance with the token
g = Github(GITHUB_TOKEN)

# Set up the repository and branch you want to create pull requests for
repo = g.get_repo("your-repo-name")
branch = repo.get_branch("main")

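# Get all numpy functions with axis parameters
axis_params = []
for name in dir(np):
    obj = getattr(np, name)
    try:
        has_axis = callable(obj) and "axis" in inspect.signature(obj).parameters
    except (TypeError, ValueError):
        # Some NumPy callables do not expose an introspectable signature
        has_axis = False
    if has_axis:
        axis_params.append((name, obj))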

# Create a dictionary to store the examples and explanations for each function
examples = {}

# Create the pull request template; {examples} is filled with one section per example
pr_template = """
# Examples for {function_name}

{function_name} is used to perform various operations on arrays. Here are some examples of how to use it:

{examples}
Please let me know if you have any questions or need further clarification.
"""

# Template for a single numbered example section
example_template = """**Example {number}:**
{example_code}

This code does {example_description}

"""

# Create the pull requests
pr_count = 0
for func, obj in axis_params:
    # Create the examples and explanation for the function
    examples[func] = []
    for i in range(3):
        example_code = f"{func}(np.array([1, 2, 3]), axis={i})"
        example_description = f"Performing operation on axis {i}"
        examples[func].append((example_code, example_description))

    # Render each (code, description) pair as its own numbered section
    example_sections = "".join(
        example_template.format(number=n, example_code=code, example_description=description)
        for n, (code, description) in enumerate(examples[func], start=1)
    )

    # Create the pull request
    # NOTE: head and base are the same branch here, so there is nothing to compare;
    # head would need to be a separate branch that contains the new commits.
    pr_title = f"Add examples for {func}"
    pr_body = pr_template.format(function_name=func, examples=example_sections)
    pr = repo.create_pull(title=pr_title, body=pr_body, head=branch.name, base=branch.name)
    pr_count += 1

    # Wait for 10 seconds before creating the next pull request to avoid rate limiting
    time.sleep(10)

print(f"Created {pr_count} pull requests!")

bmwoodruff commented 2 weeks ago

When you have something ready remember you can submit a pull request to genai-numpy.

I'm worried your code may submit a PR rather than just push to a branch. We do not want to flood numpy with PRs.
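One way to address this concern, sketched here as an assumption rather than an agreed project workflow, is to have the script write each draft to a local file for review and never call create_pull at all. The function name write_drafts, the drafts directory, and the file naming are all hypothetical.

```python
from pathlib import Path


def write_drafts(drafts, out_dir="drafts"):
    """Write each draft body to <out_dir>/<function_name>.md and return the paths.

    Hypothetical helper: nothing here talks to GitHub, so no PR can be opened.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for func_name, body in drafts.items():
        path = out / f"{func_name}.md"
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_drafts({"sum": "# Examples for sum\n"}) would create drafts/sum.md, which can then be reviewed and committed to a branch by hand.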

bmwoodruff commented 2 weeks ago

How much feedback do you want at this stage? I'm guessing you need to play with things some more to see how it performs. Here are some initial thoughts.

I don't see any portion of the code above that generates examples using AI.

Do you understand the axis parameter? The example you gave, example_code = f"{func}(np.array([1, 2, 3]), axis={i})", uses an array with only one axis, but the for loop also runs it with axis=1 and axis=2 (which will fail). If you're not sure what the axis parameter does, then you'll need to spend more time with things before you're ready to generate examples for this. The explanation portion needs to be more than just example_description = f"Performing operation on axis {i}". The point was to have a detailed description to help a user understand some potentially tricky axis choices. That will require more than a header.
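To make the failure concrete, here is a small sketch (the 3-D shape (2, 3, 4) and the use of np.sum are illustrative choices, not part of the original script). A 1-D array only accepts axis=0, while an array with three dimensions accepts all three values the loop tries:

```python
import numpy as np

one_d = np.array([1, 2, 3])
print(one_d.ndim)  # 1, so axis=0 is the only valid non-negative axis

# axis=1 is out of range for a 1-D array. NumPy raises an AxisError,
# which is a subclass of ValueError, so catching ValueError is enough.
failed = False
try:
    np.sum(one_d, axis=1)
except ValueError as exc:
    failed = True
    print(type(exc).__name__, exc)

# With a 3-D input, axis=0, 1 and 2 are all valid, and each one
# removes a different dimension from the result.
three_d = np.arange(24).reshape(2, 3, 4)
shapes = {axis: np.sum(three_d, axis=axis).shape for axis in range(three_d.ndim)}
print(shapes)  # {0: (3, 4), 1: (2, 4), 2: (2, 3)}
```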

otieno-juma commented 2 weeks ago

Thank you, Ben, for the feedback. Having these conversations helps me understand what needs to be done and how to proceed in the right direction. I'm currently experimenting with different approaches. I was using this issue https://github.com/swcarpentry/python-novice-inflammation/issues/906 as a reference to understand the axis parameter. Are there other resources that could help me understand the axis parameter better?

bmwoodruff commented 2 weeks ago

Looks like a great resource. Keep searching the web. Have you used AI (GPT, Groq, Claude, etc.) to help you? I would hope that through a series of conversations with AI you can come to a better understanding, and even figure out a prompt, or series of prompts, that can help you generate examples with AI. That's our goal, namely to find ways to use AI to help improve NumPy. Maybe the examples will need a good story around them (the one you pointed to has such a story).
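A prompt along these lines could be assembled programmatically. The sketch below is only an assumption about what such a prompt might look like: build_axis_prompt and its wording are hypothetical, it calls no AI service, and it simply returns text to paste into (or send to) whichever model is being used.

```python
def build_axis_prompt(func_name, shape=(2, 3, 4)):
    """Return a prompt asking for axis examples for numpy.<func_name>.

    Hypothetical helper; the wording is illustrative, not a tested prompt.
    """
    # List the axes that are valid for an array of the given shape.
    valid_axes = ", ".join(str(i) for i in range(len(shape)))
    return (
        f"Write three docstring examples for numpy.{func_name} that use the "
        f"axis parameter.\n"
        f"Use an input array of shape {shape}, so the valid axes are "
        f"{valid_axes}.\n"
        "For each example, state the input shape, the axis chosen, and the "
        "output shape, then explain in one or two sentences how the chosen "
        "axis affects the result.\n"
        "Do not use an axis that is out of range for the input array."
    )


print(build_axis_prompt("sum"))
```

Stating the input shape and the valid axes in the prompt is meant to steer the model away from out-of-range axis values; whether it actually does so would need to be checked against real output.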

Based on recent discussions on your PRs, when you do find a way to prompt AI to generate the needed examples, check whether the examples it generates come from somewhere on the web. Keep a record of your prompts (GPT does that automatically, for better or worse, but Groq will require you to keep track of the prompts yourself).
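For tools that do not keep a history, a small local log is one option. The following is a minimal sketch using only the standard library; the function name log_prompt, the prompt_log.jsonl file name, and the record fields are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def log_prompt(prompt, response, model, path="prompt_log.jsonl"):
    """Append one prompt/response record as a JSON line and return the record.

    Hypothetical helper: the file name and fields are illustrative choices.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "response": response,
    }
    # Append mode keeps earlier records; each line is one self-contained JSON object.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Each call adds one line to the file, so the log can be searched later when checking where a generated example came from.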