possee-org / genai-numpy

MIT License

Task: Automate processing of generated example logs #77

Closed bmwoodruff closed 1 month ago

bmwoodruff commented 3 months ago

Description:

With the generated example logs created (almost 1000), we need a way to automate processing them. This entails multiple things.

Note that some generated examples suggest a good idea for an example, but the example itself is garbage. For these, I'd like a special note requesting human intervention so the idea is preserved.

In other places, there are preferred settings for how a function should be called. For example,

Acceptance Criteria:

bmwoodruff commented 3 months ago

I'm thinking a report may not be needed. Just inject the code into the proper place in the codebase while working on a branch, and let VS Code's side-by-side Source Control view be the report.

bmwoodruff commented 3 months ago

I'm working on building an example extractor to take out the new examples, and only the new examples, from the generated files. I figured I'd report a bit on the intermediate progress.

That's enough of a report for now. I wanted to keep track of what I'm doing. I think I have enough written to automate direct example injection into the codebase for the 590 functions (followed by human review). We can do this one module at a time. I first want to polish up the scripts that will do the post-processing (the functions are currently in examples/extract_new_exmaples.py). Once things are polished up, I'll add docstrings, get rid of my silly debugging hacks, and then hopefully we can use them to inject a thousand or more examples into the codebase.
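The extractor itself lives in examples/extract_new_exmaples.py and isn't reproduced here, but the core idea of pulling doctest-style example blocks out of a generated log can be sketched like this (a minimal illustration, not the actual script):

```python
def extract_example_blocks(text: str) -> list[str]:
    """Split text into doctest-style example blocks.

    A block starts at a '>>> ' line and collects continuation and
    expected-output lines until the next blank line. (Sketch only;
    the real extractor handles more edge cases.)
    """
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">>> "):
            if current:
                blocks.append("\n".join(current))
            current = [stripped]
        elif current and stripped:
            current.append(stripped)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks

log = ">>> np.add(1, 2)\n3\n\n>>> np.hypot(3, 4)\n5.0\n"
print(extract_example_blocks(log))
```

Once the blocks are isolated, deciding which ones are genuinely new is a separate comparison step.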

bmwoodruff commented 3 months ago

I got excited and wanted to share. I'm going to use the fuzzywuzzy package to do text comparisons.
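fuzzywuzzy's fuzz.ratio(a, b) returns a 0-100 similarity score; its pure-Python fallback builds on the stdlib's difflib.SequenceMatcher, so the idea can be shown without the package installed (a stand-in on the same scale, not the library itself):

```python
import difflib

def ratio(a: str, b: str) -> int:
    """0-100 similarity score, mirroring the scale of fuzzywuzzy's fuzz.ratio."""
    return round(difflib.SequenceMatcher(None, a, b).ratio() * 100)

# Near-duplicate generated examples score high; unrelated text scores low,
# which is what lets us flag examples that already exist in a docstring.
print(ratio("np.mean(a, axis=0)", "np.mean(a, axis=0)"))
print(ratio("np.mean(a, axis=0)", "plt.show()"))
```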

bmwoodruff commented 3 months ago

Well, I made a lot of progress.

We have code now that does the following:

I think those are the key bits for an automated workflow. I'll work on polishing it up tomorrow.

bmwoodruff commented 3 months ago

I'm going to work on "Algorithmically locate the proper spot in numpy codebase to insert examples, and then insert examples" next.
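A sketch of the "locate the proper spot" step, assuming numpydoc-style docstrings (a section name underlined with dashes); the real logic will need to be fussier about indentation and surrounding sections:

```python
def locate_examples_section(docstring: str):
    """Find the numpydoc 'Examples' header in a docstring.

    Returns the line index of the header, or None if the docstring
    has no Examples section yet. (Illustrative sketch only.)
    """
    lines = docstring.splitlines()
    for i in range(len(lines) - 1):
        underline = lines[i + 1].strip()
        if lines[i].strip() == "Examples" and underline and set(underline) == {"-"}:
            return i
    return None

doc = """Add two arrays.

Examples
--------
>>> np.add(1, 2)
3
"""
print(locate_examples_section(doc))
print(locate_examples_section("No examples here."))
```

If the header is found, new examples can be appended at the end of that section; if not, a fresh Examples section goes at the end of the docstring.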

Thoughts:

I don't think anyone will want 1000+ examples sent in for review at once, but tackling an entire module (with the exception of ma and np) could be desirable. Then, if at some point the devs want to remove the AI-generated examples (maybe legal issues will hit us all), it can be done easily.

A consistent commit message would be nice. Here is a proposed option adapted from a discussion with @otieno-juma:

DOC: AI-Gen examples for ...

Examples created by Llama3-70B. Reviewed and modified as part of POSSEE.

Co-authored-by: Ben Woodruff <bmwoodruff@gmail.com>
[skip actions] [skip azp] [skip cirrus]

Not sure if adding my name to all of them is needed. My thought is that this would provide a git blame trail that includes me in addition to the interns, in case someone wants more information.
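For reference, a shell sketch that assembles the proposed message into a file that git commit -F can consume (the module name is a placeholder; note the blank line before the trailers, which git needs in order to parse Co-authored-by correctly):

```shell
# Assemble the proposed commit message; "numpy.char" is a placeholder module.
module="numpy.char"
msg_file=$(mktemp)
cat > "$msg_file" <<EOF
DOC: AI-Gen examples for ${module}

Examples created by Llama3-70B. Reviewed and modified as part of POSSEE.

Co-authored-by: Ben Woodruff <bmwoodruff@gmail.com>
[skip actions] [skip azp] [skip cirrus]
EOF
# git commit -F "$msg_file"   # run from the working branch
cat "$msg_file"
```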

bmwoodruff commented 3 months ago

PRs #98, #100, and #101 are all related to this task. #101 took forever, as I could not figure out why escape characters were being removed. I'll record what I learned here, as all solutions from AI were garbage (and hopefully it will find this content when trained in the future).

You can see the problem with disappearing escape sequences with the following minimal example.

import re

content = 'String with \n new lines and \\\\ some \\ backslashes.\n We need a few more \\\\ to help \\ see the problem.'
old_phrase = 'String with \n new lines and \\\\ some \\ backslashes.'
new_phrase = 'String with \n new lines and \\\\ some \\ backslashes.'
pattern = re.compile(re.escape(old_phrase), re.MULTILINE)
new_content = pattern.sub(new_phrase, content)
new_content

The output string is below, and you can clearly see how the replaced content now has lost half the escape characters.

'String with \n new lines and \\ some \\ backslashes.\n We need a few more \\\\ to help \\ see the problem.'

To fix this, just replace all \\ with \\\\ in new_phrase before using pattern.sub. The updated code is

import re

content = 'String with \n new lines and \\\\ some \\ backslashes.\n We need a few more \\\\ to help \\ see the problem.'
old_phrase = 'String with \n new lines and \\\\ some \\ backslashes.'
new_phrase = 'String with \n new lines and \\\\ some \\ backslashes.'
pattern = re.compile(re.escape(old_phrase), re.MULTILINE)
new_content = pattern.sub(new_phrase.replace('\\','\\\\'), content)
new_content

The output is now the correct string:

'String with \n new lines and \\\\ some \\ backslashes.\n We need a few more \\\\ to help \\ see the problem.'

Wasted almost a day on this ...
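Another way to sidestep the problem entirely: re.sub also accepts a callable as the replacement, and a callable's return value is inserted verbatim, with no backslash-template processing at all.

```python
import re

content = 'String with \\\\ some \\ backslashes.'
old_phrase = 'String with \\\\ some \\ backslashes.'
new_phrase = 'String with \\\\ some \\ backslashes.'

pattern = re.compile(re.escape(old_phrase))
# A function replacement bypasses re's \-escape handling of the
# template string, so every backslash in new_phrase survives intact.
new_content = pattern.sub(lambda match: new_phrase, content)
print(new_content == content)
```

This avoids having to remember the replace('\\','\\\\') dance whenever the replacement text might contain backslashes.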

bmwoodruff commented 3 months ago

What's left?

These I think are crucial before it's ready to use:

These are polishing updates to work on after we get something working:

bmwoodruff commented 3 months ago

PRs #102, #103, #104 are also connected to this task.

As I'm wrapping up automation, I have some new thoughts.

Right now we have examples, lots of them, that we could add into the codebase. Whether or not we add them was never the goal of this project; rather, our goal was a proof of concept showing how it can be done. We can do it now.

Rather than inject these examples into the codebase, why not wait? We might as well inject examples for the functions that are missing them, but we can postpone mass example inclusion for now. We have a POC. Now we can refine the prompt(s) and see if there is buy-in from the maintainers.

I do think we should go through and clean a few modules up, with branches fully ready to be included in the main namespace, just so we know how much time that will take and we can showcase an example of the whole process. My thoughts there are:

  1. Have AI create a branch, generate examples for (a module, 20 functions, all of numpy, one at a time?) some number of functions, add all changes to that branch, commit the changes, and push the branch to a fork.
  2. Have a human review, delete, modify, etc., till the branch builds and passes all tests (ready for a PR). Squash changes (not to main, but to this first commit). This way it's completely transparent how humans revised the AI gen components.

This means that each branch would have 2 commits. The AI gen commits will most likely not pass tests.
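The two-commit workflow above can be sketched in a throwaway repo (every path, branch name, and message here is an illustrative placeholder, not the actual automation):

```shell
# Sketch of the two-commit branch workflow in a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "possee@example.com"   # placeholder identity
git config user.name "POSSEE Demo"
echo "def add(a, b): ..." > numeric.py
git add numeric.py && git commit -qm "initial"

# Commit 1: raw AI-generated examples (most likely won't pass tests yet).
git checkout -qb ai-gen-examples
echo "# >>> add(1, 2)  # AI-generated example" >> numeric.py
git add numeric.py && git commit -qm "DOC: AI-Gen examples for numeric"

# Commit 2: squashed human revisions, ready for a PR.
echo "# >>> add(1, 2)  # reviewed and corrected" >> numeric.py
git add numeric.py && git commit -qm "DOC: human review of AI-Gen examples"

git log --oneline
```

Keeping the raw AI commit separate from the squashed human-review commit is what makes the revision history transparent.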

bmwoodruff commented 1 month ago

I think it's better not to push to a branch immediately; instead, leave the changes uncommitted so it's simple to see what's changed. I'll close this as done.