possee-org / genai-numpy


Task: Tracking time for human review of entire `ma.linalg` section. #106

Closed: bmwoodruff closed this 6 days ago

bmwoodruff commented 1 week ago

Description:

I want to know how long it takes for a human to review a module with 30 functions. We need some data point(s) on how much human time it takes to review AI-generated examples.

  1. Create a new numpy branch.
  2. Run example-post-processing.py on the np.linalg module.
  3. Use VS Code's source control view to quickly review the newly injected code.
  4. Adapt examples that have errors so they run cleanly. Delete an example if no quick revision fixes the problem. Any Jupyter notebook running a development version of NumPy helps with this. Using tools/example-checker.ipynb is not required, but may speed things up; I'll avoid it for now.
  5. For each function, make a judgement call on whether the examples are useful additions or should be deleted. Delete those that clearly don't contribute anything new. For example, running code on a 2 by 2 matrix and then a 3 by 3 matrix adds nothing, whereas running code on a 2 by 2 matrix and then a stack of 3 by 3 matrices (several at once) does seem useful (see the sketch after this list). Changing parameters can also be a useful addition, depending on the function. You have to make your own judgement call, and I don't think all OSS maintainers will make the same call either.
  6. Run all tests, including `spin lint`, which will flag lines that are too long. Adjust those lines appropriately.
  7. Use git to add, commit, and push the files up to a branch on my fork.
  8. Include a link to that fork here.
  9. Record how long the revision took from start to finish.
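
To make step 5 concrete, here is the kind of distinction I have in mind, as an illustrative sketch rather than one of the generated examples: a second example that only changes the matrix size adds little, but one that shows a stack of matrices demonstrates genuinely new behavior.

```python
import numpy as np

# A single 2x2 matrix: the basic case.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.det(a))      # -2.0 (up to floating-point rounding)

# A stack of two 3x3 matrices: the last two axes are treated as matrices,
# so one call returns one determinant per matrix in the stack. This is the
# kind of second example that actually adds information for a reader.
stack = np.array([[[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]],
                  [[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0]]])
print(np.linalg.det(stack))  # [6. 8.]
```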

Acceptance Criteria:

bmwoodruff commented 6 days ago

I don't care to post a link to the branch. It took about 30 minutes (while distracted), and then I had to run the tests. I missed a few # may vary tags, which took a bit of extra time to clean up at the end (an extra spin build...), and I never checked the docs visually; I'm guessing that would add another half an hour. So for 30 functions, it comes out to roughly 2-3 minutes per function to review. Several functions didn't end up getting any examples added.

With 829 functions, this could end up being around 2000 minutes (or 30 hours, give or take) of human review. With 2 interns working on this, that is about 1 week of work, assuming they can go at the same speed; I'd guess perhaps 2 weeks of human review in practice, followed by a quicker pass from the tech lead (hopefully 1/4 of the time) and the maintainer (another 1/2 of that, maybe 4 hours). This could add 1500 examples to the codebase.
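
A quick back-of-the-envelope version of that estimate (the per-function time and the per-role factors are my guesses from above, not measurements):

```python
# Rough estimate of total human review time, using the numbers above.
functions = 829
minutes_per_function = 2.5                          # midpoint of the 2-3 minute observation

intern_minutes = functions * minutes_per_function   # ~2070 minutes
intern_hours = intern_minutes / 60                  # ~35 hours

tech_lead_hours = intern_hours / 4                  # quicker second pass, ~9 hours
maintainer_hours = tech_lead_hours / 2              # final glance, ~4 hours

print(f"Intern review:    {intern_hours:.1f} h")
print(f"Tech lead review: {tech_lead_hours:.1f} h")
print(f"Maintainer:       {maintainer_hours:.1f} h")
```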

Building NumPy, installing it, and spinning the docs add an extra chunk of time for each batch (though that can happen in the background).

Right now the bigger issue I need to deal with is the way docstrings are handled for aliases and overwritten using doc_note and various other classes throughout the codebase. This makes it problematic to deal with docstrings algorithmically.
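
A minimal sketch of the pattern that causes trouble, using hypothetical names rather than NumPy's actual code: when a docstring is assembled at import time (for instance by a helper in the spirit of ma's doc_note, or by an alias inheriting another function's __doc__), the text a user sees never appears verbatim in any source file, so a script that injects examples by editing source text has nothing to anchor on.

```python
# Hypothetical illustration, not NumPy's actual code.

def doc_note(initialdoc, note):
    # Stand-in for a helper that rewrites a docstring at import time.
    return f"{initialdoc}\n\nNotes\n-----\n{note}"

def original(a):
    """Docstring written literally in the source file."""
    return a

# The docstring the user actually sees is assembled here, at import time,
# so a tool that edits the literal docstring in the source either misses
# this text or has its injected example overwritten.
original.__doc__ = doc_note(original.__doc__, "Extra note added dynamically.")

# An alias has no docstring of its own in the source; it simply inherits
# whatever original.__doc__ ends up being.
alias = original
```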