possee-org / genai-numpy

Task: Locate missing docstrings #8

Closed bmwoodruff closed 1 month ago

bmwoodruff commented 2 months ago

We'll need something that helps us determine which functions in NumPy are missing docstrings, examples, etc. Each file in NumPy often contains many functions, so such a tool could start as follows.

  1. Input a file.
  2. Search the file and extract a dataframe that contains (1) the list of functions in that file, (2) whether each function has a docstring, (3) how many examples are in the docstring, and (4) whatever else might be useful.

At some point, we would then run this across the entire NumPy code base. In such a run, we would want to add the path and name of each file. This will help us identify places to start using AI to generate docstrings.
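The two steps above could be sketched roughly as follows, using only the standard library. This is a sketch under assumptions, not the notebook that was eventually written: the function name `summarize_functions`, the record keys, and the choice to count `>>>` prompts as a proxy for "examples" are all made up for illustration.

```python
import ast


def summarize_functions(source, path="<unknown>"):
    """Return one record (dict) per function or method defined in `source`.

    `source` is the text of a Python file; `path` is only stored in the
    records so results from many files can be combined later.
    """
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node) or ""
            # Assumption: each doctest prompt (">>>") counts as one example line.
            n_prompts = sum(
                1 for line in doc.splitlines() if line.strip().startswith(">>>")
            )
            records.append({
                "path": path,
                "name": node.name,
                "has_docstring": bool(doc),
                "docstring_lines": len(doc.splitlines()),
                "n_example_prompts": n_prompts,
                # end_lineno is available on Python 3.8 and later.
                "code_lines": node.end_lineno - node.lineno + 1,
            })
    return records
```

The returned list of dicts can be passed directly to `pandas.DataFrame(...)` if a dataframe is wanted; keeping the core free of pandas makes it easier to run anywhere.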

If this sounds like something you would like to do, please add a comment below.

bmwoodruff commented 1 month ago

I started on this and have created a notebook that helps identify functions/methods with missing docstrings. See

It's not perfect, but it will generate a .csv file that lists the functions/methods and how long the docstrings are, as well as examples, length of code, and relative path. I've done some initial filtering (excluding functions whose names begin with _ or test or that reside in a test directory, excluding functions that take up fewer than 10 lines of code, etc.), which will probably need to be changed. The notebook currently searches only .py files and will probably need to include .pyx files and more.
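For readers who want a feel for that kind of filtering, here is a minimal sketch of a directory scan that writes a .csv. It is not the notebook's code: `scan_tree`, `write_csv`, the column names, the excluded name prefixes, and the directory names treated as test directories are all assumptions chosen for illustration. Only the 10-line threshold and the restriction to .py files come from the description above.

```python
import ast
import csv
from pathlib import Path

FIELDS = ["path", "name", "has_docstring", "docstring_lines",
          "n_example_prompts", "code_lines"]


def scan_tree(root, min_lines=10, skip_prefixes=("_", "test")):
    """Collect one row per function in the .py files under `root`, after filtering."""
    root = Path(root)
    rows = []
    for py_file in sorted(root.rglob("*.py")):
        rel = py_file.relative_to(root)
        # Assumption: a "test directory" is any folder named "test" or "tests".
        if any(part in ("test", "tests") for part in rel.parts[:-1]):
            continue
        try:
            tree = ast.parse(py_file.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed as Python
        for node in ast.walk(tree):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            if node.name.startswith(skip_prefixes):
                continue
            code_lines = node.end_lineno - node.lineno + 1
            if code_lines < min_lines:
                continue
            doc = ast.get_docstring(node) or ""
            rows.append({
                "path": rel.as_posix(),
                "name": node.name,
                "has_docstring": bool(doc),
                "docstring_lines": len(doc.splitlines()),
                "n_example_prompts": sum(
                    1 for line in doc.splitlines()
                    if line.strip().startswith(">>>")
                ),
                "code_lines": code_lines,
            })
    return rows


def write_csv(rows, out_path):
    """Write the collected rows to `out_path` with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Because this relies on `ast`, it only handles plain Python sources; .pyx files would need a different parser or a text-based fallback.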

One of the goals is to generate docstrings for functions that are missing them, and I'm sure that more filtering could be done to help us refine our search.

bmwoodruff commented 1 month ago

Here's a link to the GPT-4 conversation used to create the notebook.

bmwoodruff commented 1 month ago

I'm going to close this task. The relevant links and discussion are in Zulip and on NumPy.