As I'm running the automation tools on various modules in the code base, I'm finding issues that need to be addressed. Some of those issues possibly go all the way back to before we started using Llama3. We may need to revise the example generator function. We may just need to focus on docstring handling in NumPy itself.
Some callable items are defined simply as `new_item = other_item`, and that's it. For these, all docstring information is inherited from `other_item`. See `np.ma.inner`.
Some callable items have their docstring set with `new_item.__doc__ = other_item.__doc__`. When this happens, hard-coded escape characters are lost: `\"` becomes `"` and `\\` becomes `\`. There is no way to recover the first perfectly, because not every `"` was escaped in the source. Regardless, maybe these items should just be skipped in example generation entirely (or linked to their main function in some way). See `np.ma.apply_along_axis`.
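To illustrate the escape loss described above, here is a minimal sketch using a made-up function (`original` and `naive_reescape` are hypothetical names, not NumPy code). It runs a small piece of source text and compares the runtime `__doc__` with what was written in the file. The same loss applies to any docstring read at runtime, which is why copying `__doc__` carries it along.

```python
# Source text of a tiny module, as it would appear in a .py file.
# The docstring mixes an unescaped quote pair with an escaped one.
SOURCE = r'''
def original():
    """Say "hi" or \"bye\" with a \\ path."""
'''

# Execute the source so Python evaluates the docstring literal.
namespace = {"__name__": "escape_sketch"}
exec(SOURCE, namespace)
runtime_doc = namespace["original"].__doc__


def naive_reescape(doc):
    """Try to rebuild the source text by escaping every backslash and quote."""
    return doc.replace("\\", "\\\\").replace('"', '\\"')


# The runtime docstring no longer shows which quotes were escaped.
print(runtime_doc)
# Re-escaping everything also escapes the quotes around "hi",
# so the result does not match the original source text.
print(naive_reescape(runtime_doc))
```

Because both kinds of quote look identical at runtime, any re-escaping rule has to guess, which supports skipping or linking these items rather than regenerating their text.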
Some callable items in `ma` use the custom `doc_note` function, which is essentially `new_item.__doc__ = other_item.__doc__`, except that it adds something to the "Notes" section of the other docstring for this function. See `np.ma.innerproduct`.
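The three patterns above could be told apart at runtime with something like the sketch below. Everything in it is hypothetical: `base`, `copied`, `noted`, and `classify` are toy names, and `add_note` is only a stand-in for `doc_note` (its real signature and output format are not assumed here). The "extended docstring" check is a heuristic and would need loosening if the real helper inserts the note in the middle of the docstring.

```python
def base(a, b):
    """Return the inner product of a and b."""
    return sum(x * y for x, y in zip(a, b))


# Pattern 1: plain rebinding; both names point at the same object.
plain_alias = base


# Pattern 2: a separate function whose docstring is copied at runtime.
def copied(a, b):
    return base(a, b)


copied.__doc__ = base.__doc__


# Pattern 3: docstring copied and extended with a note.
def add_note(doc, note):
    """Hypothetical stand-in for a doc_note-style helper."""
    return f"{doc}\n\nNotes\n-----\n{note}"


def noted(a, b):
    return base(a, b)


noted.__doc__ = add_note(base.__doc__, "Example note added for this variant.")


def standalone(a, b):
    """Return the elementwise products of a and b."""
    return [x * y for x, y in zip(a, b)]


def classify(obj, original):
    """Guess how obj's docstring relates to original's docstring."""
    if obj is original:
        return "same object"
    doc, ref = obj.__doc__, original.__doc__
    if doc is None or ref is None:
        return "unrelated"
    if doc == ref:
        return "copied docstring"
    if doc.startswith(ref):
        return "extended docstring"
    return "unrelated"


for candidate in (plain_alias, copied, noted, standalone):
    print(candidate.__name__, "->", classify(candidate, base))
```

Note that `plain_alias.__name__` is still `base`, which is one more sign that the two names refer to a single object.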
I'm guessing there are more exceptions. I want to compile a list of these types of exceptions and create an algorithmic way to organize them all. One way to quickly find the exceptions is to compare the output of `np.mod.func.__doc__` (which returns the formatted docstring, obeying all the new assignments) with that of `inspect.getsource()` (see the last two prompts to GPT-4o).
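A rough sketch of that comparison, under some assumptions: the module source is available as text (here a made-up `MODULE_SOURCE` string; for a real module it could come from `inspect.getsource()`), and only top-level `def` statements are checked. The helper names are hypothetical. It reports functions whose literal docstring in the source differs from the docstring seen at runtime.

```python
import ast
import inspect

# Made-up module source standing in for a real file.
MODULE_SOURCE = '''
def base(a):
    """Return a unchanged."""
    return a

def copied(a):
    return a
copied.__doc__ = base.__doc__

def documented(a):
    """Return a, documented in place."""
    return a
'''


def literal_docstrings(source):
    """Map each top-level function name to the docstring written in its def."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_docstring(node)  # None when no literal docstring
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def runtime_docstrings(source):
    """Map each function name to its cleaned __doc__ after running the source."""
    namespace = {"__name__": "docstring_sketch"}
    exec(source, namespace)
    return {
        name: inspect.cleandoc(obj.__doc__) if obj.__doc__ else None
        for name, obj in namespace.items()
        if inspect.isfunction(obj)
    }


def reassigned(source):
    """List functions whose runtime docstring differs from the literal one."""
    literal = literal_docstrings(source)
    runtime = runtime_docstrings(source)
    return sorted(name for name in literal if literal[name] != runtime.get(name))


print(reassigned(MODULE_SOURCE))
```

Both sides are normalized with the same cleaning (`ast.get_docstring` cleans by default), so indentation differences alone should not trigger a report.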
I'm figuring at this point, these functions should be skipped entirely in example generation. I also want to discuss this in the documentation meeting.
I'm wondering if some of the newer functions should be documented using some of these options, such as the `doc_note` option for new `linalg` functions where the only addition was "This is an Array API compatible version of ...". Maybe we even have the docstring move to `linalg` and put the `doc_note` on the base function, update the examples to all use `np.linalg....`, and then encourage that change for all moving forward. This will need to be discussed at the documentation meeting.
I know other functions have "This is an alias of ..." in their docstring, and then not much else. Perhaps these functions need a `doc_note` or simply a replaced docstring, with the current docstring deleted (or a suitable replacement added).
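One possible way to separate "says alias in the docstring" from "is actually the same object" is sketched below on a toy namespace. All names here (`total`, `sum_all`, `tally`, and the two helpers) are hypothetical; for a real module the namespace argument would be the module itself.

```python
import re
from types import SimpleNamespace


def total(values):
    """Return the sum of the values."""
    return sum(values)


def tally(values):
    """This is an alias of total."""
    # A wrapper: documented as an alias, but a distinct object.
    return total(values)


# sum_all is a true alias: the same object bound to a second name.
toy = SimpleNamespace(total=total, sum_all=total, tally=tally)


def alias_mentions(namespace):
    """Names of public callables whose docstring contains the word 'alias'."""
    return sorted(
        name
        for name, obj in vars(namespace).items()
        if callable(obj)
        and not name.startswith("_")
        and re.search(r"\balias\b", obj.__doc__ or "", re.IGNORECASE)
    )


def true_aliases(namespace):
    """Groups of public names that are bound to the very same object."""
    by_id = {}
    for name, obj in vars(namespace).items():
        if callable(obj) and not name.startswith("_"):
            by_id.setdefault(id(obj), []).append(name)
    return [sorted(names) for names in by_id.values() if len(names) > 1]


print(alias_mentions(toy))
print(true_aliases(toy))
```

In this toy case the two lists do not overlap: `tally` claims to be an alias but is a wrapper, while `sum_all` is a real alias that never says so.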
Adding new content is not always the best approach. Sometimes condensing and reorganizing is what is needed.
Acceptance Criteria:
[ ] Compile a comprehensive list of docstring assignment methods where docstrings are overwritten.
[ ] For each assignment method, list all the functions in the codebase that use that assignment method.
[ ] Identify all functions that have the word "alias" in their docstring, and then identify which are actual aliases of another function.
[ ] Discuss findings with documentation team and make a plan for future work.