microsoft / gather

Spit shine for Jupyter notebooks 🧽✨
https://microsoft.github.io/gather
MIT License
532 stars 38 forks source link

Idea: support gathering functions at point of definition #16

Closed micahjsmith closed 5 years ago

micahjsmith commented 5 years ago

Can functions be gathered from the point they are defined?

In the following example, f cannot be gathered by clicking on the function name or on the cell in which it is defined.

a = 1
# new cell
def f():
    return a+1

However, if I now assign f to a variable, that variable can gathered, which collects f and a as well as expected.

a = 1
# new cell
def f():
    return a+1
# new cell
b = f

Desired behavior is that after a function definition is evaluated, the function name (i.e., f) is highlighted indicating it is available for gathering.

andrewhead commented 5 years ago

Hi @micahjsmith! You're right that the only way to gather a function right now is to assign its return value to a variable or to display the output of the call, and gather from the variable or output.

I'm open to the idea of gathering on a function, with a bit more context. Can you propose an interaction flow that would work in your case? Are you thinking like, right click on a function? Or have the function defs highlighted like the variables? And what do you want to happen with the gathered code---just what's already happening?

A few examples of functions you want to gather in existing notebooks and the code you'd want gathering could also be helpful to us making sure the slicer would handle 'em appropriately.

Whether we integrated this would also depend on if we thought functions were a generally useful thing to gather for other users. Do you have a sense of that from your own work?

andrewhead commented 5 years ago

@micahjsmith and I discussed this topic further over remote chat. I think what we decided was that this functionality for this could be put into another plugin.

Meanwhile, it could be useful to make nbgather extensible to help others build plugins like that. What if we could add an interface that lets others register new gather commands? When registering a new gather command, the caller could provide a command title, an icon, and a callback that will get triggered and provided with the gathered cells. I'm opening this suggestion in a separate issue.

micahjsmith commented 5 years ago

I've now dived a bit into the parser/slicer and hope I can clarify what I meant with this issue originally.

It seems that after a cell is executed, the controller will cause "defs" in the cell to be highlighted and those editor defs are added to the model. These defs are now selectable/available to be gathered. I see at https://github.com/microsoft/gather/blob/master/src/overlay/gather-markers.ts#L351 that defs in the parsed cell program are filtered to see if they are SymbolType.VARIABLE. So a statement like a = lambda x: 1 is a "def" of symbol a which is a variable.

In contrast, in a statement like def a(): return 1, symbol a is marked as SymbolType.FUNCTION and is not added to the model as an available EditorDef.

What I am proposing is also marking defs of type SymbolType.FUNCTION and SymbolType.CLASS as EditorDefs available to be gathered.

The intuition is that if I can define a function or class, and then assign that object to a variable, which can then be gathered, then the original object should also be available to be gathered.

Can you propose an interaction flow that would work in your case?

Yes, treat function and class symbols identically to variable symbols. The identifier would be highlighted and would be available to be gathered.

A few examples of functions you want to gather in existing notebooks

One example is what I posted in the original post on this thread where function def f should be gatherable even if it is not assigned to a new variable.

Whether we integrated this would also depend on if we thought functions were a generally useful thing to gather for other users. Do you have a sense of that from your own work?

For sure, suppose I am working on a data analysis project in my notebook that produces several figures and tables as output. Now I want to distribute my analysis in a reproducible manner. I would go about this by creating a new function main:

def main():
    make_figure_1()
    make_figure_2()
    make_table_1()

I now would gather this function (by selecting the identifier at point of definition) to a new script, add a cursory if __name__ == '__main__': main(), and thanks to the magic of gather, I can share this script which can reproduce my results when run at the shell.

So ultimately unrelated to the issue of different backends :) but I know you have already closed this issue -- not a problem. I would hope that this change is actually not very involved, so perhaps if I am able to get a patch working for it I would ask if you wanted to reopen the issue and review a PR.

andrewhead commented 5 years ago

Hey @micahjsmith thanks so much for this detailed write-up! I appreciate the use case and agree that the use case of creating a cursory main and gathering it up sounds pretty slick.

I would hope that this change is actually not very involved, so perhaps if I am able to get a patch working for it I would ask if you wanted to reopen the issue and review a PR.

Sounds good to me, have at it! I think it should be a pretty straightforward fix, though let me know if you could use any pointers or a chat.

By the way, did you know that you can shift+click multiple results and in that way gather and merge together the code that produced multiple figures? Would that handle your use case above, or are you thinking that the preferred way to express that in your case is to define a parent function of the important results and gather that function?