microsoft / gather

Spit shine for Jupyter notebooks 🧽✨
https://microsoft.github.io/gather
MIT License
532 stars 38 forks

Make gathering conservative by default #11

Closed andrewhead closed 5 years ago

andrewhead commented 5 years ago

Is your feature request related to a problem? Please describe.

Currently, the slicer assumes that methods don't modify their arguments. This assumption is often correct, but not always. When a method does modify its arguments, the gathered notebook will be missing that method call, and hence code needed to reproduce a result.
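A minimal Python illustration of the failure mode. The helper `add_total` and the variable names are hypothetical, made up for this example, and are not taken from the project:

```python
def add_total(rows):
    """Append the sum of `rows` to `rows` itself (mutates the argument)."""
    rows.append(sum(rows))


data = [1, 2, 3]
add_total(data)    # modifies `data` in place and returns nothing
result = data[-1]  # depends on the mutation made by add_total
```

A slicer that assumes `add_total` leaves `data` untouched would gather only the first and last statements for `result`, so the gathered notebook would compute a different value than the original one.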

Describe the solution you'd like

Basically, more accurate slicing that errs on the side of gathering code that might not be needed rather than leaving it out.

For the exact implementation, I suggest modifying the slicer to assume that:

  1. Methods change their arguments, unless otherwise noted
  2. Methods change the objects they're called on, unless otherwise noted

And providing an easy way for users to specify when methods don't modify their arguments. For example, they could provide a lightweight configuration file that looks like:

[
  {
    "obj-name": "m",
    "function-name": "fit",
    "does-not-modify": ["OBJECT"]
  }, {
    "function-name": "clean_data",
    "does-not-modify": [0, "auxiliary_data"]
  }
]

That is, a user could specify function calls that don't modify their arguments by the function-name, optionally the obj-name (the name of the object the function is called on), and a list of what the function does not modify. Each entry in that list is either the object the function was called on ("OBJECT"), a positional argument (e.g., 0 for the first argument), or a keyword argument (e.g., an argument named auxiliary_data).
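As a rough sketch of how a slicer might consult such rules, here is one possible lookup in Python. The function `may_modify` and its parameters are assumptions made for illustration, not part of the tool. The key names come from the example configuration above:

```python
import json

# The example rules from above, parsed as JSON.
RULES = json.loads("""
[
  {"obj-name": "m", "function-name": "fit", "does-not-modify": ["OBJECT"]},
  {"function-name": "clean_data", "does-not-modify": [0, "auxiliary_data"]}
]
""")


def may_modify(rules, function_name, target, obj_name=None):
    """Return True unless a rule says the call leaves `target` unchanged.

    `target` is "OBJECT" (the object the function is called on),
    an int (positional argument index), or a str (keyword argument name).
    """
    for rule in rules:
        if rule["function-name"] != function_name:
            continue
        # A rule without "obj-name" applies to calls on any object.
        if "obj-name" in rule and rule["obj-name"] != obj_name:
            continue
        if target in rule["does-not-modify"]:
            return False
    # Conservative default: assume the call modifies the target.
    return True
```

With these rules, `may_modify(RULES, "clean_data", 0)` is `False`, while `may_modify(RULES, "clean_data", 1)` and any call to an unlisted function stay `True`. One ambiguity this sketch does not resolve: a keyword argument literally named `OBJECT` would be indistinguishable from the object marker.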

The user could specify these rules for which methods don't modify their arguments in a JupyterLab settings editor. This could be populated with some defaults (e.g., some common Pandas data frame methods like df.head() and df.describe()).

Describe alternatives you've considered

The slicer could be improved to infer when functions modify their arguments. This would take some engineering effort that's not currently available.

The current implementation of the tool assumes that methods don't modify their arguments. I worry that this might make the tool unusable, as by default a lot of relevant code might be missing from slices.

andrewhead commented 5 years ago

The initial implementation of slicing is finished as of 1f6d626e132f779629a6c0d16037665659327ffc.