jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Raise an error on a reference to a nonexistent object (deleted/renamed in the code) #5270

Open mirekphd opened 4 years ago

mirekphd commented 4 years ago

After a name change, an object such as a function no longer exists under its old name in the code, yet the environment preserves it under that old name, so code that still uses the old name continues to work.

It's a bug: after an object is renamed, only the new name should be preserved. That would prevent a very common error, referring in the code to objects that no longer exist, which currently cannot be spotted without restarting the Python kernel and re-running the entire notebook (and that can sometimes take very long).

If the contents of the object (such as a function) change while the name stays the same, the situation is harder to spot, but here we are talking only about pure name changes, which can be caught early, making notebook programming more robust.
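A minimal sketch of the behavior described above, assuming a plain dict can stand in for the kernel's global namespace (IPython actually keeps its own user namespace) and using the hypothetical names `load` and `load_data`. Each `exec` call plays the role of running one cell:

```python
# A plain dict stands in for the kernel's global namespace (illustrative assumption).
namespace = {}

# Cell 1, first version: defines a function named `load`.
exec("def load():\n    return 'data'", namespace)

# Cell 1 is edited to rename the function to `load_data` and run again.
# Nothing removes the old binding, so `load` is still defined.
exec("def load_data():\n    return 'data'", namespace)

# Cell 2 still calls the old name and silently keeps working.
exec("result = load()", namespace)
print(namespace["result"])  # data
print(sorted(name for name in namespace if not name.startswith("__")))
# ['load', 'load_data', 'result']

# Replaying only the current cells in a fresh namespace (roughly what a kernel
# restart followed by "run all" amounts to) exposes the stale reference.
fresh = {}
exec("def load_data():\n    return 'data'", fresh)
try:
    exec("result = load()", fresh)
    stale_reference_caught = False
except NameError as exc:
    stale_reference_caught = True
    print("NameError:", exc)
```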

kevin-bates commented 4 years ago

Hi @mirekphd,

I think this is the general nature of REPLs. If I go into the Python REPL, create a function, execute it, and then change the function's name, the REPL sees that as a new function, not a renamed one. I haven't looked, but I don't think there's a way to say, "I'm really modifying this function here." Active notebook sessions should be viewed the same way: the kernel is essentially a REPL for its programming language.

One thing I see is that this appears to be an issue of scope. If I move my function into a class, the class becomes the outer scope. If I then rename the method within the class and re-run the cell, attempts to call the method by its original name fail because the old method is no longer found. However, if I change the class name, both the old and new class definitions are still "in scope". Perhaps you can introduce scope, with the idea that outer-scoped objects are subject to this behavior, while their methods adhere to the expected behavior of traditional program execution?
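The scope difference can be sketched the same way, again with a dict standing in for the kernel namespace and with hypothetical names `Pipeline`, `NewPipeline`, `step`, and `run_step`. Re-running a class cell rebinds the whole class, so a renamed method disappears, while a renamed class leaves the old class bound to its old name:

```python
# A plain dict stands in for the kernel's global namespace (illustrative assumption).
ns = {}

# First version of the cell: class with a method named `step`.
exec(
    "class Pipeline:\n"
    "    def step(self):\n"
    "        return 'old'\n",
    ns,
)

# The method is renamed inside the class and the cell is re-run:
# the name `Pipeline` now points to a brand-new class object.
exec(
    "class Pipeline:\n"
    "    def run_step(self):\n"
    "        return 'new'\n",
    ns,
)
has_old_method = hasattr(ns["Pipeline"](), "step")
print(has_old_method)  # False: the old method is gone

# The class itself is renamed and the cell is re-run:
# the old class object is still bound to `Pipeline`.
exec(
    "class NewPipeline:\n"
    "    def run_step(self):\n"
    "        return 'new'\n",
    ns,
)
both_classes = "Pipeline" in ns and "NewPipeline" in ns
print(both_classes)  # True: both definitions remain "in scope"
```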

Related to: #5263

mirekphd commented 4 years ago

Hi @kevin-bates,

I understand the root of the issue: it's a necessary price to pay for interactivity, for not having to import everything from the start, including large data sets, at every tiny change in the code. We did once try to introduce an "alternative" to Jupyter's REPL (too much advertising, I suppose): VS Code, in the semi-containerized form of cdr/code-server, half-baked and lacking features such as upload/download, and boy, wasn't it a pain! While there is an extension for VS Code that lets you run selected fragments of Python code, the IDE still lacks the concept of variables persisting in an environment. So if your multi-gigabyte dataset takes some 10 minutes to load from the database at an early stage of your modeling pipeline, you have to incur this penalty at every tiny change in your code... It quickly turns out to be impractical; the REPL wins hands down.

I think it would be hard to refine how the REPL works without changing variable scoping in Python itself. For instance, one cannot prevent global scoping of variables defined outside a function (anywhere in the notebook, even if the variable gets immediately deleted). What sometimes happens is that you rename a variable inside a function, but not quite completely: you leave one orphaned occurrence of the old name. A silent failure happens if you earlier defined this variable in your global environment (during quick tests before writing the proper function): it now sits orphaned inside the function as the only occurrence and keeps being modified, yielding unexpected results (or, more likely, not yielding the expected ones). Am I right in thinking that it is a language feature that makes it impossible to restrict Python functions' access to locally defined variables, preventing access to global ones (which may no longer exist, having been renamed)?
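A minimal sketch of the orphaned-name failure, with hypothetical names `threshold`, `cutoff`, and `count_above`. A name that is not assigned inside the function is looked up as a global when the function runs, so the stale global is picked up without any error. The standard library's `inspect.getclosurevars` can help spot such references by reporting which globals the function's code refers to:

```python
import inspect

# Defined at the top level during quick tests, before the function existed.
threshold = 0.5

def count_above(values, cutoff):
    """Count values above `cutoff`; the parameter was renamed from `threshold`."""
    count = 0
    for v in values:
        if v > threshold:  # orphaned old name: should be `cutoff`
            count += 1
    return count

# Silently uses the global 0.5 instead of the 0.8 passed in.
print(count_above([0.2, 0.6, 0.9], cutoff=0.8))  # 2 (the intended answer is 1)

# Report the global names the function's code refers to.
print(inspect.getclosurevars(count_above).globals)  # {'threshold': 0.5}
```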

kevin-bates commented 4 years ago

Am I right in thinking that it is a language feature that makes it impossible to restrict Python functions' access to locally defined variables, preventing access to global ones (which may no longer exist, having been renamed)?

That's my hunch, but I would prefer to defer (do you like that? :smile:) to someone like @Carreau (or most anyone else) who's been working with Python much longer than myself.

Carreau commented 4 years ago

Well, I think it's not only a language issue; there is also the fact that what humans are doing is sometimes ambiguous.

If you refactor a function enough and change its name, it may be clear to you what you are trying to do, but not to the computer (or maybe not even to another human).

We could try to treat the document as a whole, track which objects are defined in which cells, and, when a cell changes, assume that the changes apply to all the objects that were defined in that cell, but it is a really narrow way of viewing things. For example, if you split a cell in two, did you "delete" the objects?
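A rough sketch of the per-cell heuristic described above; `CellTracker` and `top_level_names` are hypothetical helpers, not part of Jupyter or IPython. It records the top-level names each cell binds (only functions, classes, and simple assignments are recognized) and, when a cell with the same id is re-run, drops the names its previous version bound but the new version no longer does:

```python
import ast

def top_level_names(source):
    """Names bound at the top level of a cell: defs, classes, simple assignments."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names

class CellTracker:
    """Hypothetical tracker that treats re-running a cell as replacing it."""

    def __init__(self):
        self.namespace = {}
        self.defined_by_cell = {}  # cell id -> names its last version bound

    def run(self, cell_id, source):
        new_names = top_level_names(source)
        stale = self.defined_by_cell.get(cell_id, set()) - new_names
        for name in stale:
            # Caveat: this also removes the name if another cell rebound it.
            self.namespace.pop(name, None)
        exec(source, self.namespace)
        self.defined_by_cell[cell_id] = new_names
        return stale

tracker = CellTracker()
tracker.run("cell-1", "def load():\n    return 'data'")
removed = tracker.run("cell-1", "def load_data():\n    return 'data'")
print(removed)  # {'load'}
try:
    tracker.run("cell-2", "result = load()")
    stale_reference_caught = False
except NameError:
    stale_reference_caught = True
```

The split-cell ambiguity shows up directly in this design: if part of "cell-1" moves into a new cell id, re-running the shortened "cell-1" after the new cell removes names that now live in the new cell, even though nothing was deleted from the notebook.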

With the way Jupyter currently works, the kernel really does not know that you have cells. Editing a cell is really just executing another input. So from the kernel's POV there is never any deletion or edit.

We could have some heuristics, like what %autoreload does, but as you both point out, trying to automatically infer the computation graph and rerun all the cells can be at odds with some workflows. If linters make any progress on that front, that would be great, but making this an error will probably never be part of the core of Jupyter.
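As a sketch of the kind of static check a linter could run, assuming the notebook's current cell sources are available as a list of strings, the hypothetical `undefined_names` function below uses the standard `ast` module to report names that some cell reads but no current cell binds. It ignores scoping and execution order entirely, so it is much cruder than a real linter, but it does flag a reference left behind by a rename:

```python
import ast
import builtins

def undefined_names(cells):
    """Names loaded in some cell but bound by no cell (scope-insensitive)."""
    bound, loaded = set(), set()
    for source in cells:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)
                elif isinstance(node.ctx, ast.Store):
                    bound.add(node.id)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                bound.add(node.name)
            elif isinstance(node, ast.arg):
                # Function parameters, treated as bound everywhere (no scoping).
                bound.add(node.arg)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    # `import os.path` binds `os`; `import x as y` binds `y`.
                    bound.add(alias.asname or alias.name.split(".")[0])
    return sorted(loaded - bound - set(dir(builtins)))

# Current notebook contents: the function was renamed, but cell 2 was not updated.
cells = [
    "import math\n\ndef load_data():\n    return [1.0, 4.0]",
    "values = load()\nroots = [math.sqrt(v) for v in values]",
]
print(undefined_names(cells))  # ['load']
```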