microsoft / gather

Spit shine for Jupyter notebooks 🧽✨
https://microsoft.github.io/gather
MIT License
532 stars 38 forks source link

use python-program-analysis #33

Closed rdeline closed 5 years ago

rdeline commented 5 years ago

This PR switches us over to the external program analysis library.

andrewhead commented 5 years ago

Heya @rdeline ^_^

This looks pretty good. I figure you may have done this already, though if you haven't, can you re-run the tests and try out gather in lab to make sure everything's still working?

My one stylistic question is about the rename of ICell to JupyterCell. I never figured out a good way to make it clear in the code which data structures belonged to Jupyter's code, and which ones to our own, and I feel like this may make it a bit more confusing. I wonder if there's another term we could use that suggests that the cell's "belong" to the python-program-analysis package. Like PythonCell, PPACell, or something in that vein. Anything come to mind?

rdeline commented 5 years ago

@andrewhead Your wish is my command. :)

I renamed JupyterCell to just plain Cell. (I didn't go back to the "I" prefix, because it would be the only exported type with that prefix, which felt inconsistent.) When we have more time (or a vibrant community!) we can revisit why there are so many *Cell types in python-program-analysis.

Yes, it passes the tests, and I gave it a quick 2 minute check in Lab, so the basics work. I thought maybe you'd have a standard set of interactions you use for checking.

andrewhead commented 5 years ago

I thought maybe you'd have a standard set of interactions you use for checking.

Sure! I typically select an output and then:

Did you cover those basics? If so, I'm ready to merge. If not, the "binder/Try out nbgather.ipynb" notebook is pre-loaded with a simple messy notebook with multiple revisions of the scatter plot for testing out gathering revisions, which you can either try out or I can :-)

rdeline commented 5 years ago

BTW, I just moved the class LabCell back into the Gather project. (Cell and AbstractCell remained in python-program-analysis.) LabCell dragged in a ton of Jupyter-specific modules, which failed to load properly in a VS Code extension (not surprisingly) and which only really make sense in a Jupyter context.

andrewhead commented 5 years ago

Also @rdeline, let me know if you'd like to do a call / remote pair debugging session to push this PR through. I realize it might be time sensitive.

rdeline commented 5 years ago

@andrewhead No repro. :( image Do you have a clean build? Could you create a failing unit test (in the python-program-analysis project, if appropriate)?

andrewhead commented 5 years ago

Good thinking. I cleaned up the code but was still seeing this issue. I did find something that at least partially breaks slicing in the port of the program analysis code. I added a test case in my branch of python-program-analysis here.

I think the issue happens right here. One issue is that some nodes in the AST don't have the location property defined for them (like nodes for array slices). This causes an error when we try to set node.location.path. We mark that as an error in analyzing the cell, and omit that cell from the output of programBuilder.buildTo, which results in slicing on an incomplete execution log.

rdeline commented 5 years ago

I've updated the parser (in python-program-analysis) to consistently produce a location for every syntax node. (That is if a node has a 'type' now it has a 'location'. There were a bunch of places before where this wasn't the case, including array slicing.) I've updated my PR to use the new parser.

Meanwhile, I don't understand your new test. You build to an executionId ("id") that none of the cells have. So I would expect the empty string. Then again, I haven't studied the code, so maybe I just don't get it.

andrewhead commented 5 years ago

Thanks Rob! And oops, you're right, that test makes no sense---the ID is all wrong. Good catch. I'm fixing it up and re-testing the extension with the updates as we speak.

andrewhead commented 5 years ago

Test for program builder bug updated here: https://github.com/andrewhead/python-program-analysis/blob/70cb94e8d99e4b700257a34c2cd083a4d688de68/src/test/program-builder.test.ts#L106-L111

rdeline commented 5 years ago

The new test passes in the latest python-program-analysis (and is checked in there). Yay?

andrewhead commented 5 years ago

Partial yay! There still seems to be an issue in the slicer with the many repeated cells in the example I shared from the "Try out nbgather" notebook above. I'm currently tracking down and am getting close to the source.

andrewhead commented 5 years ago

Hey @rdeline! I seem unable to spawn a clean version of this branch that doesn't have this weirdly conservative slice. It has been boggling my mind 🤯.

So I dug deeper and did some pretty involved inspection of the dataflow in the JavaScript debugger. I think I found the issue that is causing the overly conservative slice currently, and submitted a fix to the python-program-analysis directory https://github.com/microsoft/python-program-analysis/pull/2.

The question is why this issue didn't cause any problems before, and I'm honestly not sure. Though I'm hopeful this will at least fix the issue for others who share whatever environmental aberration I have from your environment set up.