twosigma / beakerx

Beaker Extensions for Jupyter Notebook
http://BeakerX.com
Apache License 2.0
2.8k stars 382 forks source link

Notebook must be restarted each time dependencies are changed #7741

Open benmccann opened 6 years ago

benmccann commented 6 years ago

I have a cell in my notebook which declares my dependencies:

%%classpath add mvn
tech.tablesaw tablesaw-core 0.24.7
tech.tablesaw tablesaw-beakerx 0.24.7
com.jimmoores quandl-tablesaw 2.0.0

Then I updated those dependencies:

%%classpath add mvn
tech.tablesaw tablesaw-core 0.24.9
tech.tablesaw tablesaw-beakerx 0.24.9
com.jimmoores quandl-tablesaw 2.0.0

BeakerX is still picking up 0.24.7 and I have to kill the notebook and restart it for the new version to be picked up. I'm guessing it's adding both 0.24.7 and 0.24.9 to the classpath, but 0.24.7 is the one that's being used (possibly because of classpath ordering)

My preferred solution to this problem would be to allow only one cell in the notebook where you can specify dependencies. When this cell is run, it clears all previous dependencies. This would also fix another issue I've run into where I declare some dependencies in one cell and some in another. Those two sets of dependencies may have some transitive dependencies that are the same, but version conflicts occur because they were not resolved together. I've learned over time that I must specify all dependencies in a single cell or I frequently get unexpected behavior.

jpallas commented 6 years ago

I'm not sure about other kernels, but for Scala, at least, the interpreter is reset (destroying state) when the classpath is modified, which makes incremental changes to the classpath pretty pointless. But it may make sense to have more than one cell with classpath updates for explanatory purposes (especially when notebooks are being used as an educational tool). It may even make sense to have a sequence like "execute this (classpath) cell if you are using X, execute this other other cell if you are using Y". So, I'm not sure I'd be on board with only allowing one classpath cell.

If someone adds something they didn't want, they may have no choice but to restart the kernel and run only the classpath mods that they really want.

That doesn't address the issue @benmccann raises about resolution of version conflicts. One way to have both incremental maven cells and global resolution would be to treat each incremental update as if it were the merge of all the incremental updates seen so far and update the classpath accordingly. There may be some overlap with #7472 around using the classpath that maven computes rather than the non-deterministically ordered collection of all the dependencies.

benmccann commented 6 years ago

I'm using the Groovy kernel. It seems like it's incrementally updating the classpath. I would really like if it's behavior were consistent with the Scala kernel functioning that you just described.

I'm not too worried about version conflicts if we get rid of incremental updates. Incremental updates are really confusing, so I would be happier not supporting that use case.

altavir commented 6 years ago

Groovy has a very good internal support for dynamic classpath, I do not think it makes any sense to replicate its functionality in other languages. But neither it is good to remove the functionality. Probably it should just be properly documented.

benmccann commented 6 years ago

If we support dynamic classpath, I think we may want to change more than just the documentation. I find myself frequently having to kill the kernel and restart it when the classpath changes. E.g. maybe we need to reverse the classpath ordering so that new libraries are put on the beginning / end? Right now the new libraries just never get picked up and the old ones always take precedence

altavir commented 6 years ago

Dynamic class path it not so easy. Java does not support removing anything from classpath, so in order to do so, you have to copy all the data from one classloader to another each time you change classpath. Groovy does that since bindings which actually store data are defined outside classloader. But I am not sure it is possible in statically compiled language.

jpallas commented 6 years ago

If I understand the above remarks, Groovy has native classpath handling that is more flexible than the classpath handling provided by the shared BeakerX kernel base. But the other languages can't really support that degree of flexibility, so either Groovy becomes a special case or all the languages share behavior that is consistent (but not as flexible as Groovy's).

ctrueden commented 5 years ago

@benmccann I agree that there are subtleties here that can be unintuitive. But in line with other comments above, I don't see any clearly superior behavior being feasible to implement. So maybe solving this issue just comes down to some clearer documentation explaining these limitations in the %classpath magic notebook?

benmccann commented 5 years ago

I think that we could still improve the version conflicts by adding the incremental classpath updates to the beginning of the classpath rather than the end. Right now you can never update a cell that uses %classpath magic because the old contents of that cell take precedence over the current contents.