kylebarron / stata_kernel

A Jupyter kernel for Stata. Works with Windows, macOS, and Linux.
https://kylebarron.dev/stata_kernel/
GNU General Public License v3.0
266 stars 57 forks source link

Use exactly one kernel_cache_dir and config object per kernel instance #406

Closed ryan-wallace-frbb closed 2 years ago

ryan-wallace-frbb commented 3 years ago

whatsnew entry:

mcaceresb commented 3 years ago

@ryan-wallace-frbb It will take me a little bit to get to reviewing this. One conceptual question that I have: It seems to me that the solution, conceptually, is to instead of having the files saved directly into cache_dir, to make a sub-directory that is unique to a given kernel and save everything there.

This sounded relatively simple to me. I thought it would be sufficient was to add something like this:

cache_dir = cache_dir / temp_dir_name

before config.py L82 (and perhaps make sure that the temp dir is not saved if cache_directory is saved interactively). However, your proposed changes look to be a lot more substantial than that. Can you please elaborate as to why the fix is more complex than my understanding of the problem would suggest? Thanks!

ryan-wallace-frbb commented 3 years ago

Thanks, @mcaceresb.

The solution as implemented accords with your conceptual understanding--in config.py:84 we create a per-kernel temporary subdirectory of cache_dir to save all the runtime files for that kernel.

But to keep the reference to this per-kernel cache_dir consistent within the kernel, we instantiate the Config class exactly once per kernel in kernel.py:69. Otherwise, each separate invocation of config = Config() in kernel.py:171 would yield a different randomly named TemporaryDirectory for the cache_dir every time the import from .config import config is run.

Nearly all the other modifications are to change references from config objects to the kernel attribute self.conf.

ryan-wallace-frbb commented 2 years ago

Hi @mcaceresb, just a heads up that I addressed the issues that broke the previous builds. In particular, I made the CodeManager constructor take a Config object instead of a Kernel object to remove the dependency on Stata itself for the tests.

The latest build #825 now passes with Python 3.6 and 3.8. It fails on 3.5 and 3.7, but due to version issues with pip (as does the previous build #821).

ryan-wallace-frbb commented 2 years ago

Hey @mcaceresb, I hope you're enjoying the holiday season.

Just a heads up that I made two slight modifications get this to pass the checks for all the Python versions tested. Please let me know if there's anything else I can do on this.

mcaceresb commented 2 years ago

@ryan-wallace-frbb Thanks! Sorry I haven't had the time to look at this PR. I do hope to get to it soon (my semester technically is just finishing this week). Hope you are enjoying your break as well. Cheers.

mcaceresb commented 2 years ago

@ryan-wallace-frbb Hey; I'm finally having some time to review this PR today/this week. Regarding my previous comment, you had said that you said this:

But to keep the reference to this per-kernel cache_dir consistent within the kernel, we instantiate the Config class exactly once per kernel in kernel.py:69. Otherwise, each separate invocation of config = Config() in kernel.py:171 would yield a different randomly named TemporaryDirectory for the cache_dir every time the import from .config import config is run.

Can you give me an example where from .config import config is run more than once? Your change seems to work (I still need to test it out a little more). However, if the only changes I keep are lines 7 and 84 in config.py, that seems to work too.

ryan-wallace-frbb commented 2 years ago

Thanks @mcaceresb.

Previously, config.py ended with an instantiation of the Config class at the top-level of the module with config = Config(). I forgot that Python's import system ensures that each module is only imported once per interpreter session. So I mistakenly thought that a new instance of the Config class (with a corresponding new randomly named cache_dir) would be created each time from .config import config is run at the top of other modules in the package.

I agree now that it's sufficient to make the smaller set of changes you mentioned in config.py. I've updated this PR accordingly.

I think the only possibly unintuitive change now is creating an instance variable _cache_temp_dir for the TemporaryDirectory object in the Config class. This is done to maintain a reference to the TemporaryDirectory object itself so that it's not cleaned up automatically after the reference drops out of scope at the end of the class's __init__ method. Since we use only the pathlib path to refer to the directory outside of the initialization, I prefixed the variable name with an underscore to indicate it's purpose is private to the class.