Closed jvstein closed 8 years ago
You've done your research. These flags are barely documented; all I could find is PEP 432 and some wiki. It's unfortunate that you had to use JNA just to get these set.
PEP 432 says it's scheduled for Python 3.6. It looks like that adds a `PyMainInterpreterConfig` that holds most of these variables. Waiting for that would not be very compatible with older versions of Python though.
I'm not quite sure how to implement this in Jep, will need to think about it a bit. In Jep 3.5 I added a JepConfig so we could avoid making too many constructors and breaking backwards compatibility. But I was thinking we could add these flags to JepConfig or add a PyConfig that supported these flags, then pass that along to Jep initialization so you didn't have to use JNA. But these flags are all set prior to the main interpreter, not the sub-interpreters, so that won't work.
Any idea if there are any adverse effects to setting these opposite of the defaults? And if we come up with a solution, I would like to preemptively add support for more of these flags than the three you listed. If you have suggestions about which should be supported, that would be appreciated.
I will need to support Python 2.7 for a while, so a 3.6-only solution isn't that appealing to me.
From the looks of it, the main python interpreter is initialized when the jep module is loaded via JNI and the native module is loaded from a static initializer in the Jep class (i.e. when the first Jep instance is created).
My initial thoughts are to:
Add a static method (e.g. `Jep.setInitParams(new PyConfig().setNoSite(1))`) and pass the parameters to the new JNI function in the jep module. It's probably best to initialize everything to -1 in the config class and only set the values that were changed by the user on the native side (just in case any Python defaults were changed between versions or are changed in the future).
It would change the behavior of the jep native module, which should only be a problem if anyone is using it directly, without the corresponding Jep class. End users who need to change their environment would just need to ensure they call the new `Jep.setInitParams` static method before creating their first Jep instance.
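
To make the "-1 means untouched" idea concrete, here is a minimal Java-only sketch. The class names `PyConfigSketch` and `JepInitSketch`, the `overrides()` helper, and the choice to throw once initialization has happened are all assumptions for illustration, not the actual Jep API; no JNI is involved, and the returned map only stands in for what would be handed to the native side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical config holder: every flag starts at -1, meaning "not set by the user".
class PyConfigSketch {
    static final int UNSET = -1;

    private int noSiteFlag = UNSET;
    private int ignoreEnvironmentFlag = UNSET;
    private int noUserSiteDirectory = UNSET;

    PyConfigSketch setNoSite(int value) { noSiteFlag = value; return this; }
    PyConfigSketch setIgnoreEnvironment(int value) { ignoreEnvironmentFlag = value; return this; }
    PyConfigSketch setNoUserSiteDirectory(int value) { noUserSiteDirectory = value; return this; }

    // Returns only the flags the user changed; anything still at -1 is left
    // out so the native side would keep whatever default Python ships with.
    Map<String, Integer> overrides() {
        Map<String, Integer> changed = new LinkedHashMap<>();
        if (noSiteFlag != UNSET) changed.put("Py_NoSiteFlag", noSiteFlag);
        if (ignoreEnvironmentFlag != UNSET) changed.put("Py_IgnoreEnvironmentFlag", ignoreEnvironmentFlag);
        if (noUserSiteDirectory != UNSET) changed.put("Py_NoUserSiteDirectory", noUserSiteDirectory);
        return changed;
    }
}

// Hypothetical stand-in for the Jep class, showing the "call before the first instance" rule.
class JepInitSketch {
    private static PyConfigSketch config = new PyConfigSketch();
    private static boolean initialized = false;

    static synchronized void setInitParams(PyConfigSketch newConfig) {
        if (initialized) {
            throw new IllegalStateException("interpreter already initialized");
        }
        config = newConfig;
    }

    // Would run when the first interpreter is created; returns the flags
    // that would be passed to native code before Py_Initialize.
    static synchronized Map<String, Integer> initialize() {
        initialized = true;
        return config.overrides();
    }
}
```

With this shape, `JepInitSketch.setInitParams(new PyConfigSketch().setNoSite(1))` followed by `initialize()` hands over only `Py_NoSiteFlag`, and a later `setInitParams` call is rejected.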
As far as variables go, here's a more comprehensive list for the versions supported by jep:
name | versions supported |
---|---|
Py_DebugFlag | 2.6 - 3.5 |
Py_VerboseFlag | 2.6 - 3.5 |
Py_QuietFlag | 3.2 - 3.5 |
Py_InteractiveFlag | 2.6 - 3.5 |
Py_InspectFlag | 2.6 - 3.5 |
Py_OptimizeFlag | 2.6 - 3.5 |
Py_NoSiteFlag | 2.6 - 3.5 |
Py_BytesWarningFlag | 2.6 - 3.5 |
Py_UseClassExceptionsFlag | 2.6 - 3.5 |
Py_FrozenFlag | 2.6 - 3.5 |
Py_IgnoreEnvironmentFlag | 2.6 - 3.5 |
Py_DontWriteBytecodeFlag | 2.6 - 3.5 |
Py_NoUserSiteDirectory | 2.6 - 3.5 |
Py_UnbufferedStdioFlag | 3.2 - 3.5 |
Py_HashRandomizationFlag | 2.6 - 3.5 |
Py_IsolatedFlag | 3.4 - 3.5 |
Py_UnicodeFlag | 2.6 - 2.7 |
I can't think of any adverse effects offhand. I read lots of threads about setting these values in other embedded python environments. Apparently Python 2 allowed the values to be set after initialization, but that changed with Python 3.
Yeah I don't want to do a 3.6-only solution, that complicates the code and everyone is using earlier versions. It was just something I noticed.
Thanks for all the insight. I was thinking along the same lines: we'd have to get rid of the static initializer. I was rather fond of that though. Apart from the NoClassDefFound errors it could result in if misconfigured, it did ensure someone could not attempt to use a Jep instance if they had set up the library or the path to it incorrectly.
The only alternative I've thought of is to fall back on Java system properties, such as `-Dpy.no.site.flag=true`. That static initializer could pick those up with `Boolean.getBoolean("py.no.site.flag")` or alternatively do the same with Integers. That would ensure the defaults in all cases except where someone explicitly wants to override them like in your case. It's not clear to me at this time how much of this is an edge case vs something a lot of developers would like to make use of. At least you've got the JNA workaround for the time being.
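
As a rough illustration of the system-property fallback, the sketch below reads the `py.no.site.flag` property mentioned above. The class and method names are hypothetical, and mapping the boolean to 0/1 is an assumption about how the value would be forwarded.

```java
// Hypothetical helper: reads a startup flag from a JVM system property.
class SystemPropertyFlagsSketch {
    // Property name taken from the comment above.
    static final String NO_SITE_PROPERTY = "py.no.site.flag";

    // Boolean.getBoolean returns true only when the property exists and
    // equals "true" (ignoring case), so an unset property yields 0 here,
    // which matches Python's default of importing site.
    static int noSiteFlag() {
        return Boolean.getBoolean(NO_SITE_PROPERTY) ? 1 : 0;
    }
}
```

Starting the JVM with `-Dpy.no.site.flag=true` would make `noSiteFlag()` return 1. Any other value, including `1` or `yes`, is treated as false by `Boolean.getBoolean`.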
@bsteffensmeier, do you have any thoughts?
I took a first pass at an implementation. Still need to add unit tests to cover the new code, but the existing tests are passing.
https://github.com/jvstein/jep/commit/662983d08c08fb40d2139cba91aa18d3a91803d4
I like the idea of being able to set it up through system properties. To me that helps emphasize that it is a system-wide preference. In our project we have multiple plugins using Jep that aren't aware of each other, so if we opted to set one of the flags it wouldn't be clear where in code we should be setting this up. I would therefore prefer to set it globally with a system property. I know there are other users in an OSGi environment and I would imagine they have similar concerns if they wanted to mess with any of the flags.
I had a few minor comments on the patch but overall I am in favor of merging the functionality into dev_3.6. I think having the option to set the flags from code is good and if anyone ever has a real use case for setting it with system properties we can always add that in later. I think separating the JNI initialization and library loading from the actual python startup might make special cases a little better, for example it already moved the linking onto the same thread as class loading which will give better stack traces without as much code.
Edit: I just wanted to clarify my contradicting statements. Although I would prefer a system property, I don't object to setting it up in code (as in the current patch). Ideally I would like it to be possible to set it either way, but since we don't know if anyone else will have any interest in using this I don't think we need to handle system properties yet.
Reviewing the patchset(s), it looks good, just needs some cleanup. This would need to go on dev_3.6 branch, 3.5 is closed except for critical bug fixes.
I'm attached to the original static initialization though. The applications @bsteffensmeier and I develop that use Jep extensively could potentially use some of these settings for debugging or security reasons, and there's not a single location in the code to ensure that the python init params are set before the library is initialized. One of the applications is an OSGi application and the plugin on-demand activation could cause a different plugin to be the first to initialize Jep. The synchronization and locking would prevent `setInitParams(PyConfig)` from being called twice, but we'd still have to update multiple places in code to make sure we were calling it before any of the Jep instances were created.
To solve that, we could also have PyConfig read its defaults with `Integer.getInteger(name, default)` and only pick up a different setting if the property is set. By default we'd match all Python defaults. We could then get around the issue of updating multiple places in our application code by having Jep always call `setInitParams(PyConfig)` with the default PyConfig if none was provided. I really like the idea of PyConfig because it would make the code much clearer; we could load it with javadoc and mention what each setting does and how it corresponds to a Python command line flag (where applicable).
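
A minimal sketch of property-backed defaults, assuming hypothetical property names (`py.no.site.flag`, `py.optimize.flag`) that take integer values and a -1 sentinel for "leave Python's default alone". The class name `PyConfigDefaultsSketch` is likewise made up for illustration; this is not the merged implementation.

```java
// Hypothetical PyConfig variant whose defaults come from system properties.
class PyConfigDefaultsSketch {
    static final int UNSET = -1;

    // Integer.getInteger returns the fallback when the property is missing
    // or cannot be parsed as an integer.
    private int noSiteFlag = Integer.getInteger("py.no.site.flag", UNSET);
    private int optimizeFlag = Integer.getInteger("py.optimize.flag", UNSET);

    // Explicit setters take precedence over whatever the properties supplied.
    PyConfigDefaultsSketch setNoSiteFlag(int value) { noSiteFlag = value; return this; }
    PyConfigDefaultsSketch setOptimizeFlag(int value) { optimizeFlag = value; return this; }

    int getNoSiteFlag() { return noSiteFlag; }
    int getOptimizeFlag() { return optimizeFlag; }
}
```

Because the properties are read when the object is constructed, a default `new PyConfigDefaultsSketch()` created by the library itself would pick up `-Dpy.optimize.flag=2` without any application code calling a setter.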
Side note: @jvstein, would you want me or @bsteffensmeier to commit all this completed, or submit a pull request? It wasn't clear to me from sample code if we were going for completion of said code or just proof of concept.
@ndjensen I like that approach. I'll clean up what I have, squash the current commits, and submit a pull request to the dev_3.6 branch.
My primary focus is a set of Spark applications. I have my users specify a virtual environment path and I'm attempting to load the correct python native library (the python version is tied to the virtual environment) and jep module (which exists in the virtual environment). The goal is to support multiple virtual environments for different Spark applications. I have control over the `java.library.path` at startup of these processes, but I'm trying to make things as seamless as possible for my downstream developers so they just have to specify the virtual environment and the native libraries are loaded without extra input.
Thanks for the great feedback! Pull Request #50 is open.
Merged pull request #50. Would like to keep this ticket open until I've completed analysis of other flags we want to support and added support for those.
Analyzing what's available and what Jep should support. I bolded what I think we should support, and italicized ones I'm not sure on.
flag | versions supported | cmd line arg | jep support? |
---|---|---|---|
Py_DebugFlag | 2.6 - 3.5 | -d | No, for Python compiled in debug mode |
**Py_VerboseFlag** | 2.6 - 3.5 | -v | Yes, prints import traces |
Py_QuietFlag | 3.2 - 3.5 | -q | No, quiets version printing at startup |
Py_InteractiveFlag | 2.6 - 3.5 | ? | No, Jep has interactive support |
Py_InspectFlag | 2.6 - 3.5 | -i | No, inspects after running script |
**Py_OptimizeFlag** | 2.6 - 3.5 | -O | Yes, ignores assert statements, ignores `if __debug__` blocks |
**Py_NoSiteFlag** | 2.6 - 3.5 | -S | Yes, added by @jvstein |
Py_BytesWarningFlag | 2.6 - 3.5 | -b | No, just issues warnings about str(bytes) |
Py_UseClassExceptionsFlag | 2.6 - 3.5 | ? | No, deprecated by Python |
*Py_FrozenFlag* | 2.6 - 3.5 | ? | ??? Appears to silence error messages about Python unable to load libraries, see getpath.c |
**Py_IgnoreEnvironmentFlag** | 2.6 - 3.5 | -E | Yes, added by @jvstein |
**Py_DontWriteBytecodeFlag** | 2.6 - 3.5 | -B | Yes, prevents creation of .pyc, .pyo |
**Py_NoUserSiteDirectory** | 2.6 - 3.5 | -s | Yes, added by @jvstein |
*Py_UnbufferedStdioFlag* | 2.6 - 3.5 | -u | ??? uses unbuffered stdout and stderr |
*Py_HashRandomizationFlag* | 2.6 - 3.5 | -R | ??? random seed for hashes of str, bytes, and datetimes |
Py_IsolatedFlag | 3.4 - 3.5 | -I | No, limited versions, implies -E and -s |
Py_UnicodeFlag | 2.6 - 2.7 | ? | No, limited versions |
Based on my analysis, we support 3/6 after @jvstein's commit. I have three unsures, so maybe 3/9. Looking for feedback from @bsteffensmeier and @jvstein on my assessment.
I agree with the list and the analysis. I think I would add support for Py_HashRandomizationFlag because it's otherwise hard to support. Py_UnicodeFlag looked interesting, but I encountered syntax errors interpreting docstrings in several 2.7 standard library modules. It's probably best to stick with `from __future__ import unicode_literals` for that.
name | comments |
---|---|
Py_FrozenFlag | This is used for a method of bundling python code into a single-file windows executable along with the interpreter. The only use case I can think of would be to load and integrate with one of these executables and to suppress the related path warnings. |
Py_UnbufferedStdioFlag | There's a runtime workaround for this -- replace the stdout object with one that calls flush() with every call. It's not as efficient or elegant, but it's functional. |
Py_HashRandomizationFlag | This could be useful when needing to test or stabilize code that uses the hash function. I've had to do this in the past in order to run repeatable tests on select machine learning algorithms. The seed is random by default and the only other way to set this is using an environment variable PYTHONHASHSEED. |
Initial unit test here: https://github.com/mrj0/jep/commit/5524f0aea184d199f9d6aa1452820b3d96bea3a0
Completed in dev_3.6 branch.
I'm packaging up python virtual environments and distributing them with an application mostly written in Scala. It would be nice to be able to switch some of the python pre-initialization variables that are available from the `python` executable flags (e.g. `-S`, `-s`, `-I`). In the python code base, these are defined in `Python/pylifecycle.c` and set in `Modules/main.c`.
I'd like to lock down the python interpreter's ability to load some specific code (`site.py` complains about missing C extensions on many Debian-based systems) and control the loading environment as much as possible.
These are the flags that are most interesting:
`site` module.
`~/.local/python*/site-packages`.
They need to be set before `Py_Initialize`, e.g. in `pyembed_startup`.
I'm currently using JNA to set the values after loading the python native module and before creating Jep instances. That works well, but I don't really need the JNA dependency for anything else, so it would be nice to ditch it and not have to write another native module.