Open mattip opened 3 years ago
If I have a module "parent", and I add another module "child" with a method "f" to it:
child = PyModule_Create(...);
PyModule_AddObject(parent, "child", child);
then I can call
import parent
parent.child.f()
but importing like this
from parent.child import f
raises a ModuleNotFoundError: ... 'parent' is not a package
This came up in PyTorch https://github.com/pytorch/pytorch/issues/38137 and in pybind11 https://github.com/pybind/pybind11/issues/2639, and in various other places like stackoverflow https://stackoverflow.com/questions/38454852/importerror-with-error-is-not-a-package
A complete example is attached
If this is intentional, it might be nice to emit a warning when calling PyModule_AddObject with a module.
>> import parent.child
first imports "parent" (successfully) but then fails, because the import code has no knowledge of where to find ".child". This is because a) the module "parent" is not marked as a package (hence the error message), and b) even if it were a package, there is no (extension) module file to locate and load.
If you instead run
> from parent import child
only "parent" is imported, and "child" is retrieved as an attribute - which it actually is. The import code itself will add such an attribute, too [1]. However, that is after the submodule was located and loaded. Attribute lookup on the parent is not part of the submodule import itself.
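The difference between the two import forms can be reproduced without any C code. The following is only a sketch: it uses `types.ModuleType` objects as hypothetical stand-ins for the extension modules, and the helper `try_import` is made up for illustration.

```python
import sys
import types

# Hypothetical stand-ins for the extension modules described in the report.
parent = types.ModuleType("parent")
child = types.ModuleType("parent.child")
child.f = lambda: "called f"

# Roughly what PyModule_AddObject(parent, "child", child) achieves.
parent.child = child
sys.modules["parent"] = parent
# Make sure no earlier registration of the child masks the failure.
sys.modules.pop("parent.child", None)


def try_import(statement):
    """Execute an import statement and describe the outcome."""
    try:
        exec(statement, {})
    except ImportError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"


# Attribute access on the already-imported parent: succeeds.
attribute_result = try_import("from parent import child")
# Real submodule import: fails because parent has no __path__.
submodule_result = try_import("from parent.child import f")
print(attribute_result)
print(submodule_result)
```

The second statement fails before any finder is consulted, because the import machinery first checks that the parent has a `__path__` attribute.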
A (hacky) work-around would be to add an entry for "parent.child" in sys.modules. This prevents the import machinery from running.
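A minimal sketch of that work-around, again with plain `types.ModuleType` objects standing in for the extension modules (an assumption made for illustration; in C the same registration would be done on the dict returned by `PyImport_GetModuleDict()`):

```python
import sys
import types

# Hypothetical stand-ins for the extension modules.
parent = types.ModuleType("parent")
child = types.ModuleType("parent.child")
child.f = lambda: "called f"
parent.child = child

sys.modules["parent"] = parent
# The work-around: register the child under its dotted name, so the
# import system finds it in sys.modules and never searches for it.
sys.modules["parent.child"] = child

from parent.child import f
import parent.child as imported_child

print(f())
print(imported_child is child)
```

Since the lookup is satisfied from `sys.modules`, the "is not a package" check on the parent is never reached.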
A (more) proper solution would be to mark "parent" as a package and install some importlib hooks. See [2] for an example from Cython-land.
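For illustration, here is one possible shape of such a hook. This is not the Cython recipe from [2] but a separate minimal sketch under my own assumptions: the class name `AttributeSubmoduleFinder` is hypothetical, the parent is marked as a package by giving it an empty `__path__`, and submodules are resolved by looking up attributes on the already-imported parent.

```python
import importlib.abc
import importlib.machinery
import sys
import types


class AttributeSubmoduleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve 'pkg.name' to the module object stored as attribute pkg.name."""

    def __init__(self, package_name):
        self.prefix = package_name + "."

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        parent_name, _, attr = fullname.rpartition(".")
        candidate = getattr(sys.modules.get(parent_name), attr, None)
        if not isinstance(candidate, types.ModuleType):
            return None
        # is_package=True gives the served module a __path__ as well,
        # so deeper levels can be resolved the same way.
        return importlib.machinery.ModuleSpec(fullname, self, is_package=True)

    def create_module(self, spec):
        # Hand back the existing module object instead of creating one.
        parent_name, _, attr = spec.name.rpartition(".")
        return getattr(sys.modules[parent_name], attr)

    def exec_module(self, module):
        pass  # The module is already initialised.


# Hypothetical stand-ins for the extension modules.
parent = types.ModuleType("parent")
parent.__path__ = []  # marks "parent" as a package
child = types.ModuleType("parent.child")
child.f = lambda: "called f"
parent.child = child
sys.modules["parent"] = parent
sys.modules.pop("parent.child", None)

sys.meta_path.append(AttributeSubmoduleFinder("parent"))

from parent.child import f

print(f())
```

Appending to `sys.meta_path` (rather than inserting at the front) keeps the standard finders in charge of everything they can already locate.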
Finally there is PyImport_AppendInittab() [3], which could possibly be used to register "parent.child". I have never tried that. Even if this worked, it would be unsupported and probably not without side-effects.
[1] https://docs.python.org/3/reference/import.html#submodules
[2] https://stackoverflow.com/questions/30157363/collapse-multiple-submodules-to-one-cython-extension
[3] https://docs.python.org/3/c-api/import.html?highlight=inittab#c.PyImport_AppendInittab
Bump
Can somebody please explain the proper usage of the C API such that we can create actual packages and submodules from C code that are lazily loaded (PyImport_AppendInittab)?
Edit:
After some spelunking in the CPython code to find where the "is not a package" error is raised, it appears the import machinery was not happy that the parent package did not have the __path__ attribute set. As a test, I put a module exec slot on the parent and set __path__ to None in the exec function. Then I created module definitions for a few submodules of the parent package and registered them with PyImport_AppendInittab, using dotted names.
PyImport_AppendInittab(parent)
PyImport_AppendInittab(parent.sub)
PyImport_AppendInittab(parent.sub.sub)
The imports now work in a test python script -> import parent.sub.sub. No python, pure C code embedding python. Neat. Now I can lazy load with sub module support. Not sure if this is intended.. but unless somebody comes in here and says otherwise, I will run with this.
I guess the recommendation here would be to set an exec slot on every module and set __path__ to None. Additionally, set other attributes as in @mattip's original post to make sure attribute access from Python works as expected. This can all be done lazily, as a Python script requests the modules, instead of creating all the modules up front by brute force. After all, lazy loading (PyImport_AppendInittab) was my goal here.
@mattip if I understand the original source of this correctly:
The issue is really about pickle not being able to serialize attributes of these C-modules, not so much that you can't import these C-modules directly. I am experiencing exactly the same situation, again coming from an issue serializing attributes of PyTorch C-modules.
Consider a roughly analogous situation:
import pickle
import sys
assert pickle.sys.settrace is sys.settrace
import pickle.sys # ModuleNotFoundError: No module named 'pickle.sys'; 'pickle' is not a package
I think it is perfectly reasonable for import pickle.sys to raise an error (even if the error message is wrong).
I think a pure-Python example of what we are seeing with the C-modules is this:
my_module.py:
import pickle
import types

eval_frame = types.ModuleType("eval_frame")

def reset_code():
    ...

reset_code.__module__ = "__main__.eval_frame"
eval_frame.reset_code = reset_code

with open("path.pt", "wb") as f:
    pickle._Pickler(f).dump(pickle.sys.settrace)  # OK

with open("reset_code.pt", "wb") as f:
    pickle._Pickler(f).dump(eval_frame.reset_code)  # _pickle.PicklingError: Can't pickle <function reset_code at 0x7f2383e651f0>: import of module '__main__.eval_frame' failed
Like pickle.sys, eval_frame is a module but not a package, and so is not importable with import my_module.eval_frame. The main difference here is that pickle doesn't try to import pickle.sys when serializing pickle.sys.settrace because pickle.sys.settrace.__module__ == "sys", and pickle has no trouble importing sys from the global scope.
For eval_frame (and our C-modules), this is not the case. It is a module with no corresponding globally importable package. my_module.eval_frame.reset_code.__module__ == "my_module.eval_frame", and you can only access eval_frame through my_module. This would equate to from my_module import eval_frame.
By tweaking pickle._Pickler.save_global and pickle._Unpickler.find_class, we can achieve this from ... import ... behavior, maintaining backward compatibility while also supporting the above example and serializing C-module attributes:
class _Pickler:
    ...
    def save_global(self, obj, name=None):
        ...
        module_name = whichmodule(obj, name)
        try:
            _module_name, *fromlist = module_name.rsplit(".", 1)
            module = __import__(_module_name, fromlist=fromlist, level=0)
            if fromlist:
                module = getattr(module, *fromlist)
            obj2, parent = _getattribute(module, name)
        except (ImportError, KeyError, AttributeError):
            ...
        ...

class _Unpickler:
    ...
    def find_class(self, module, name):
        ...
        module_name, *fromlist = module.rsplit(".", 1)
        module = __import__(module_name, fromlist=fromlist, level=0)
        if fromlist:
            module = getattr(module, fromlist[0])
        if self.proto >= 4:
            return _getattribute(module, name)[0]
        else:
            return getattr(module, name)
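To try the lookup logic outside of pickle, the same `rsplit` plus `__import__` pattern can be pulled into a small standalone function. This is only a sketch: the helper name `import_dotted` is hypothetical and not part of the pickle module, and the modules are `types.ModuleType` stand-ins.

```python
import os
import sys
import types


def import_dotted(module_name):
    """Resolve 'a.b' with 'from a import b' semantics; plain names import normally."""
    parent_name, *fromlist = module_name.rsplit(".", 1)
    module = __import__(parent_name, fromlist=fromlist, level=0)
    if fromlist:
        # The last component may be a real submodule or just an attribute.
        module = getattr(module, fromlist[0])
    return module


# Hypothetical stand-ins: a parent module that is not a package, with a
# child module attached only as an attribute.
my_module = types.ModuleType("my_module")
eval_frame = types.ModuleType("my_module.eval_frame")
my_module.eval_frame = eval_frame
sys.modules["my_module"] = my_module

print(import_dotted("my_module.eval_frame") is eval_frame)
print(import_dotted("os.path") is os.path)
```

Only the last component is treated as an attribute here, so a name where an earlier component is attribute-only still has to go through a regular import of everything before the last dot.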
I am eager to solve this problem, and so I welcome feedback on my understanding and proposed solution. Please let me know.
Try to set sys.modules['parent.child'] = child after creating the child module. Would it help?
BTW, it is not recommended to use PyModule_AddObject(), which is broken by design. Use PyModule_AddObjectRef() or PyModule_Add(), depending on whether you want to keep a reference to the added object.
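For the pure-Python analogue discussed earlier, the effect of the `sys.modules` suggestion on pickling can be checked with a short script. This is a sketch under assumptions: `my_module` and `eval_frame` are `types.ModuleType` stand-ins for the real C modules, and `can_pickle` is a helper made up for this example.

```python
import pickle
import sys
import types

# Hypothetical stand-ins: a non-package parent with an attribute-only child.
my_module = types.ModuleType("my_module")
eval_frame = types.ModuleType("my_module.eval_frame")
my_module.eval_frame = eval_frame
sys.modules["my_module"] = my_module
sys.modules.pop("my_module.eval_frame", None)


def reset_code():
    return "reset"


reset_code.__module__ = "my_module.eval_frame"
# Keep the qualified name flat even if this snippet is pasted into a function.
reset_code.__qualname__ = "reset_code"
eval_frame.reset_code = reset_code


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, ImportError):
        return False
    return True


pickled_before = can_pickle(reset_code)

# The suggested registration under the dotted name.
sys.modules["my_module.eval_frame"] = eval_frame
restored = pickle.loads(pickle.dumps(reset_code))

print(pickled_before)
print(restored is reset_code)
```

With the registration in place, both pickling and unpickling find the child through `sys.modules`, so no change to pickle itself is needed for this case.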
I guess the PR was closed due to conflicts with recent changes in the pickle module (there was some refactoring), and due to a lack of response from my side.
Thoughts about this idea:
It works for package.c_module, but not for package.c_module1.c_module2. We could try this recursively, but this may complicate the code.
If in save_global() we convert the pair package.c_module and qualname to the pair package and c_module.qualname, we may save some time for unpickling (perhaps).
On the other hand, if setting sys.modules works, we can leave this. There may be the same issue in other serializers -- yaml, xmlrpc.
I have no strong preference. We may try different approaches and see what will happen.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = []
title = 'submodule of c-extension module is quirky'
updated_at =
user = 'https://github.com/mattip'
```
bugs.python.org fields:
```python
activity =
actor = 'skoslowski'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'mattip'
dependencies = []
files = ['49844']
hgrepos = []
issue_num = 43367
keywords = []
message_count = 2.0
messages = ['387917', '387935']
nosy_count = 3.0
nosy_names = ['mattip', 'skoslowski', 'YannickJadoul']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue43367'
versions = []
```