Open ejguan opened 3 years ago
I'm happy to take this up.
It seems like if you return the lazily loaded module from some function, you don't actually need to support as ...
. For example, you could do something like np = lazy_import("numpy")
where you'd normally do this check.
One question about the lazy loading is how lazy we want to get. I understand the intent of this issue to effectively be a way to wrap up the try-except import and cache the module (or parent). If so, I guess the logic would basically be:
def lazy_import(module, error_msg, submodule=None):
if not module_installed(module, submodule):
raise ModuleNotFoundError(error_msg)
# check if module (and submodule if provided) are in the "cache"
if not module_imported(module, submodule):
import(module, submodule)
# return the module (or submodule) to be used in code
return load_module(module, submodule)
Is this what you had in mind, @ejguan? If so, there are probably some other logistics to address like what to do with parent modules which might be polluting the namespace, etc. If we're only worrying about modules and submodules, I think we could do this all with importlib
, though submodules can be a bit hacky at times...
@erip
Thanks for taking interest in this issue.
I understand the intent of this issue to effectively be a way to wrap up the try-except import and cache the module (or parent)
This would cover partial requirement.
I think the ideal situation would be import module/submodule to global namespace lazily. And, when the methods from such module or submodule are invoked, the actual module is imported. By this mean, we don't need to assign the module to each DataPipe instance self
to expose such modules into __iter__
function.
Current:
class XXXDataPipe:
def __init__(self, ...):
try:
import abc
self._abc = abc
except:
raise Error
def __iter__(self):
self._abc
Ideally:
class XXXDataPipe:
def __init__(self, ...):
abc = lazy_import("abc") # Inject into global namespace
def __iter__(self):
abc.xxx # I am not sure if mypy is going to allow such behavior
I see. The failure to import would result in an immediate error, but the actual import is the lazy part. Is that right? In other words, attribute access on not-installed modules should never happen?
In other words, attribute access on not-installed modules should never happen?
Yeah, you are right. The Error should be raised at the first place. Otherwise, it should be a bug.
Excellent, thanks for the clarification. One idea that's kind of messy is using globals()
to populate the module with as_
or fully qualified name. The issue is that it will then be a global everywhere and not just within the file. For instance, in this minimal example
def lazy_import(name, as_=None):
# Doesn't handle error_msg well yet
import importlib
mod = importlib.import_module(name)
if as_ is not None:
name = as_
globals()[name] = mod
class Foo:
def __init__(self):
lazy_import("numpy", as_="np")
def foo(self):
return np.array([1,2,])
f = Foo()
print(f.foo()) # prints [1, 2] as expected
Using this, I think it might introduce some collisions or a heavily polluted global namespace.
You are right. Then, let's keep it as the minimum as possible. Not using global for now. If more users requested, we can easily extend it.
So you want to stick with the current, but wrap up the try/except logic?
class XXXDataPipe:
def __init__(self, ...):
self._abc = lazy_import("abc")
def __iter__(self):
self._abc
I asked here and it seems like pandas has something similar to what we want. They still need to import where used so it's not terribly different, but it prevents us from needing to stuff modules into self
or have globals floating around.
I asked here and it seems like pandas has something similar to what we want. They still need to import where used so it's not terribly different, but it prevents us from needing to stuff modules into
self
or have globals floating around.
Sounds great.
🚀 Feature
Currently we used
import
in each DataPipe class to make lazy importing happens like https://github.com/pytorch/data/blob/4802a350d1bf954d0785e0f22fd1fcde2deded76/torchdata/datapipes/iter/load/iopath.py#L21-L29As more potential libraries used in TorchData to support different functionalities, we could add a methods to support lazy import module to global namespace. Then, we don't need to duplicate the
import
inside each class used the same third-party module.Features needed:
from ... import ... as ...
import xxx.yyy