xonsh / amalgamate

Collapses Python packages into a single module.
BSD 3-Clause "New" or "Revised" License
28 stars 7 forks source link

Fails with relative imports #1

Closed yaroslavvb closed 8 years ago

yaroslavvb commented 8 years ago

Thanks for putting this together!

I was hoping to use this to compile my package into a single Python file to simplify installation, but it doesn't seem to work with relative imports

IE, for import like below it fails with 'NoneType' object has no attribute 'startswith'

from . import itensor as itensor_lib

It seems corresponding _ast.ImportFrom object for this statement has None for module, hence error

Traceback (most recent call last):
  File "/Users/yaroslavvb/Downloads/amalgamate-0.1.0/amalgamate.py", line 379, in <module>
    main()
  File "/Users/yaroslavvb/Downloads/amalgamate-0.1.0/amalgamate.py", line 369, in main
    graph = make_graph(pkg, exclude=exclude)
  File "/Users/yaroslavvb/Downloads/amalgamate-0.1.0/amalgamate.py", line 94, in make_graph
    graph[base] = make_node(base, pkg, allowed)
  File "/Users/yaroslavvb/Downloads/amalgamate-0.1.0/amalgamate.py", line 71, in make_node
    elif a.module.startswith(pkgdot):
AttributeError: 'NoneType' object has no attribute 'startswith'
> /Users/yaroslavvb/Downloads/amalgamate-0.1.0/amalgamate.py(71)make_node()
-> elif a.module.startswith(pkgdot):

Here's a repro

# extract "immediate" package into "temp"
wget -O georgia.zip https://github.com/yaroslavvb/tensorflow/archive/georgia.zip
unzip georgia.zip
cp -R tensorflow-georgia/tensorflow/contrib/immediate/python/immediate ~/temp
cd ~/temp

# try to amalgamate it
python3 amalgamate.py immediate
scopatz commented 8 years ago

Thanks for reporting @yaroslavvb! I would imagine that this would be a relatively simple fix by modifying the make_node() function to allow insert the package name (pkg) whenever it sees a name that starts with a dot.

A PR with this would be super helpful!

yaroslavvb commented 8 years ago

I hacked around it get it to produce a file in https://github.com/xonsh/amalgamate/pull/3

The final module doesn't work however, because it still retains imports from original package. My __amalgamate__.py has lines like this

from . import itensor as itensor_lib
...
    return itensor_lib.ITensor(self, handle)
yaroslavvb commented 8 years ago

OK, I was able to make some progress, but found more bugs.

Some from __future__ import absolute_import lines didn't get commented out in __amalgamate__.py (but some were commented out). Some relative imports were uncommented out.

IE I see this

from .util import IsListParameter
# amalgamated from util import get_current_device_string

Finally, commenting those by hand, and fixing all imports in the amalgamate file, I get this

  File "immediate.py", line 1212, in apply_op
    self.env.numpy_to_itensor)
  File "immediate.py", line 1493, in convert_to_itensors_with_type_inference
    keywords[name] = numpy_to_itensor(itensor, inferred_dtype)
  File "immediate.py", line 324, in numpy_to_itensor
    if not isinstance(array, np.ndarray):
  File "immediate.py", line 37, in __getattribute__
    dct = self.__dct__
  File "immediate.py", line 36, in __getattribute__
    return super().__getattribute__(name)
TypeError: super() takes at least 1 argument (0 given)

I'm guessing this may have something to do with Python2.7 compatibility? I'm kind of stuck on Python 2.7 because TensorFlow is tested at version 2.7 (which is because Google is stuck on Python2.7 for another year/two)

scopatz commented 8 years ago

Hi @yaroslavvb - I think the reason that #3 doesn't work yet is because of some of the issues mentioned in that PR.

Also the from __future__ thing is a deficiency that we don't handle yet. Can you please open another issue for that.

Also, the super / python2.7 issue is a relatively quick fix. It just needs the class name and self passed in for compatibility. Please open an issue or feel free to put in a hot fix for this as well.

I really appreciate it!

yaroslavvb commented 8 years ago

Thanks for the tip, I got a bit further this time, now it fails with this in Python 2.7

  File "immediate.py", line 320, in numpy_to_itensor
    if not isinstance(array, np.ndarray):
  File "immediate.py", line 37, in __getattribute__
    dct = self.__dct__
  File "immediate.py", line 36, in __getattribute__
    return super(self).__getattribute__(name)
TypeError: must be type, not _LazyModule
scopatz commented 8 years ago

Yeah, it actually needs to be super(_LazyModule, self). The first arg must be a type (ie the class)

yaroslavvb commented 8 years ago

Ah thanks, that seems to have fixed it

scopatz commented 8 years ago

Great! Mind putting in a PR to that effect? Thanks!

yaroslavvb commented 8 years ago

I've updated/squashed the fix into previous PR https://github.com/xonsh/amalgamate/pull/3

yaroslavvb commented 8 years ago

The PR above enabled work-around where all relative packages are used as "from pkg import method; method"

It doesn't yet work with "import pkg as mypkg; mypkg.method", the test is here https://gist.github.com/yaroslavvb/428fc523d6dc69cc510f7e8cb2b128e8

scopatz commented 8 years ago

It doesn't yet work with "import pkg as mypkg; mypkg.method", the test is here

OK, maybe that can be a separate PR.

yaroslavvb commented 8 years ago

Do you have a sense of how hard it would be to add support for relative imports? IE, to be able to keep "import pkgfile as something; something.a" in my original files?

scopatz commented 8 years ago

Do you have a sense of how hard it would be to add support for relative imports? IE, to be able to keep "import pkgfile as something; something.a" in my original files?

This might be an impossible case to always amalgamate safely because what this statement does depends on the user's runtime sys.path value.

However, to amalgamate these anyways, you would just have to check if "pkgfile" is in the package directory. I would be in favor of having support for this with an additional command line argument that turns it on.

However, I would highly encourage you to rewrite such imports. PEP8 says never to use them and they aren't valid Python 3. This is because they are super error prone in a way that you don't notice until code is on another machine.

Barring that, I am happy to have a PR that implements the package check.

yaroslavvb commented 8 years ago

Sorry, bad example, I don't care about putting things on same line, just about being able to use relative packages, ie

import .submodule as mymodule
mymodule.runfunction()

Right now, such lines stay unmodified, and I'm basically using Amalgamate as a tool to glue all the files together, and remove "import ." lines as a post-processing step

scopatz commented 8 years ago

Yeah, so their is a difference between implicit relative imports (your previous example) and explicit relative imports (this example, with a leading dot). The implict relative import is what is unsafe to do.

The explicit relative import wouldn't be hard to add either. The code just needs to check if the package / module name starts with a . and then prepend the package name. Also, it should check if it starts with a .. and add the higher level package name.

scopatz commented 8 years ago

I think it would be not a lot of code to do this.

yaroslavvb commented 8 years ago

What about the making it fully work for "from .package import a" kind of lines? Is that a one-line change somewhere? The PR #3 makes it work for gluing together lines, but it still leaves the "from .package import a" lines in the final amalgamated file which need to be removed as a post-processing stage

scopatz commented 8 years ago

Yeah, that is probably a few line change in the rewrite_imports() function https://github.com/xonsh/amalgamate/blob/master/amalgamate.py#L214

Right now it is just skipping those imports because it doesn't recognize them / doesn't translate the .module or .package into pkg.module or pkg.pacakge, and therefore thinks it is a new import, rather than one it has potentially already seen.

scopatz commented 8 years ago

Let me know if this does / doesn't make sense.

scopatz commented 8 years ago

My goal is to hopefully make it so that you don't need to have a post-process, but I might not be able to hack on this for a couple of days :)

scopatz commented 8 years ago

Hey, sorry it has taken me so long to get to this. Just released v0.1.2 which includes warnings if names are redefined across multiple modules.

scopatz commented 8 years ago

Alright, I now have what I believe are working relative imports on the relimps branch. @yaroslavvb - would you be willing to test this out, please?

yaroslavvb commented 8 years ago

Testing it now

scopatz commented 8 years ago

any word?

scopatz commented 8 years ago

Also, just to note this has now been merged into upstream xonsh

yaroslavvb commented 8 years ago

Sorry for delay, I had to do a bit of extra work to adapt tests to new TensorFlow release last week. It seems to work now without needing to delete __future__ and from . import lines.

One remaining issue is that I had from env import * in the __init__.py. It seems to overwrite it, but I can work-around it by appending that line to newly generated __init__.py.

I'm wondering if it's possible to do relative package imports within a single file distribution. It seems to require two files right now, the __init__.py and __amalgam__.py . My workaround is to only use __amalgam__.py and structure my code to not require sys.modules modifications from __init__ to work

yaroslavvb commented 8 years ago

Thanks for the change though, it simplifes my "amalgamate_postprocess" script considerably

scopatz commented 8 years ago

Great, thanks @yaroslavvb! I have gone ahead and merged #5 in and done a v0.1.3 release.

One remaining issue is that I had from env import * in the init.py. It seems to overwrite it, but I can work-around it by appending that line to newly generated init.py

hmm strange. Maybe you could just put the from env import * at the end of the file? Would you mind opening an issue for this?

I'm wondering if it's possible to do relative package imports within a single file distribution. It seems to require two files right now, the init.py and amalgam.py . My workaround is to only use amalgam.py and structure my code to not require sys.modules modifications from init to work

That should be possible, and would be a nice option to have. Rather than rewriting the imports, we could just fill that block with what would be the contents of the __amalgam__.py. If you could open an issue for that too, that would be great! A PR would be even better :) I think this is pretty easy since all the pieces are there.

Thanks for the change though, it simplifes my "amalgamate_postprocess" script considerably

Great! It would be ideal for you not to have to have such a script.

yaroslavvb commented 8 years ago

Regarding single file distro, how would the sys.modules lines in __init__ get updated? IE, right now it does something like _sys.modules['imperative.env'] = __amalgam__, if __amalgam__ is merged into __init__, can sys.modules be updated to point at the module that's currently being loaded?

yaroslavvb commented 8 years ago

OK, so the fix makes it work with from .subpackage import method, but it doesn't work for from . import subpackage

In particular, I have structure like this

imperative/
   env.py
   util.py

Case 1: the following works in env.py

from .util import somemethod

Case 2: the following doesn't work

from . import util
util.somemethod()

The error for the latter case is this:

Traceback (most recent call last):
  File "release/amalgamate.py", line 554, in <module>
    main()
  File "release/amalgamate.py", line 547, in main
    src = amalgamate(order, graph, pkg)
  File "release/amalgamate.py", line 462, in amalgamate
    lines = rewrite_imports(name, pkg, order, imps)
  File "release/amalgamate.py", line 411, in rewrite_imports
    raise RuntimeError(msg)
RuntimeError: Cannot amalgamate import of amalgamated module:

  from imperative import util

I can dig into it a bit more if you think Case 2 can be made to work in Python. Personally I'm not clear that sys.modules can be massaged for this pattern to work with single-file distribution.

yaroslavvb commented 8 years ago

OK, I think at this point this bug can marked as fixed since I haven't found any more problems with relative imports. The issues I had have more to do with my specific need of using 1 file instead of 2 files, I'll file that separately for tracking