ronaldoussoren / modulegraph

modulegraph determines a dependency graph between Python modules primarily by bytecode analysis for import statements. modulegraph uses similar methods to modulefinder from the standard library, but uses a more flexible internal representation, has more extensive knowledge of special cases, and is extensible.
MIT No Attribution

`__main__` module being analyzed for wheel-installed scripts #28

Open ronaldoussoren opened 9 years ago

ronaldoussoren commented 9 years ago

Original report by codewarrior (Bitbucket: codewarrior, GitHub: codewarrior).


While doing some work on PyInstaller, I noticed that when running Py.test, the __main__ module was being included in the modulegraph as a SourceModule (and causing a FileNotFound error), so I took some time to figure out why:

First, it's not uncommon to have an import chain leading back to __main__. I found one that reads ['warnings', 'linecache', 'tokenize', 'collections', 'heapq', 'doctest', 'pdb', '__main__'], so I get the feeling that anything that brings in doctest will also point to __main__.

In Python 2.7, __main__ is included as a BuiltinModule. This is because the module name is listed in sys.builtin_module_names. In Python 3.3, __main__ is not listed in builtin_module_names, so the module is marked as a MissingModule.
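The difference between interpreters can be checked directly. This is a minimal sketch; the helper name `main_is_listed_as_builtin` is made up for illustration and is not part of modulegraph:

```python
import sys


def main_is_listed_as_builtin():
    """Return True if this interpreter lists __main__ among its built-in modules."""
    return "__main__" in sys.builtin_module_names


# Print the interpreter version next to the result so runs can be compared.
print(sys.version_info[:2], main_is_listed_as_builtin())
```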

Where it gets weird is when the main script is a script installed from a .whl file, using pip, on Windows. Under these conditions, the main script (py.test in my case) is installed to scripts\py.test.exe, with the actual python script embedded into the exe as something like a pyz archive. This is different from when the main script is installed from a setup.py using Setuptools, which will create both py.test.exe and py.test-script.py side-by-side in the scripts folder, with nothing embedded.

When running the script installed from a .whl, the __main__ module has a __file__ attribute that says scripts\py.test.exe\__main__.py, but the file doesn't actually exist as it is embedded into the exe. That doesn't stop ModuleGraph from analyzing its imports, probably because it reads the code object and analyzes that instead. The FileNotFound error comes later, when PyInstaller traverses the graph and tries to copy all of the files into an archive.
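One way to spot this situation is to check whether the reported file is missing while one of its parent paths is a regular file. The sketch below assumes that heuristic is good enough; the helper `containing_archive` is hypothetical, and the launcher is simulated with a plain zip file rather than a real pip-generated exe:

```python
import os
import tempfile
import zipfile


def containing_archive(filename):
    """Return the nearest parent of `filename` that is a regular file, or None.

    A non-None result suggests the path points *inside* an archive or
    launcher rather than at a file that exists on disk.
    """
    if os.path.exists(filename):
        return None
    parent = os.path.dirname(filename)
    while parent and parent != os.path.dirname(parent):
        if os.path.isfile(parent):
            return parent
        parent = os.path.dirname(parent)
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for scripts\py.test.exe: a file with __main__.py embedded in it.
    launcher = os.path.join(tmp, "py.test.exe")
    with zipfile.ZipFile(launcher, "w") as zf:
        zf.writestr("__main__.py", "print('hello')\n")

    # Shape of the __file__ value described above.
    reported_file = os.path.join(launcher, "__main__.py")
    found = containing_archive(reported_file)
    print(found == launcher)
```

A tool that copies files from the graph could use a check like this to skip, or specially handle, nodes whose filename is not a real file.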

(I'm sure this would also cause PyInstaller to copy all of pytest.main's dependencies, if it didn't choke on the FileNotFound error.)

So, let's get to the points of discussion:

A good reason for always excluding __main__ is that it isn't an actual module - it's more of a placeholder for "whatever script was run from the command-line". This hasn't previously been a problem, until I discovered this exact confluence of conditions that causes __main__ to be analyzed.

I'll hold off on submitting any pull requests here as I'm still not sure what the actual fix should be.

ronaldoussoren commented 7 years ago

Original comment by Ronald Oussoren (Bitbucket: ronaldoussoren, GitHub: ronaldoussoren).


I don't think removing `__main__` from the graph is valid; it is present, after all.

I guess it would be better to teach modulegraph to treat `__main__` as something special, either by treating it as a BuiltinModule on Python 3, or by adding some other magic (for example, keeping track of the current script ("run_script") and automatically turning references to `__main__` into references to that script instead).
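The second option could look roughly like the sketch below. `ToyGraph` and its methods are hypothetical stand-ins written for this comment, not modulegraph's actual classes or API; the node kinds are plain strings:

```python
class ToyGraph:
    """Hypothetical, simplified module graph (not modulegraph's real API)."""

    def __init__(self):
        self.nodes = {}     # identifier -> node kind
        self.edges = set()  # (importer, imported) pairs
        self.script = None  # identifier of the script currently being analyzed

    def run_script(self, path):
        """Remember which script is the entry point and add it as a node."""
        self.script = path
        self.nodes[path] = "Script"
        return path

    def add_import(self, importer, name):
        """Record an import; references to __main__ are redirected to the script."""
        if name == "__main__" and self.script is not None:
            target = self.script
        else:
            target = name
            self.nodes.setdefault(name, "SourceModule")
        self.edges.add((importer, target))
        return target


graph = ToyGraph()
script = graph.run_script("py.test")
graph.nodes["pdb"] = "SourceModule"
resolved = graph.add_import("pdb", "__main__")
print(resolved)
```

With this approach no separate `__main__` node is ever created, so nothing later tries to copy a file for it.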

Something problematic w.r.t. this issue is that I don't run Windows myself.

ronaldoussoren commented 5 years ago

Original comment by Ronald Oussoren (Bitbucket: ronaldoussoren, GitHub: ronaldoussoren).


modulegraph2 adds __main__ to the list of excludes. I propose to do the same here.
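To illustrate the effect of such an exclude, here is a self-contained sketch. It scans source with the standard `ast` module rather than analyzing bytecode, and the function `imported_names` is made up for this example; it does not show how modulegraph or modulegraph2 implement excludes:

```python
import ast


def imported_names(source, excludes=("__main__",)):
    """Return the sorted absolute module names imported by `source`, minus excludes."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return sorted(names - set(excludes))


example = "import __main__\nimport os\nfrom collections import deque\n"
print(imported_names(example))
```

Passing `excludes=()` keeps `__main__` in the result, which mirrors the behavior reported in this issue.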