pyinstaller / pyinstaller-hooks-contrib

Community maintained hooks for PyInstaller.

yapf FileNotFoundError #783

Closed swap-10 closed 5 days ago

swap-10 commented 3 weeks ago

Description of the issue

yapf is used in the groundingDINO project, which I am building into a onedir executable with PyInstaller. yapf is not included in the onedir/_internal packages, so a module-not-found error is raised when running the frozen application.

Context information (for bug reports)

I have already tried adding yapf to hiddenimports.

When I manually copy the yapf module from my Python installation into the onedir/_internal directory, the application works as expected. So this seems to be a problem of yapf not being recognised and exported properly.

I'm using a .spec file to build the executable. The exe runs fine until the code that uses the yapf module is reached.

A minimal example program which shows the error

main.py:
import yapf

pyinstaller main.py --onedir

FileNotFoundError: No such file or directory: path/to/dist/main/_internal/yapf_third_party/_ylib2to3/Grammar.txt

Reported earlier in pyinstaller/pyinstaller#4939 in 2020 but closed as stale

rokm commented 3 weeks ago

FileNotFoundError: No such file or directory: path/to/dist/main/_internal/yapf_third_party/_ylib2to3/Grammar.txt

Looks like yapf requires a data file from the yapf_third_party package. Try adding --collect-data yapf_third_party to your PyInstaller command.
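To get a rough idea of which files are at stake, the sketch below lists the non-Python files that an installed package ships on disk. It only approximates what --collect-data gathers and is not PyInstaller's own logic; the helper name list_package_data_files is made up for this illustration.

```python
import importlib.util
from pathlib import Path

# Suffixes treated as "Python code" rather than data (an assumption of this sketch).
PYTHON_SUFFIXES = {".py", ".pyc", ".pyo"}


def list_package_data_files(package_name):
    """Return the non-Python files found under an installed package's directories."""
    spec = importlib.util.find_spec(package_name)
    # Not installed, or a plain module without a package directory.
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for root in spec.submodule_search_locations:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            if "__pycache__" in path.parts or path.suffix in PYTHON_SUFFIXES:
                continue
            found.append(path)
    return sorted(found)
```

Run in the build environment, list_package_data_files("yapf_third_party") should include the Grammar.txt file named in the traceback if the package ships it as a regular file.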

bwoodsend commented 3 weeks ago

Reported earlier in https://github.com/pyinstaller/pyinstaller/issues/4939 in 2020 but closed as stale

Note that pyinstaller/pyinstaller#4939 has nothing to do with this issue. You have a FileNotFoundError; that issue is just people messing up their Python environments.

swap-10 commented 3 weeks ago

Looks like yapf requires a data file from the yapf_third_party package. Try adding --collect-data yapf_third_party to your PyInstaller command.

Thanks! That did the trick!

Out of curiosity, when I inspect the _internal directory of the dist, it still doesn't have a yapf module directory, although it has others like onnx, torch, etc. Yet the code that imports the yapf module works without issue. How does this happen? Isn't it necessary for modules included in the build to show up as directories in the _internal dir?

This will help me have a better understanding of pyinstaller to diagnose future issues, as I've encountered similar issues before.

Thanks a bunch!

swap-10 commented 3 weeks ago

Note that pyinstaller/pyinstaller#4939 has nothing to do with this issue. You have a FileNotFoundError; that issue is just people messing up their Python environments.

I'm sorry, you're right. When the FileNotFoundError came up, I inspected the _internal dir of the dist and didn't find the yapf module dir there, whereas others like onnx and torch were present, and I assumed this meant that the yapf module wasn't included properly in the build.

I guess my understanding of this is incorrect.

If you have more details about this process, could you please share that?

Thanks!

bwoodsend commented 3 weeks ago

Python files don't appear on the file system. They're byte-compiled and put in a zip-like archive inside the exe file.

swap-10 commented 3 weeks ago

Oh the base-library.zip file? Or inside the exe in the onedir mode too?

And all the module-named directories like onnxruntime, torch that appear in the _internal directory are only there to keep the data files associated with those modules?

rokm commented 3 weeks ago

Oh the base-library.zip file? Or inside the exe in the onedir mode too?

No, base-library.zip contains the stdlib modules that we need to initialize the Python interpreter and for bootstrap. The rest of the pure-Python modules are collected into the PYZ archive, which is embedded in the executable (in both onefile and onedir mode). You can use pyi-archive_viewer <path-to-executable> to inspect the embedded PKG archive, and the PYZ archive that is embedded within it.
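The same thing can be observed from inside the running program. The sketch below uses only the standard library and reports where a module was loaded from. It relies on the sys.frozen and sys._MEIPASS attributes that PyInstaller's bootloader sets at run time; describe_module_origin is a hypothetical helper name, and the loader name printed in a frozen build depends on the PyInstaller version.

```python
import importlib
import sys


def describe_module_origin(module_name):
    """Summarize how and from where a module was imported."""
    module = importlib.import_module(module_name)
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None)
    return {
        # True only when running inside a frozen (bundled) application.
        "frozen": bool(getattr(sys, "frozen", False)),
        # Bundle directory set by the PyInstaller bootloader, else None.
        "bundle_dir": getattr(sys, "_MEIPASS", None),
        # Class name of the import loader that supplied the module.
        "loader": type(loader).__name__ if loader is not None else None,
        # Path or label the import system recorded for the module.
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(describe_module_origin("json"))
```

In a normal interpreter "frozen" is False and "bundle_dir" is None. In a frozen build, a pure-Python module can import fine even though no matching directory exists under _internal, because its bytecode comes from the embedded archive.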

And all the module-named directories like onnxruntime, torch that appear in the _internal directory are only there to keep the data files associated with those modules?

Yes. Data files and binaries - bundled shared libraries and binary extension modules. Although for some packages, like torch, we also need to collect source .py files, so you will see those in the torch directory as well...

swap-10 commented 1 week ago

Sorry for returning late to this.

Very helpful details, thanks a lot! I'll check out the archive viewer as well, sounds quite useful.

Although for some packages, like torch, we also need to collect source .py files, so you will see those in the torch directory as well...

Is this because of aten and how the python API wraps over it, or because some configs are stored in .py files? If my package is having an issue being bundled by pyinstaller, what could be a signal that I need to do something like this (like you do for torch)?

bwoodsend commented 1 week ago

Torch takes plain source code and feeds it to its special JIT compiler. Occasionally, other libraries try to use source code for things they shouldn't. They will almost always be using inspect.getsource(), which raises a fairly self-explanatory OSError: could not get source code error.
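That signal can be reproduced without PyInstaller. In the sketch below, a function is compiled from a string under a placeholder file name, so no source file exists for inspect to read. This roughly mimics a module that was bundled as bytecode only. The names get_source_or_none and make_function_without_source are made up for this illustration.

```python
import inspect


def get_source_or_none(obj):
    """Return the source text of obj, or None when it cannot be retrieved."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        # OSError: the source file is not available (the case discussed here).
        # TypeError: obj is a built-in and never had Python source.
        return None


def make_function_without_source():
    """Build a working function whose source inspect cannot locate."""
    code = compile(
        "def greet():\n    return 'hello'\n",
        "<bundled-module>",  # placeholder file name, no such file on disk
        "exec",
    )
    namespace = {"__name__": "bundled_module_demo"}
    exec(code, namespace)
    return namespace["greet"]


if __name__ == "__main__":
    greet = make_function_without_source()
    print(greet())                     # the function itself still runs
    print(get_source_or_none(greet))   # but its source is unavailable
```

If a library calls inspect.getsource() on its own modules and fails this way in a frozen build, that suggests its .py files need to be collected as well, as is done for torch.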