Closed jacobkahn closed 1 year ago
@kpu this should drastically reduce the number of "how do I install KenLM, I'm trying to do X" issues given that a lot of those downstream needs are coming from Python packages requiring it. Hope to merge this shortly -- let me know if you have any questions.
@jacobkahn had you tested non-default max order like --install-option="--max_order 7"
in the python installation? Build is now taking terrribly long for me. Also people is complaining about not being compiled correctly, but I'm not even able to check because it takes a lot of time to build.
May I suggest the use of scikit-build?
@ZJaume thanks for letting me know. This change doesn't affect the default Python library build at all, so all of those build options should stay exactly the same... I'll test those again now and see what's going on.
With respect to build speed, this quite literally performs the identical build twice once using the CMake version in order to produce dylib that C++ binaries can consume, so I'm not sure why build speed would suffer so significantly.
I'll spend some time on fixing this right now given those issues; I'll give scikit build a try as well. I'll tag you in the fix.
@ZJaume — after further investigation, the increase in build time is due to the confluence of multiple things:
pyproject.toml
, as stipulated in PEP 517 and PEP 518, means builds occur in an isolated environment. This means that build dependencies need to be installed, which will increase the build time slightly as wheels for CMake and setuptools need to be downloaded. Isolated builds are the new standard in Python/pip, and other approaches to building will be deprecated. See related documentation.argv
/--install-option
for pip builds is behavior that precludes using caching of build dependencies (https://github.com/pypa/pip/issues/11358), and is also not recommended. I was able to reproduce a significant build slowdown when using those options waiting for build dependencies to be installed.In https://github.com/kpu/kenlm/pull/420, I've changed the behavior to read from a MAX_ORDER
environment variable, which is a simpler alternative to --install-option
-based configuration. I've updated the documentation accordingly, which didn't document the --max-order
install option as it was. If this breaks downstream dependencies, they should swap to using a PEP 517-compliant way of building and installing KenLM via pip. Let me know if you have any other problems here if you decide to switch away from the old commit and onto master.
I explored scikit-build briefly (https://github.com/jacobkahn/kenlm/tree/scikit_build_python), but had some problems. We can revisit this down the line.
cc @kpu
Augments the Python build to also ship C++ library builds of KenLM that are dynamically usable by other C++ libs. Specifically,
pip install .
in the repo (orpip install git+https://github.com/kpu/kenlm
, as it'll likely be most often used) now installs:kenlm.cpython-[platform].{so,pyd}
(as before) -- openable by Pythonlibkenlm.{so,dll,lib}
[new] -- not loadable by Python but usable by C++ libraries.In particular, this PR:
KENLM_MAX_ORDER
CMake definition to the beginning of the buildBuildStandalone.cmake
, enabled byBUILD_PYTHON_STANDALONE
, which buildslibkenlm
as a shared lib containing symbols from all project sources-dynamiclib
on macOS and let CMake handle this. MSVC will build apyd
(special DLL) anyways on Windows, so we need a cross-platform way to build shared libs that are dlopen'ed in C++ land by Python libs.setup.py
and run them with the regular cython extension build.setuptools
,wheel
,cmake
) are all available from PyPI and are installed automatically.