kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/
Other
2.46k stars 514 forks source link

Swap --max-order to use an environment variable given PEP 517 and 518 #420

Closed jacobkahn closed 1 year ago

jacobkahn commented 1 year ago

Fixes the max order issue outlined in https://github.com/kpu/kenlm/pull/417 -- PEP 517 performs isolated builds by default when pyproject.toml is present, and isolated builds are the future of Python/pip. --install-option is ignored in isolated builds, and when used, it also prevents binary and wheel caching (see https://github.com/pypa/pip/issues/11451), which dramatically slows down the build.

ZJaume commented 1 year ago

Thank you so much! This method seems to be working. In addition to this, I'm going to suggest keep both methods (cli argument and env variable). As far as I know, it is not possible to specify the environment variable in a requirements file, so every tool that needs this as dependency, would need to be installed like MAX_ORDER=7 pip install mytool. I was investigating if there is some way to specify the max order in the requirements and discovered that --config-settings is aimed to replace --install-options. Then, doing some tests, I found that the previous code to this PR is able to build without triggering the legacy method and have the argument passed with:

pip install --config-settings="--build-option=--max_order=7" .

There's a little downside in this method but I suppose it doesn't matter, since binaries are compiled correctly. When invoking the above pip install command with -v, one of the three times setup.py is called (metadata build), it seems that the parameter is not passed so it shows order 6.

  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  Will build with KenLM max_order set to 7
  running egg_info
  ...
  Getting requirements to build wheel ... done
  Running command Preparing metadata (pyproject.toml)
  Will build with KenLM max_order set to 6
  running dist_info
...
Building wheels for collected packages: kenlm
  Running command Building wheel for kenlm (pyproject.toml)
  Will build with KenLM max_order set to 7
  running bdist_wheel

Given that --config-settings per requirement is not yet merged, I think we should keep the environment variable approach. So my suggestion would be something like this:

max_order = os.getenv("MAX_ORDER", "6")
is_max_order = [s for s in sys.argv if "--max_order" in s]
for element in is_max_order:
    max_order = re.split('[= ]',element)[1]
    sys.argv.remove(element)

print(f"Will build with KenLM max_order set to {max_order}")