scikit-build / scikit-build-core

A next generation Python CMake adaptor and Python API for plugins
https://scikit-build-core.readthedocs.io
Apache License 2.0
247 stars 52 forks source link

Support semicolon-separated lists in SKBUILD_CMAKE_ARGS #927

Open vyasr opened 1 month ago

vyasr commented 1 month ago

This is essentially the converse of https://github.com/scikit-build/scikit-build-core/pull/727. The logic for splitting up items in SKBUILD_CMAKE_ARGS does not account for the possibility of entries that are themselves semicolon-separated lists. Consider

SKBUILD_CMAKE_ARGS="-DCMAKE_PREFIX_PATH='/path/to/a;/path/to/b';-DCMAKE_MODULE_PATH='/path/to/c;/path/to/d'" python -m pip wheel /path/to/project
LecrisUT commented 1 month ago

This was recently implemented: https://github.com/scikit-build/scikit-build-core/pull/921. How it works for environment variable definition, that I am not so sure though, can you test it against the main branch?

vyasr commented 1 month ago

Sure I can test that, although I wouldn't expect that to work. That is implementing the behavior of CMake defines, not the more general case of arbitrary CMake arguments. That being said, if that works it would address the example I showed above since both CMAKE_PREFIX_PATH and CMAKE_MODULE_PATH could be handled via SKBUILD_CMAKE_DEFINE, which #921 would fix. It wouldn't work if I wanted to pass something like --debug-find as a CMake argument, though.

LecrisUT commented 1 month ago

It wouldn't work if I wanted to pass something like --debug-find as a CMake argument, though.

Why wouldn't it? Like passing multiple args doesn't work or passing a value with semicolon doesn't work? If the former that would seem like a bug, at least through my reading of the documentation, didn't check the implementation. If the latter, we should check what usecases other than SKBUILD_CMAKE_DEFINE it applies to.

Btw CMAKE_PREFIX_PATH and CMAKE_MODULE_PATH could be problematic. I don't think we properly forward those options to append/prepend with the scikit-build-core defined ones, we should discuss how these should be handled, probably in conjunction with #880

vyasr commented 1 month ago

Apologies I thought I included information on this in the original issue. The problem is on this line for parsing out the environment variable. The split on the semicolon does not account for the possibility of the semicolon being inside quotes. For example, if I take the C API example from the getting started page and add

message("CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}")                                                                                                                                     
message("CMAKE_MODULE_PATH=${CMAKE_MODULE_PATH}")     

right after the python_add_library call in CMakeLists.txt, here's what I get (eliding unnecessary details with ...):

(scikit_build_core_dev) vyasr-dt% SKBUILD_CMAKE_ARGS="-DCMAKE_PREFIX_PATH='/path/to/a;/path/to/b';-DCMAKE_MODULE_PATH='/path/to/c;/path/to/d'"  python -m pip wheel --disable-pip-version-check . -v --no-build-isolation
Processing /home/vyasr/local/testing/skbc/skbuild_cmake_args
...
*** Configuring CMake...
  CMake Warning:
    Ignoring extra path from command line:

     "/path/to/d'"
...
  -- Found Python: /home/vyasr/miniconda3/envs/scikit_build_core_dev/bin/python (found version "3.10.13") found components: Interpreter Development.Module
  CMAKE_PREFIX_PATH='/path/to/a
  CMAKE_MODULE_PATH='/path/to/c
  -- Configuring done (0.2s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmp0z7d375p/build
...
Successfully built example
vyasr commented 1 month ago

I think you're probably right that CMAKE_PREFIX_PATH and CMAKE_MODULE_PATH aren't good examples here though because we may not support setting them in this way without conflicting with scikit-build-core's methods of setting them. The more general question is, how do you set any variable via SKBUILD_CMAKE_ARGS that could be a list? It doesn't have to be a built-in CMake variable or property, it could be something that is specific to your build e.g. SKBUILD_CMAKE_ARGS="-DMY_VAR='a;b;c;'".

LecrisUT commented 1 month ago

But for the case of SKBUILD_CMAKE_ARGS we want to encourage the usage of SKBUILD_CMAKE_DEFINE and we have documented that an escape \ is needed ^1 although indeed an example for passing a list and list-of-list is missing. Would be pretty straightforward to expand the documentation there. Defining a specific environment variable to listen to should be supported though:

[tool.scikit-build.cmake.define]
SOME_DEFINE = {env="SOME_DEFINE", default="EMPTY"}

But I think what you really want there is to parse the quotation marks in -D<Var>="<Value>" and pass the contents as-is. Could be quite a complication to do the escaping, wonder if there is some more clever way of doing it. Maybe using argparse and white space as separators instead?

vyasr commented 1 month ago

Before @henryiii updated #727 to use shlex (which was a great solution in that case) I was using a regex to capture this. In this instance maybe that's the only way forward? Regular expressions do seem like the right tool in this case; I am skeptical that any parsing engine we tried to build would be any more foolproof or less complicated.

LecrisUT commented 1 month ago

What about deprecating the ; split syntax? Internally it should work fine if we change the splitting there to also be shlex. Or we could do both, try a regex for the ; split and then do shlex on top.

Do you have a regex101 link of the regex you think of using to test some edge cases?

vyasr commented 1 month ago

How about something like this?

In [25]: import re
    ...: cmake_args = "-DCMAKE_PREFIX_PATH='/path/to/a;/path/to/b';-DCMAKE_MODULE_PATH='/path/to/c;/path/to/d'"
    ...: [x for x in re.split('''((?:[^;"']|"[^"]*"|'[^']*')+)''', cmake_args) if (stripped := x.strip()) and not re.match(";+", stripped)]
Out[25]: 
["-DCMAKE_PREFIX_PATH='/path/to/a;/path/to/b'",
 "-DCMAKE_MODULE_PATH='/path/to/c;/path/to/d'"]

https://regex101.com/r/auHBbH/1

vyasr commented 1 month ago

I'm not sure what syntax you're suggesting to deprecate since the core syntax is CMake's and not SKBC's.

LecrisUT commented 1 month ago

The deprecation would be in your example the ; between -DCMAKE_PREFIX_PATH and -DCMAKE_MODULE_PATH

haampie commented 2 weeks ago

Since I just opened and closed #944, let me add to this:

  1. Use of CMAKE_PREFIX_PATH is required in certain package managers like Nix, Guix, Spack where every package (python or not) lives in its own prefix, and pip is not used to setup the build dependencies, but Nix/Guix/Spack is.
  2. The good old way of shlex.split(os.environ.get("SKBUILD_CMAKE_DEFINE")) is both simpler and more robust, this is what scikit-build did.

I guess a better solution would be in the style of #944 where parsing is avoided by passing -Ccmake.define.FOO=BAR. For --debug-find (which I would really like to be able to use...) that does not suffice.

Can I instead have -Ccmake.arg=--debug-find and -Ccmake.arg=-DFOO:BOOL=BAR etc?