Open matthewfeickert opened 1 month ago
Actually, making https://gitlab.cern.ch/srothman/roccor a CMake project is a good idea all around, and should be easy given how nice and small the project is. I'd recommend An Introduction to Modern CMake if you aren't used to using CMake yet.
@henryiii pointed to this new Packaging tutorial: https://intersect-training.org/packaging/
All of the above discuss the issues in detail. If you want a quick-start, I like the scientific-python/cookie recipe:
In a new directory, run
cookiecutter gh:scientific-python/cookie
select the "pybind11" or "scikit-build" backend (because the C++ is connected to Python through pybind11), push the generated directory into a GitHub repo (or a new, blank branch of PyRoccoR), and gradually copy the code into this directory. This directory has all of the standard things, like code formatters, linters, a testing framework, and GitHub CI integration, already built-in.
The "scikit-build" backend requires CMake and "pybind11" doesn't. But the CMakeLists.txt for a small project would not be complicated, and it allows for expansion later.
Also noting
https://github.com/ssrothman/PyRoccoR/blob/93537dad6043749af691ddc33a765a210ae934c6/setup.py#L7-L25
this is deprecated behavior (c.f. Why you shouldn't invoke setup.py directly)
I'd recommend scikit-build-core + CMake or meson-python + Meson over the classic guide. Setuptools is fragile and will give you a very poor compiling experience.
numpy.distutils
This has been removed.
Wow, thanks so much for all the help and advice! I will take a look at these tutorials and follow up as needed with questions :)
Thanks again @jpivarski, @henryiii , and @matthewfeickert for pointing me in the right direction. I've ported PyRoccoR to the cookiecutter template as you suggested, and more or less have it working in a new branch (here). I can now pip install things seem to be working correctly. However I have a few questions, both about how to set up a nice package, and also about what the most desirable approach should be for the underlying implementation.
First, the (probably very naive) questions about the process of building a package:
src/pyroccor/__init__
, but code I write there doesn't seem to make it into the pip install
-ed library. I assume there is something obvious that I am missing here?
pip install .
every time I change something in the source. What is the actual right way to go about things? Can I rebuild without reinstalling? Second, a request for advice about the backend implementation:
I see two different possible ways to implement the c++ -> python wrapper.
The first way (which is what I had done originally) is to define some custom numpy ufuncs. This has the advantage that it works out-of-the-box with numpy extensions such as awkward arrays (which I would very much like to support). However, this has the disadvantage that you need to supply the corrections txt
file path at compile time (which is terrible for portability) and you need to redefine each method for each data taking period (ie pointing at each unique corrections text file). This is not great.
The alternative way is to use what appears to be the much cleaner interface provided by pybind11. Here I can just wrap the entire class, so the user can supply the text file paths at runtime (much better!). However, the py::vectorize magic doesn't seem to work out-of-the-box for awkward arrays.
For the moment (and also as a learning exercise for myself) I've actually implemented both options, but probably only one should actually be supplied to hypothetical end-users
It seems to me that the best thing would be to have a way for the pybind11 interface to also support awkward arrays. Is there some easy way to do this? I briefly tried looking through the awkward-cpp implementation and quickly got lost trying to understand the code.
If that isn't possible, is there some clean way to mix the approaches by defining the pybind11 class member functions as numpy ufuncs?
Or am I completely barking up the wrong tree and there is some easier/better way to approach the whole problem?
Thanks again for your help, and sorry if I am asking some very stupid questions :)
- It's not clear to me from the example and the docs how to actually set up the nice python interface. Presumably I should do something in the
src/pyroccor/__init__
, but code I write there doesn't seem to make it into thepip install
-ed library. I assume there is something obvious that I am missing here?
@ssrothman Sorry, I don't think I understand what you're asking here. Are you asking how to use pybind11
or are you asking how to structure a Python module? It isn't clear. I'm also not sure what you mean that things aren't getting installed. If you make a PR of your packaging
branch to main
it would be easier to discuss specific examples in the PR itself.
- I assume that there is a better way to iterate on things than calling
pip install .
every time I change something in the source. What is the actual right way to go about things? Can I rebuild without reinstalling?
If you're talking about code that is in the compiled C++ extensions, then no, you do need to install into your development virtual environment as editable installs don't work with compiled extensions, by nature of what editable installs are. What is the situation in which you want to rebuild without installing?
- How does docs generation work? Hopefully there is some nice tutorial about this as well?
You can use whatever docs build system you'd like (probably Sphinx or MyST). c.f. the Scientific Python Writing documentation development guide.
I see two different possible ways to implement the c++ -> python wrapper.
The first method of custom NumPy ufuncs seems like a non-starter as described. I would use pybind11
or nanobind
.
It seems to me that the best thing would be to have a way for the pybind11 interface to also support awkward arrays.
As you noted, awkward-cppitself uses
pybind11`, so there is no problem here. I'll let @jpivarski comment more on how to think about interfacing with Awkward in general, but if you have explicit examples of what you want to do but can't/don't know how that would be useful.
Aside: I'd strongly suggest not capitalizing your package name as
[project]
name = "PyRoccoR"
That is both not convention in the broader Python world, and also means that people need to remember which letters to capitalize when they're typing, which is annoying and going to lead to typos. So I'd suggest changing it to pyroccor
everywhere.
@lgray noted that there was interest in making this a Python package. I'd recommend starting with the excellent Scientific Python Library Development Guides specifically the Compiled packaging one if you can make https://gitlab.cern.ch/srothman/roccor a CMake project. If you can't then I'd suggest the Classic packaging guide.
Feel free to leave this Issue open to ask questions. cc @henryiii @jpivarski