ssrothman / PyRoccoR

0 stars 0 forks source link

Python Packaging #1

Open matthewfeickert opened 1 month ago

matthewfeickert commented 1 month ago

@lgray noted that there was interest in making this a Python package. I'd recommend starting with the excellent Scientific Python Library Development Guides specifically the Compiled packaging one if you can make https://gitlab.cern.ch/srothman/roccor a CMake project. If you can't then I'd suggest the Classic packaging guide.

Feel free to leave this Issue open to ask questions. cc @henryiii @jpivarski

matthewfeickert commented 1 month ago

Actually, making https://gitlab.cern.ch/srothman/roccor a CMake project is a good idea all around, and should be easy given how nice and small the project is. I'd recommend An Introduction to Modern CMake if you aren't used to using CMake yet.

jpivarski commented 1 month ago

@henryiii pointed to this new Packaging tutorial: https://intersect-training.org/packaging/

All of the above discuss the issues in detail. If you want a quick-start, I like the scientific-python/cookie recipe:

In a new directory, run

cookiecutter gh:scientific-python/cookie

select the "pybind11" or "scikit-build" backend (because the C++ is connected to Python through pybind11), push the generated directory into a GitHub repo (or a new, blank branch of PyRoccoR), and gradually copy the code into this directory. This directory has all of the standard things, like code formatters, linters, a testing framework, and GitHub CI integration, already built-in.

The "scikit-build" backend requires CMake and "pybind11" doesn't. But the CMakeLists.txt for a small project would not be complicated, and it allows for expansion later.

matthewfeickert commented 1 month ago

Also noting

https://github.com/ssrothman/PyRoccoR/blob/93537dad6043749af691ddc33a765a210ae934c6/setup.py#L7-L25

this is deprecated behavior (c.f. Why you shouldn't invoke setup.py directly)

henryiii commented 1 month ago

I'd recommend scikit-build-core + CMake or meson-python + Meson over the classic guide. Setuptools is fragile and will give you a very poor compiling experience.

henryiii commented 1 month ago

numpy.distutils

This has been removed.

ssrothman commented 1 month ago

Wow, thanks so much for all the help and advice! I will take a look at these tutorials and follow up as needed with questions :)

ssrothman commented 1 month ago

Thanks again @jpivarski, @henryiii , and @matthewfeickert for pointing me in the right direction. I've ported PyRoccoR to the cookiecutter template as you suggested, and more or less have it working in a new branch (here). I can now pip install things seem to be working correctly. However I have a few questions, both about how to set up a nice package, and also about what the most desirable approach should be for the underlying implementation.

First, the (probably very naive) questions about the process of building a package:

Second, a request for advice about the backend implementation:

I see two different possible ways to implement the c++ -> python wrapper.

  1. The first way (which is what I had done originally) is to define some custom numpy ufuncs. This has the advantage that it works out-of-the-box with numpy extensions such as awkward arrays (which I would very much like to support). However, this has the disadvantage that you need to supply the corrections txt file path at compile time (which is terrible for portability) and you need to redefine each method for each data taking period (ie pointing at each unique corrections text file). This is not great.

  2. The alternative way is to use what appears to be the much cleaner interface provided by pybind11. Here I can just wrap the entire class, so the user can supply the text file paths at runtime (much better!). However, the py::vectorize magic doesn't seem to work out-of-the-box for awkward arrays.

For the moment (and also as a learning exercise for myself) I've actually implemented both options, but probably only one should actually be supplied to hypothetical end-users

It seems to me that the best thing would be to have a way for the pybind11 interface to also support awkward arrays. Is there some easy way to do this? I briefly tried looking through the awkward-cpp implementation and quickly got lost trying to understand the code.

If that isn't possible, is there some clean way to mix the approaches by defining the pybind11 class member functions as numpy ufuncs?

Or am I completely barking up the wrong tree and there is some easier/better way to approach the whole problem?

Thanks again for your help, and sorry if I am asking some very stupid questions :)

matthewfeickert commented 1 month ago
  • It's not clear to me from the example and the docs how to actually set up the nice python interface. Presumably I should do something in the src/pyroccor/__init__, but code I write there doesn't seem to make it into the pip install-ed library. I assume there is something obvious that I am missing here?

@ssrothman Sorry, I don't think I understand what you're asking here. Are you asking how to use pybind11 or are you asking how to structure a Python module? It isn't clear. I'm also not sure what you mean that things aren't getting installed. If you make a PR of your packaging branch to main it would be easier to discuss specific examples in the PR itself.

  • I assume that there is a better way to iterate on things than calling pip install . every time I change something in the source. What is the actual right way to go about things? Can I rebuild without reinstalling?

If you're talking about code that is in the compiled C++ extensions, then no, you do need to install into your development virtual environment as editable installs don't work with compiled extensions, by nature of what editable installs are. What is the situation in which you want to rebuild without installing?

  • How does docs generation work? Hopefully there is some nice tutorial about this as well?

You can use whatever docs build system you'd like (probably Sphinx or MyST). c.f. the Scientific Python Writing documentation development guide.

I see two different possible ways to implement the c++ -> python wrapper.

The first method of custom NumPy ufuncs seems like a non-starter as described. I would use pybind11 or nanobind.

It seems to me that the best thing would be to have a way for the pybind11 interface to also support awkward arrays.

As you noted, awkward-cppitself usespybind11`, so there is no problem here. I'll let @jpivarski comment more on how to think about interfacing with Awkward in general, but if you have explicit examples of what you want to do but can't/don't know how that would be useful.

matthewfeickert commented 1 month ago

Aside: I'd strongly suggest not capitalizing your package name as

[project]
name = "PyRoccoR"

That is both not convention in the broader Python world, and also means that people need to remember which letters to capitalize when they're typing, which is annoying and going to lead to typos. So I'd suggest changing it to pyroccor everywhere.