single-cell-data / TileDB-SOMA

Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
https://tiledbsoma.readthedocs.io
MIT License

[python] Verify make-clean interaction with `pip install -e apis/python` #2937

Open johnkerl opened 3 weeks ago

johnkerl commented 3 weeks ago

Follow-up from a Slack conversation with @ivirshup.

Use-case:

We should ideally be doing a clean build.

Note that Makefile rules depend on the timestamps of artifacts (e.g. `.o` files) relative to those of source files (e.g. `.cc`). So this is of particular interest when the user has a build at a newer hash, then checks out a previous hash.
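To make the timestamp concern concrete, here is a minimal sketch of the staleness rule make applies: a target is rebuilt only if it is missing or some prerequisite has a newer modification time. The helper name `is_stale`, the file names, and the timestamps are made up for illustration; nothing here is taken from the TileDB-SOMA build.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def is_stale(target: Path, sources: list[Path]) -> bool:
    """Mimic make's rule: rebuild if the target is missing or any source is newer."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.cc"  # hypothetical source file
        obj = Path(tmp) / "example.o"   # hypothetical build artifact
        src.write_text("// newer version\n")
        obj.write_text("object built from the newer version\n")
        os.utime(obj, (2000, 2000))  # artifact built at t=2000

        # The source content changes, but its timestamp is older than the artifact's.
        src.write_text("// older version\n")
        os.utime(src, (1500, 1500))
        older_source = is_stale(obj, [src])

        # The source is touched after the artifact was built.
        os.utime(src, (3000, 3000))
        newer_source = is_stale(obj, [src])
        return older_source, newer_source


if __name__ == "__main__":
    print(demo())
```

In the first case the rule reports the artifact as up to date even though the source content differs from what was compiled, which is the situation a clean build guards against.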

ryan-williams commented 3 weeks ago

I have been manually running `make clean` in between pip installs like those in your example.

Is there ever a use-case for not cleaning in between? e.g. if changes were being made to setup.py that only affect the Python library, so that reusing an already-built libtiledbsoma would save time? Probably not very common…

johnkerl commented 3 weeks ago

Agreed!

There's a definite use-case for not cleaning in between -- which is a developer doing full-time soma dev work, building several times per day, day after day. For such people, the difference between a build-from-clean and an incremental build can really add up over the course of a workday. (I know because I'm one of those people! :) )

However, this is a very small number of people, and the default behavior should definitely be to do a clean on every pip-install -- our behavior should be "correct by default" for the largest number of people. And people who do soma dev all day (like myself) can make exceptions for themselves. (I'd also note that I already have some sneaky/fiddly/blackbox aliases for incremental builds, to make them as fast as possible -- I already don't pip-install every time I touch a .h or .cc file within libtiledbsoma. And, like you, I'm in the habit of doing a manual clean when I want to.)

ivirshup commented 3 weeks ago

This may be overkill, but scipy has a nice system where, if you install from source with `pip install -e`, you get rebuilds at import time if any of the compiled code changed. They've also got pretty nice partial compilation, which seemed to do a very good job at cache invalidation. E.g. I would basically do a long full compilation once per environment, and then incremental rebuilds were quite fast.
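As a rough illustration of cache invalidation that does not depend on timestamps, the sketch below compares a content hash of the sources against a stamp recorded at the last build. This is not scipy's mechanism and not anything in TileDB-SOMA; the stamp file and the helpers `fingerprint`, `needs_rebuild`, and `record_build` are hypothetical names chosen for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(sources):
    """Hash the names and contents of all sources into one hex digest."""
    h = hashlib.sha256()
    for src in sorted(sources):
        h.update(src.name.encode())
        h.update(b"\0")
        h.update(src.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def needs_rebuild(sources, stamp: Path) -> bool:
    """Rebuild when no stamp exists or the source contents differ from the stamp."""
    previous = stamp.read_text() if stamp.exists() else None
    return fingerprint(sources) != previous


def record_build(sources, stamp: Path) -> None:
    """Remember which source contents the current artifacts were built from."""
    stamp.write_text(fingerprint(sources))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.cc"      # hypothetical source file
        stamp = Path(tmp) / ".build-stamp"  # hypothetical stamp file
        src.write_text("// newer version\n")
        first = needs_rebuild([src], stamp)      # no stamp yet
        record_build([src], stamp)
        unchanged = needs_rebuild([src], stamp)  # same contents as the stamp
        src.write_text("// older version\n")     # e.g. after checking out an older hash
        reverted = needs_rebuild([src], stamp)   # contents differ from the stamp
        return first, unchanged, reverted


if __name__ == "__main__":
    print(demo())
```

A check like this flags a rebuild whenever the contents change, regardless of which version is newer, at the cost of reading every source file on each check.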