Editable install: do not show any output in verbose mode when there is no work to do

lesteve commented 4 months ago

Why the current behaviour is problematic

The "ninja: no work to do" output can be confusing as noted for example by @ogrisel in https://github.com/scikit-learn/scikit-learn/pull/28040#issuecomment-1886676007. @eli-schwartz mentioned that it could potentially be changed in meson-python https://github.com/scikit-learn/scikit-learn/pull/28040#issuecomment-1888470251.

Personally, when working on scikit-learn code, I find it reassuring to see some output if the sklearn import takes a while and get the feed-back that some compilation is happening. In the case when there is nothing to do I would rather have nothing printed. One thing I realised recently is that the "no work to do" output can be printed multiple times if you use subprocesses (makes complete sense since sklearn is imported in each subprocess). This can happen easily in scikit-learn for example with this snippet.

# /tmp/test.py
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

model = LogisticRegression()
X, y = make_classification()

print(cross_val_score(model, X, y, n_jobs=4))

Output is:

❯ python /tmp/test.py
+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
ninja: no work to do.
+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
ninja: no work to do.
ninja: no work to do.
ninja: no work to do.
ninja: no work to do.
[1.   0.85 0.9  0.95 0.95]

Current implementation

The simplest thing I could find was to first execute the ninja command in dry-run mode and to check if "no work to do" is in the captured stdout. This may be a bit brittle in case the ninja output changes. Maybe there is a better way to check with ninja that there is no work to do?

There does not seem to be any test right now for the editable verbose behavior, but I can try to add some if you think this is worth it.

Alternative implementation

If you think the overhead of having a ninja dry-run command is too much overhead, one possible alternative implementation would be to capture stdout with p = subprocess.Popen(..., stdout=subprocess.PIPE) and look at the first two lines of output. In my experience, this kind of manipulation are always more complicated than you would hope, but maybe I have been doing it wrong all this time ...

The kind of complications I have in mind:

p.stdout.read is blocking so in the case that there is no output for some reason, it will wait for ever. Maybe we know there will always be some output with the ninja command. You can also put you can use fcntl to put stdout in non-blocking mode.
displaying the captured stdout live (i.e. without waiting for the command to finish) is also always more complicated than you would hope:
- decide whether to read line by line, a few lines, a number of bytes
- decide whether to sleep between two reads (I have seen some cases where not sleeping had a high CPU usage)
- you also need to remember to redisplay all the remaining stdout once the command finishes otherwise you miss some stdout.

ogrisel commented 4 months ago

I think the lines:

+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
ninja: no work to do.

would be ok, if they were prefixed by some user friendlier message:

meson-python: importing sklearn installed in editable mode
+ /home/lesteve/micromamba/envs/scikit-learn-dev/bin/ninja
ninja: no work to do.

But I agree that the repeated message is a bit annoying when using parallelism that spawn Python workers that import sklearn concurrently.

ogrisel commented 4 months ago

But I am also fine with what @lesteve suggests in this PR because it fixes both problems at once.

rgommers commented 4 months ago

Thanks! I agree that this message can be confusing. I hadn't seen it printed multiple times yet, but that makes it worse. So looks like a good idea to suppress it.

lesteve commented 4 months ago

It would be great if someone with the necessary rights could enable the Github actions workflow :pray:!

I can reproduce the Alpine failure in a Docker container and the bad news is that ninja -n seems to segfault (ninja alone works fine). The ninja version is 1.9 (released January 2019) and I have seen at least one test in meson-python that is skipped if the ninja version < 1.10, so maybe this could be an option?

My basic gdb skills only allowed me to get this information:

(gdb) r
Starting program: /usr/bin/ninja -C build/cp312 -n
warning: Error disabling address space randomization: Operation not permitted
ninja: entering directory 'build/cp312'

Program received signal SIGSEGV, Segmentation fault.
0x00007b65464a9458 in strlen (s=<optimized out>, 
    s@entry=0x8 <error: Cannot access memory at address 0x8>) at src/string/strlen.c:17
warning: 17 src/string/strlen.c: No such file or directory

Suggestions how to get more info more than welcome ...

eli-schwartz commented 4 months ago

I can reproduce the Alpine failure in a Docker container and the bad news is that ninja -n seems to segfault (ninja alone works fine). The ninja version is 1.9 (released January 2019) and I have seen at least one test in meson-python that is skipped if the ninja version < 1.10, so maybe this could be an option?

This is:

https://github.com/michaelforney/samurai/issues/66 https://github.com/michaelforney/samurai/issues/81 https://github.com/michaelforney/samurai/commit/fb61f22c7e690715d309c41812412c4f432ef53a

Possibly something Alpine should backport...

lesteve commented 4 months ago

I have skipped the test on Alpine for now based on detecting musllinux.

To know whether the test pass on Windows it would be nice that someone enable Github workflows. Full disclosure: locally the Windows tests don't pass and I don't really know why yet.

rgommers commented 4 months ago

To know whether the test pass on Windows it would be nice that someone enable Github workflows.

Done. this can be a pain because the approve button has to be pressed after every push. If you want to submit a trivial PR to side-step this problem, I'm happy to merge that straight away (e.g, in the readme, change to use Meson to using Meson).

lesteve commented 4 months ago

Thanks, the Windows tests fail in a similar way locally, the dry-run command fails with a exit code 1 and I can't find a reason in the captured stderr as to why, but maybe I am missing something ...

I'll be offline next week but I can try to pick this up when I come back. Any meson-python maintainer should not hesitate to push into my branch if that makes things easier.

lesteve commented 4 months ago

If you want to submit a trivial PR to side-step this problem, I'm happy to merge that straight away (e.g, in the readme, change to use Meson to using Meson).

@rgommers I have opened https://github.com/mesonbuild/meson-python/pull/585 with this change.

rgommers commented 4 months ago

@rgommers I have opened #585 with this change.

Sounds good - merged now. Should fix the issue with CI running here.

dnicolodi commented 4 months ago

This is:

michaelforney/samurai#66 michaelforney/samurai#81 michaelforney/samurai@fb61f22

Possibly something Alpine should backport...

@eli-schwartz Are you implying that Alpine uses samurai but calls it ninja? I wonder if we can pull in a fixed package in our CI image...

dnicolodi commented 4 months ago

Yes, it seems that installing ninja on Alpine Linux puts the executable in a location outside $PATH and that the samuari package install a symlink that makes ninja actually be samurai. samurai seems to be installed with the build-base package, otherwise I don't understand why we get it in our CI images. Anyhow, installing ninja-is-really-ninja should make the ninja command actually be what is expected.

The bug @eli-schwartz mentions was fixed in samurai back in 2021, but samurai hasn't seen a release since 2020.

lesteve commented 3 months ago

The CI failures for the development meson version are also main and do not seem to be related to this PR.

Edit: the development meson builds seem to have been fixed. I don't really know what the freebsd issue is about but it does not seem related to this PR either, see build log

Failed to start an instance: INVALID_ARGUMENT: Not Found 404 Not Found POST https://compute.googleapis.com:443/compute/v1/projects/cirrus-ci-community/zones/us-central1-c/instances
{ "error": { "code": 404, "message": "The resource 'projects/freebsd-org-cloud-dev/global/images/family/freebsd-13-2' was not found", "errors": [ { "message": "The resource 'projects/freebsd-org-cloud-dev/global/images/family/freebsd-13-2' was not found", "domain": "global", "reason": "notFound" } ] } }

dnicolodi commented 3 months ago

Please don't merge the main branch into your branch the next time: it makes rebasing the commits harder without reason.

dnicolodi commented 3 months ago

I added the missing tests and fixed up the last issue and submitted as #594

mesonbuild / meson-python