spack / spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
https://spack.io

Spack depfile does not check if installed packages have changed leaving stale "markerfiles" #44169

Open robertu94 opened 5 months ago

robertu94 commented 5 months ago

Steps to reproduce

spack env activate .
spack env depfile -o Makefile
make -j $(nproc)
spack config edit # change a dependency
spack env deactivate
spack uninstall --all --dependents $foo # you may need this for the concretizer to resolve, or you did it accidentally
spack env activate .
spack concretize -f
spack env depfile -o Makefile
make -j $(nproc)

Error message

uninstalled dependency for $foo

Because the generated Makefile does not make the packages in the environment depend on spack.yaml, the spack.lock file, or some internal Spack database file that would force a recheck, make does not work as expected when a dependency has been uninstalled: make clean has to be run before make will function again. In contrast, spack install works just fine.
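For illustration only (this rule is hypothetical, not something the current depfile emits), the kind of dependency being asked for could be expressed with make's self-regeneration idiom, where make rebuilds its own makefile and then restarts:

# hypothetical rule: regenerate the Makefile whenever the lockfile changes
Makefile: spack.lock
	spack -e . env depfile -o Makefile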


haampie commented 5 months ago

That's just how it works. Run make clean.

robertu94 commented 5 months ago

OK, I understand if this is closed as won't-fix, but could we at least improve the error message so the user knows they should probably run make clean? spack install runs without errors in the same situation, so this leaves a sharp edge for users.

psakievich commented 3 weeks ago

I'm seeing this as well, but make clean is just inching the graph along. Perhaps there is an issue because I have an upstream? I'm not sure.

spack env depfile -o Makefile
make -j90 # fails
make clean
make -j90 # inches farther but still fails

tgamblin commented 3 weeks ago

I don't think this should've been closed initially. If the build fails and the generated makefile is out of sync with the environment, that's... not great.

I don't think make clean is exactly the solution to this problem, either.

The original issue complains about several things:

  1. depfile does not check if installed packages have changed leaving stale "markerfiles"
  2. the generated make file does not introduce dependencies for all packages in the environment on:
    1. the spack.yaml
    2. the spack.lock file, or
    3. some internal spack database file to force a recheck

I think (1) is meant to be explained by (2.i-iii), but they're not quite the same. If packages change, and you re-concretize your environment, you need to regenerate the whole Makefile, because its hashes describe the DAG from the old env concretization.

(2.i) would require a re-concretization to take effect, so it would also need the Makefile to be regenerated. (2.ii) means you re-concretized, but you still need to regenerate the Makefile because its hashes are stale. How deep do we want spack env depfile to go here?
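In other words, for both (2.i) and (2.ii) the safe sequence after editing spack.yaml is roughly this (a sketch built from commands already shown in this thread):

spack env activate .
spack concretize -f            # pick up the spack.yaml change
spack env depfile -o Makefile  # regenerate; the old Makefile encodes stale hashes
make -j $(nproc)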

I think (2.iii) would actually be easy to solve, as you can (mostly) tell that a spec is installed by looking for its spec.json:

> ls $(spack location -i /g7ddhkk)/.spack/spec.json
/Users/gamblin2/Workspace/src/spack/opt/spack/darwin-sonoma-m1/apple-clang-15.0.0/python-3.11.0-g7ddhkk6ikmwneadfvsp2ck7k3jk3iss/.spack/spec.json

If that disappears, you should re-run the build. So you could make the marker files depend on the spec.json files in the install tree, which would solve the issue that uninstalled specs are not rebuilt. make clean would also solve this, but it would require running a lot more spack installs than are really necessary.
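As a stopgap, here is a rough shell sketch of that check, assuming the default .spack-env/makedeps layout and that spack find exits nonzero when no installed spec matches the hash:

# drop marker files whose spec is no longer installed, so make rebuilds them
for marker in .spack-env/makedeps/install/*; do
  hash=$(basename "$marker")
  spack find "/$hash" > /dev/null 2>&1 || rm -f "$marker"
done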

I think @psakievich is having some other issue, though. In this workflow:

spack env depfile -o Makefile
make -j90 # fails
make clean
make -j90 # inches farther but still fails

Why are the builds failing and why is the second make getting further? Are you changing package files between calls to make?

psakievich commented 3 weeks ago

No changes. I don't understand it at all. The target is saying it can't install because dependencies are missing, but they are actually installed. I'll run a few more cases to confirm. It could be getting further based on an open edge in the graph that I missed.

I've always regenerated the Makefile if I re-concretize, but I haven't habitually run make clean to clear out the artifacts in .spack-env. I was hoping that would resolve it.

psakievich commented 3 weeks ago

This is what I'm hitting when the depfile build fails: https://github.com/spack/spack/blob/2c36a8aac3dab1d86a8cb60630e460d1db1b7f35/lib/spack/spack/installer.py#L1925

psakievich commented 3 weeks ago

I just reconfirmed the behavior:

make clean
rm Makefile
spack concretize -fU
make -j90 # fails, detecting uninstalled dependencies that have actually been installed
make clean
make -j90 # package that failed now builds, but something else fails saying deps weren't installed when they are

psakievich commented 3 weeks ago

This is almost certainly a different issue. I can create one or keep going here. I have no idea how to make a reproducer for this one.

tgamblin commented 3 weeks ago

Can you try it with lower concurrency than 90? I'm curious whether you see the issue with, e.g., make -j2, -j4, -j8, etc.

psakievich commented 2 weeks ago

I'll try lowering -j. I've gone down to 30. Also tried 1 but it was taking a long long time.

I'm wondering if this is a (2.ii) or (2.iii) issue. What is strange to me is that the dependencies are installed in the Spack database, but it is the installer error saying the dependencies are missing. I started trying to understand the installer. How can it miss this? Perhaps it does not check the database but rather the process stack for the dependencies, akin to (2.iii).

It seems like whatever is encoded in the Makefile is the source of truth for the installer.

psakievich commented 2 weeks ago

Also hit it at -j16 and with different versions of make. Trying with -j4 now. One of my colleagues was able to get it to work consistently with -j16 but not -j35. I'm trying to understand how the installer's queue is related to the make artifacts. It is concerning to me that 1) Spack is failing, saying dependencies aren't installed when they are, and 2) spack install works.

If one of those weren't true, I would be inclined to say there could be an issue with the make version, or that our graph might have a deficiency that has somehow hit a corner case not yet found. Either of those could still be true, but I really want to understand the relationship of the Makefile to installer.py first. @tgamblin any thoughts?

psakievich commented 2 weeks ago

Current hypothesis, which I will test later: we are using develop specs to avoid restaging costs. With the monolithic repo we have, perhaps the dependencies are getting marked as needing to be overwritten, which then leads to them being marked as uninstalled? I will check later tonight when I can get back to this. This comment is mainly so my brain doesn't forget.

psakievich commented 2 weeks ago

This looks like it was the issue. So for us it is very specific to using git_sparse_paths without a proper implementation of is_develop_and_has_changed. See #46529

tgamblin commented 2 weeks ago

> I really want to understand the relationship of the makefile to the installer.py first

They're only really related in that they're doing similar things. installer.py implements a priority queue that pulls packages to install off the queue when their dependencies are built. It discovers what's built by looking in the DB and by taking read and write locks out on install directories. Basically, before any package can be built you need read locks on all of its dependencies, and a write lock on the thing to build. There is an algorithm in there that gradually learns what is/isn't installed by trying to take these locks, and multiple Spack instances can coordinate through that (e.g. you can srun -N 4 -n 16 spack install -j4 and have it run 4 spacks on each of 4 different nodes and they'll coordinate through locks).
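So, concretely, the lock-based coordination means something like the following is safe (a sketch; the env path and -j values are illustrative):

# two concurrent spack instances on the same environment coordinate through
# locks on install prefixes: read locks on deps, a write lock on the build
spack -e . install -j4 &
spack -e . install -j4 &
wait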

The Makefile is a bit simpler. It just exposes Spack's dependencies to make as dependencies among a bunch of artificial files named after hashes in your lockfile. To build one of those files, it runs a very specific spack install command and touches each hash file on success. That's pretty much it. This is the part of the makefile you care about:

# The spack install commands are of the form:
# spack -e my_env install --only-concrete --only=package /hash
# This is an involved way of expressing that Spack should only install
# an individual concrete spec from the environment, without deps.
/path/to/env/.spack-env/makedeps/install/%: | /path/to/env/.spack-env/makedeps/dirs
        +$(SPACK) -e '/path/to/env' install $(SPACK_BUILDCACHE_FLAG) $(SPACK_INSTALL_FLAGS) --only-concrete --only=package /$(HASH) # $(SPEC)
        @touch $@

It's just touching files and letting make handle the deps. It can launch many concurrent spack instances (which is fine -- see the bit about locks above), and they will each enter installer.py and try to get read locks on all deps before installing a package. But each one will only try to install a very specific node in the DAG identified by the hash. Any particular spack instance will fail if its deps are not installed, b/c it's run with --only=package, and it won't try to concretize abstract stuff (b/c it's run with --only=concrete). It's assuming that make has handled the deps.
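Put differently, building one marker file amounts to running something like this (hash and paths borrowed from the earlier example, so illustrative only):

spack -e /path/to/env install --only-concrete --only=package /g7ddhkk \
  && touch /path/to/env/.spack-env/makedeps/install/g7ddhkk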

The real advantage is that make is quite flexible about letting builds share jobs from a fixed thread pool, because it leverages GNU make's jobserver.
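That sharing is what the + prefix on the recipe line above is for: it marks the command as a recursive invocation, so make passes its jobserver along to each spack instance and one -j budget covers the whole build. For example:

make -j16   # roughly 16 build jobs total, shared across all concurrent spack installs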

psakievich commented 2 weeks ago

Thanks @tgamblin yes this helps me close the loop. I think I have the full picture of what went wrong and the strange behavior with this explanation + my discovery of the is_develop_and_has_changed shortcoming. I'm going to stress test it today with a strong scaling study of the build 🤞🏻.