xsdk-project / xsdk-issues

A repository under which GitHub issues not related to a specific xSDK repo can be filed.
7 stars 0 forks source link

ExaGO build failures in xSDK CI #224

Closed v-dobrev closed 9 months ago

v-dobrev commented 1 year ago

See https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5197024340

ping: @cameronrutherford

cameronrutherford commented 1 year ago

These failures are unique to gcc 11.3+ for some reason.

Simple workaround is to have exago build ~python. A relevant ExaGO issue has been created here to track future progress.

See https://github.com/pnnl/ExaGO/issues/23

v-dobrev commented 1 year ago

Thanks for the response @cameronrutherford!

Do you want to propose an MR to update the CI tests with the proposed workaround? The place to update these is here: https://gitlab.com/xsdk-project/spack-xsdk-ci in the file gitlab-xsdk-ci.yml. If you prefer, I can create an MR too.

cameronrutherford commented 1 year ago

Thanks for the response @cameronrutherford!

Do you want to propose an MR to update the CI tests with the proposed workaround? The place to update these is here: https://gitlab.com/xsdk-project/spack-xsdk-ci in the file gitlab-xsdk-ci.yml. If you prefer, I can create an MR too.

I can make the PR since it would be good for me to be a little more of a regular contributor.

Since this is an error for gcc@11.3 and intel compilers https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5197024342, I was wondering if this should be something that we modify in the xSDK package spec, or instead just within the CI configurations.

I also noticed quite a few of the CI pipelines were failing due to spack being unable to concretize, so I was hesitant to break that even further.

v-dobrev commented 1 year ago

The failures with Intel compilers are not due to ExaGO, I think -- just the ones that use gcc v11.3.1. Specifically, I was looking at the jobs called xd_l_g113-xsdk-080-gcc1131 and xd_l_g113-xsdk-080-linux-gcc1131-cuda.

As you said, the fix can probably be added in the xSDK spack package -- that may be a good solution too. @balay, what do you think?

balay commented 12 months ago

ok pushed exago~python to xsdk spec..

cameronrutherford commented 12 months ago

I guess my comment was lost to the void... Just merged a fix for this build error to exago@develop, so exago+python should be fine.

See https://github.com/pnnl/ExaGO/issues/23 which is now closed. We have preventative checks in place now which should catch issues, and this should be easier to fix after pybind11 is a dep in spack. I assume this happened due to some compiler flags changing in Debug builds.

balay commented 12 months ago

Sorry - misunderstood. I reverted this change.

Will start a pipeline to gather current failures..

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1030800463

v-dobrev commented 12 months ago

My original post was about the failures in the xSDK 0.8.0 pipelines. We still need to do something about those -- either add a fix the xSDK spack package or in the xSDK CI spec.

balay commented 11 months ago

Tried ^exago~python with xsdk@0.8.0 - still many failures.. [perhaps some are unrelated]

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1033739756

cameronrutherford commented 11 months ago

Okay so original issue was resolve with +python build issues, but ~python ones are still valid, and captured in:

All other pipeline failures are seemingly not exago related on the surface, and I appreciate you running this build.

Note that we have EXAGO_RUN_TESTS depend on if spack install is enabling testing, so we might need to change that if it affects xSDK testing

cameronrutherford commented 11 months ago

Okay I tried to track everything down. Take away is that https://github.com/pnnl/ExaGO/issues/45 exago+python~mpi and exago~mpi might have issues at the moment, and will be in future spack package. exago~python+mpi bug was really useful to catch, although for some reason we could not reproduce, so thank you for documenting!!

cameronrutherford commented 11 months ago

Closing unless someone has a build failure to follow up on

v-dobrev commented 11 months ago

I'm not sure the issue with xsdk@0.8.0 is resolved since it depends on exago@1.5.0 -- the latest pipeline for xsdk@0.8.0 still has ExaGo failures, e.g. in this job: https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5368514783.

Probably, it is possible to create a patch in Spack for exago@1.5.0 that backport the fixes in the main ExaGo repo?

cameronrutherford commented 11 months ago

I'm not sure the issue with xsdk@0.8.0 is resolved since it depends on exago@1.5.0 -- the latest pipeline for xsdk@0.8.0 still has ExaGo failures, e.g. in this job: https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5368514783.

Probably, it is possible to create a patch in Spack for exago@1.5.0 that backport the fixes in the main ExaGo repo?

I have not made a spack patch before, but yes in principle it would be very easy in this case. Diff is shown in this commit https://github.com/pnnl/ExaGO/commit/4527f0c0e8a99c28387ffa95f06982250cb61b22

It's literally just a one line change, but this patch should also be back-ported through version 1.4.0.

Can you point me to an example to work from? Happy to do this as it aligns with sustainment goals, and is a simple patch.

v-dobrev commented 11 months ago

Here is an example Spack PR that adds a patch for MFEM v4.6: https://github.com/spack/spack/pull/40495.

I think there's a way to use a commit from any repository as the patch, however, I have not used that before. Here are some examples I found:

balay commented 10 months ago

reopening this issue as xsdk@0.8.0 CI is still broken - and its good to get that working.

https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5538901463

>> 130    CMake Error at tests/unit/opflow/objective/CMakeLists.txt:42 (exago
            _add_test):
     131      Unknown CMake command "exago_add_test".
     132    
     133    
     134    -- Configuring incomplete, errors occurred!

A new issue?

https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5538901445

>> 127    CMake Error at buildsystem/cmake/ExaGOCheckPython.cmake:5 (if):
     128      if given arguments:
     129    
     130        "VERSION_LESS" "3.6"
     131    
     132      Unknown arguments specified
     133    Call Stack (most recent call first):

Is is probably from the prior suggestion exago~python so that doesn't really work? So will have to recheck the original issue and fix it?

balay commented 10 months ago
 131      Unknown CMake command "exago_add_test".

Looks like this is triggered by https://github.com/spack/spack/commit/9f0e3c0fed23e4b4192cb769c8d96d2248721fc9

That change I guess works with 1.5.1 but not 1.5.0.

@cameronrutherford How do we fix this?

cameronrutherford commented 10 months ago

Here is an example Spack PR that adds a patch for MFEM v4.6: spack/spack#40495.

I think there's a way to use a commit from any repository as the patch, however, I have not used that before. Here are some examples I found:

Okay I now understand that there are multiple ways of doing this, and interestingly enough there seems to be bugs in certain configurations? I resolved https://github.com/pnnl/ExaGO/issues/71 for @pelesh by pinning libffi version here, and so if we connect the error:

==> Error: Couldn't find patch for package builtin.libffi with sha256: 070b1f3aa87f2b56f83aff38afc42157e1692bfaa580276ecdbad2048b818ed7. This usually means the patch was modified or removed. To fix this, either reconcretize or use the original package repository

With the type of patch libffi is trying to provide here:

patch(
        "https://github.com/libffi/libffi/commit/ce077e5565366171aa1b4438749b0922fce887a4.patch?full_index=1",
        sha256="070b1f3aa87f2b56f83aff38afc42157e1692bfaa580276ecdbad2048b818ed7",
        when="@3.4.3:3.4.4",
    )

It would suggest that these plaintext patch files are the most sustainable way of doing things? In this case I guess the fix is to trend towards those types of patches, but not sure what's breaking down in the issue I linked...

cameronrutherford commented 10 months ago

Hopefully https://github.com/spack/spack/pull/41350 fixes things for all builds, even xSDK specific versions

balay commented 10 months ago

started a pipeline with https://github.com/spack/spack/pull/41350 changes

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1091197340

balay commented 10 months ago

It would suggest that these plaintext patch files are the most sustainable way of doing things?

I think a URL to a repo (project) commit is preferred over adding the patchfile to spack (if possible)

cameronrutherford commented 10 months ago

It would suggest that these plaintext patch files are the most sustainable way of doing things?

I think a URL to a repo (project) commit is preferred over adding the patchfile to spack (if possible)

So I would be making branches that contain new commits based on each tag that apply the same patch? If this is a patch that I am happy applying to all versions regardless, should I instead maybe just apply the commit here to all the branches and create new tags and commit shas?

balay commented 10 months ago

Well - if you are supporting all these versions - you might have "release" branches [for bug fixes] - for each of these versions. And the corresponding patches might be there.

But yeah - creating these commits/branches just for spack support might not be worth it.

It would be nice if a single patch file can apply cleanly to multiple pkg versions [but yeah - this doesn't always pan out]

So can wait and see if any reviewer objects to the current (patches in spack) approach.

balay commented 10 months ago

BTW: I see:

balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.0.patch var/spack/repos/builtin/packages/exago/exago-1.1.1.patch
2c2
< index d0e8ed18..3225509c 100644
---
> index 9d3d871b..f682e2a5 100644
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.1.patch var/spack/repos/builtin/packages/exago/exago-1.1.2.patch
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ 

So you can use a single patch file for these 3 versions.

2c2
< index ba8e145f..89dd3327 100644
---
> index 1fcfbc97..0d055bee 100644
19c19
< @@ -392,15 +394,17 @@ else()
---
> @@ -405,15 +407,17 @@ else()

So these could be collapsed if spack allowed patches that apply with "fuzz" [I'm not sure if this can be enabled - if so the duplicates can be removed]

If this were feasible - it would reduce the patches to only 3 files [or 3 commits in repo branch].

cameronrutherford commented 10 months ago

BTW: I see:

balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.0.patch var/spack/repos/builtin/packages/exago/exago-1.1.1.patch
2c2
< index d0e8ed18..3225509c 100644
---
> index 9d3d871b..f682e2a5 100644
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.1.patch var/spack/repos/builtin/packages/exago/exago-1.1.2.patch
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ 

So you can use a single patch file for these 3 versions.

2c2
< index ba8e145f..89dd3327 100644
---
> index 1fcfbc97..0d055bee 100644
19c19
< @@ -392,15 +394,17 @@ else()
---
> @@ -405,15 +407,17 @@ else()

So these could be collapsed if spack allowed patches that apply with "fuzz" [I'm not sure if this can be enabled - if so the duplicates can be removed]

If this were feasible - it would reduce the patches to only 3 files [or 3 commits in repo branch].

I also had the same question about duplicates and whether they can be removed. I decided on separate patches in this case to ensure functionality, but each patch is basically identical in quite a few ways...