Closed gruenich closed 11 months ago
This branch would add two Windows runners, one with WinGetOpt and one without. The tests are executed for the former. For d_tests are segfaulting.
As pointed out in #108, reverting cf93 fixes the issue.
@xiaoyeli I suggest to revert all changed made as part of #117 as part of this pull request. We have to reach out to @evan-dsa to re-create his changes in a way that they a) do not segfault SuperLU's test suite and b) cover not only double but also the other three types s, c, and z. Do you agree? Should I update this pull request to revert #117 completely?
All 4 versions should be correct now. Do you still see segfaults?
Thank you for the promt reply. I rebased this branch to include your changes and dropped the reversal of the problematic patch from pull request 117. Unfortunately, we get even more segfaults, not only for double but also for the other types.
The tests are fine with Linux and MacOS. I have no experience with Windows, so I don't know what goes wrong with Windows.
The tests are fine with Linux and MacOS. I have no experience with Windows, so I don't know what goes wrong with Windows.
Well I hope you are not suggesting to ignore the error.
To tackle the problem, I'd suggest to focus on the test code first, then maybe the SuperLU code.
dtest crashes for imat=7
in dgst01
line 101:
For imat=7
the 9x9 test matrix has all columns > 5 set to zero. At loop k=5
in dgst01
the value for urow
gets -1 at some point, which produces a garbage offset for superno
. I also checked this on Debian/GCC and it's the same.
One might be lucky and the memory at superno = Lstore->col_to_sup[urow]
for urow=-1
is some reasonable number, but I have to express my full support for MSVC producing an access violation exception here!
Valgrind also catches this correctly:
Invalid read of size 4
at 0x10C19D: dgst01
by 0x10B4CA: main
Address 0x51005dc is 4 bytes before a block of size 40 alloc'd
at 0x48407B4: malloc (vg_replace_malloc.c:381)
by 0x117D9F: superlu_malloc
by 0x117FA3: int32Malloc
by 0x12588F: dLUMemInit
by 0x11D830: dgstrf
by 0x118D35: dgssv
by 0x10B422: main
...
For lists of detected and suppressed errors, rerun with: -s
ERROR SUMMARY: 40 errors from 12 contexts (suppressed: 0 from 0)
That 4 bytes before a block ... sounds a lot like accessing an integer array at index -1.
@gruenich Maybe add valgrind to the Ubuntu workflow?
BTW, the test command line was ./d_test -t "LA" -n 9 -s 2 -l 10000000
, both for valgrind and the debugger in the previous post.
Moving the segfault issue out of the merge request. I opened #134 for this. I hoped we can fix this here, but I don't want to block other improvements in this merge requests.
I propose to merge this.
Open issues:
Until #131 is merged, this based on top of this branch.