Closed rlaboiss closed 1 year ago
Thanks, this is an interesting one. I can reproduce the segmentation fault on Debian 12 amd64 using Octave 7.3.0.
I can not reproduce it
echo "pkg load ltfat; load vars; g = double(g); G=frsynmatrix(frame('dft',M),160); (G*G')^(1/2);" | octave-cli
gives a segmentation fault although there is no LTFAT C-code involved in the processing chain. Here, G is just a rearranged and scaled version of Octave's own fft function output. Derivatives of the above DGT-code produce the expected results, apparently as long as no direct indexing is required. I did not notice any significant differences in the .bashrc of my Debian and Arch. The relevant LTFAT code has not changed since 2018.
Hence, I suspect at the moment that the problem
I am still investigating this and would be grateful for any suggestions you may have.
It segfaults for me when I type the commands at the Octave prompt.
In the meantime, I could also cause segfaults from within Octave, albeit not reliably. Here is what I have so far:
echo "pkg load ltfat; load vars; g = double(g); G=frsynmatrix(frame('dgt',g,a,M),length(g)); sqrtm(G*G');" | octave-cli
works under the above Debian 12 setup, giving the same result as (G*G')^(1/2) to numeric precision.
The following, however, segfaults and uses Octave functions only. It is basically the code behind "frsynmatrix(frame('dft',...)...)" above made explicit, minus the scaling by 1/sqrt(N) for simplicity:
echo "pkg load ltfat;L=160; tmpf=zeros(L,1);tmpf(1)=1;for n=1:L G(:,n)=fft(tmpf,[],1); tmpf=circshift(tmpf,1);end; (G*G')^(1/2)" | octave-cli
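For reference, the matrix built by this loop is the unnormalized DFT matrix, so the value the expression should return can be worked out by hand. The derivation below is a sketch; the 0-based indices k, m and the symbol ω are notation introduced here, not taken from the thread.

```latex
% Column m (0-based) is the FFT of a unit impulse at position m:
G_{k,m} = \sum_{j=0}^{L-1} \delta_{j,m}\, e^{-2\pi i\, jk/L} = \omega^{km},
\qquad \omega = e^{-2\pi i/L}, \quad k, m = 0, \dots, L-1.

% Rows of G are mutually orthogonal with squared norm L:
(G G^{H})_{k,k'} = \sum_{m=0}^{L-1} \omega^{(k-k')m} = L\,\delta_{k,k'}
\quad\Longrightarrow\quad G G^{H} = L\, I.

% Principal square root for L = 160:
(G G^{H})^{1/2} = \sqrt{L}\, I = \sqrt{160}\, I \approx 12.649\, I.
```

Up to rounding error, a run that does not crash should therefore return a diagonal matrix with entries close to 12.649.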
Running valgrind on your original segfault code (please see valgrind_debian.txt below) suggests that openblas queries unallocated memory during thread initialization (blas_thread_init()). My knowledge of BLAS/LAPACK is limited, but it seems that this is indeed a difference between Debian's liboctave-dev package and Arch's octave package. While the former uses openblas 0.3.21, the Arch package uses blas 3.11.0-3.
The reports also confirm that there are in all likelihood no memory leaks in LTFAT. Personally, I am inclined to report this as an issue to Octave. What do you think? Can you reproduce my results from this comment? Did I overlook anything?
valgrind_arch.txt (for completeness) valgrind_debian.txt (around line 5330)
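One way to gather this kind of report, and to probe whether OpenBLAS threading is involved, is sketched below. It is a sketch under assumptions, not the exact procedure used in this thread: it runs the Octave-only loop from this comment rather than the original report's code, the log name valgrind_repro.txt is made up for illustration, and OPENBLAS_NUM_THREADS is OpenBLAS's standard environment variable for limiting its thread count. Whether a single thread avoids the crash was not tested here.

```shell
#!/bin/sh
# Octave-only reproducer from this thread; trailing ';' only suppresses output.
CODE="pkg load ltfat; L=160; tmpf=zeros(L,1); tmpf(1)=1; for n=1:L, G(:,n)=fft(tmpf,[],1); tmpf=circshift(tmpf,1); end; (G*G')^(1/2);"
# Hypothetical log file name, chosen so existing reports are not overwritten.
LOG="valgrind_repro.txt"

if command -v valgrind >/dev/null 2>&1 && command -v octave-cli >/dev/null 2>&1; then
    # Run 1: default threading, valgrind report written to $LOG.
    valgrind --log-file="$LOG" octave-cli --eval "$CODE"
    echo "valgrind run exit status: $?"

    # Run 2: same code with OpenBLAS limited to one thread (diagnostic guess).
    OPENBLAS_NUM_THREADS=1 octave-cli --eval "$CODE"
    echo "single-threaded run exit status: $?"
else
    echo "valgrind or octave-cli not found; nothing was run"
fi
```

If the second run completes while the first one crashes, that would support the suspicion about thread initialization; if both crash, the cause likely lies elsewhere.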
Okay, I did overlook something:
echo "L=160; tmpf=zeros(L,1);tmpf(1)=1;for n=1:L G(:,n)=fft(tmpf,[],1); tmpf=circshift(tmpf,1);end; (G*G')^(1/2)" | octave-cli
works, so the problem lies in "pkg load ltfat;".
I am not yet clear about the exact cause.
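The comparison that isolates 'pkg load ltfat' can be sketched as a small script that runs the same Octave code with and without the package loaded and prints the exit status of each run. The helper name run_case is hypothetical and only serves this illustration.

```shell
#!/bin/sh
# Octave-only test body from this thread; trailing ';' only suppresses output.
BODY="L=160; tmpf=zeros(L,1); tmpf(1)=1; for n=1:L, G(:,n)=fft(tmpf,[],1); tmpf=circshift(tmpf,1); end; (G*G')^(1/2);"

# run_case LABEL CODE: evaluate CODE in octave-cli and report its exit status.
run_case() {
    if command -v octave-cli >/dev/null 2>&1; then
        octave-cli --eval "$2" >/dev/null 2>&1
        echo "$1: exit status $?"
    else
        echo "$1: skipped (octave-cli not found)"
    fi
}

run_case "without pkg load" "$BODY"
run_case "with pkg load ltfat" "pkg load ltfat; $BODY"
```

A process killed by a segmentation fault is typically reported by the shell as exit status 139 (128 + signal 11), so a 0 for the first case and 139 for the second would match the observation above.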
Commit df068a2 fixes the issue for me. Could you please confirm?
Commit df068a2 does not seem to contain any code change, only changes to the documentation.
Apologies, it was a series of commits, I just pointed to the last one. 45c4edeeb, 'removed cd from ltfatstart' is the (relevant) starting point.
Indeed, commit 45c4edeeb861dad7a86e9aab38a7cf55a0fdc60c fixes the reported bug, for me. I am still observing other cases of segmentation fault, but feel free to close the present issue.
Thanks! As implied by the commit message, I merely avoided all directory changes during 'pkg load ltfat', because the associated Java calls in and by Octave seemed to fail (and to subsequently mess with the environment setup, which may affect openblas execution, resulting in the segfault; this is more or less an educated guess on my part). As to why this happens on Debian specifically, I have no idea. Further segfaults may be rooted in similar issues.
As for the occurrence of segfaults before that commit:
Although we can handle this from our side, the problem lies in using 'cd path/to/something/else'. I will report this to Octave.
Thanks for the thorough explanation. Indeed, the problem does not come from ltfat. I hope that it will be fixed by the Octave maintainers.
As regards the Debian package octave-ltfat, which I maintain: during the package build, the script test_all_ltfat.m is exercised. It fails due to a segmentation fault when calling test_dgt, even though the source is patched with commit 45c4edeeb861dad7a86e9aab38a7cf55a0fdc60c. This is probably caused by the way the unit tests are exercised during the build of the Debian package. I am currently patching the upstream sources in order to exclude the call to test_dgt.
Let me know if I can be of any help!
Reported upstream: https://savannah.gnu.org/bugs/index.php?64864
I can reproducibly produce a segmentation fault in Octave with an LTFAT function:
The file vars can be obtained by unzipping this file: vars.zip
The segfault happens when evaluating the expression (G*G')^(1/2), but it is caused not by the expression itself but by the fact that frsynmatrix was called beforehand. Here is the proof (no segfault):
I am using Octave 8.3.0 on a Debian amd64 system with the Debian package for ltfat 2.6.0 installed.