ltfat / ltfat

Official development repository of the Large Time Frequency Analysis Toolbox
http://ltfat.org
GNU General Public License v3.0

Segfault after calling frsynmatrix(frame('dgt',[...])) #182

Closed rlaboiss closed 1 year ago

rlaboiss commented 1 year ago

I can reproducibly produce a segmentation fault in Octave with an ltfat function:

$ echo "pkg load ltfat; load vars; g = double(g); G=frsynmatrix(frame('dgt',g,a,M),length(g)); (G*G')^(1/2);" | octave-cli
Segmentation fault

The file vars can be obtained by unzipping this file: vars.zip

The segfault happens when evaluating the expression (G*G')^(1/2). It is not caused by the expression itself, however, but by the fact that frsynmatrix was called beforehand. Here is the proof (no segfault):

$ echo "pkg load ltfat; load vars; g = double(g); G=frsynmatrix(frame('dgt',g,a,M),length(g)); save G G" | octave-cli
$ echo "load G ; (G*G')^(1/2);" | octave-cli

I am using Octave 8.3.0 on a Debian amd64 system with the Debian package for ltfat 2.6.0 installed.
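For scripted reproduction (for instance when testing several commits), the crash can be detected from the exit status instead of reading the terminal output. The following is only a sketch: run_and_classify is a hypothetical helper, not part of ltfat or Octave, and it assumes a shell such as bash or dash that reports death by signal as 128 plus the signal number (139 for SIGSEGV).

```shell
#!/bin/sh
# Hypothetical helper (not part of ltfat or Octave): run a command and
# report whether it was killed by SIGSEGV.
# Assumption: the shell encodes death by signal as 128 + signal number,
# as bash and dash do, so SIGSEGV (signal 11) shows up as status 139.
run_and_classify() {
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 139 ]; then
        echo "segfault"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed($status)"
    fi
}

# Possible use with the reproducer from this report (assumes octave-cli is
# on PATH and the vars file is in the current directory):
# run_and_classify octave-cli --eval "pkg load ltfat; load vars; g = double(g); G=frsynmatrix(frame('dgt',g,a,M),length(g)); (G*G')^(1/2);"
```

The Octave call is left commented out so that the helper can be sourced and tried on any command first.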

allthatsounds commented 1 year ago

Thanks, this is an interesting one. I can reproduce the segmentation fault on Debian 12 amd64 using Octave 7.3.0.

I can not reproduce it

Hence, I suspect at the moment that the problem

I am still investigating this and would be grateful for any suggestions you may have.

rlaboiss commented 1 year ago

It segfaults for me when I type the commands at the Octave prompt.

allthatsounds commented 1 year ago

In the meantime, I have also been able to cause segfaults from within Octave, albeit not reliably. Here is what I have so far:

echo "pkg load ltfat; load vars; g = double(g); G=frsynmatrix(frame('dgt',g,a,M),length(g)); sqrtm(G*G');" | octave-cli

works under the above Debian 12 setup, giving the same result as (G*G')^(1/2) to numeric precision.

The following, however, segfaults and uses Octave functions only. It is basically the code in "frsynmatrix(frame('dft',...)...)" made explicit, without the scaling by 1/sqrt(N) for simplicity:

echo "pkg load ltfat;L=160; tmpf=zeros(L,1);tmpf(1)=1;for n=1:L G(:,n)=fft(tmpf,[],1); tmpf=circshift(tmpf,1);end; (G*G')^(1/2)" | octave-cli
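For reference, the matrix built in this loop is the unnormalized DFT matrix, so the value the failing expression should return is known in closed form. A short derivation, assuming Octave's sign convention for fft and 1-based indices k, m, n:

```latex
G_{k,n} = e^{-2\pi i (k-1)(n-1)/L}, \qquad
(G G^{*})_{k,m} = \sum_{n=1}^{L} e^{-2\pi i (k-m)(n-1)/L} = L\,\delta_{k,m},
\qquad\text{hence}\qquad
(G G^{*})^{1/2} = \sqrt{L}\, I .
```

With L=160 the result should therefore be sqrt(160) times the identity matrix, and with the omitted 1/sqrt(N) scaling it would be the identity itself. The computation is numerically benign, which is consistent with the crash not originating in the expression.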

Running valgrind on your original segfaulting code (please see valgrind_debian.txt below) suggests that openblas queries unallocated memory during thread initialization (blas_thread_init()). My knowledge of BLAS/LAPACK is limited, but this does seem to be a difference between Debian's liboctave-dev package and Arch's octave package: the former uses openblas 0.3.21, while the Arch package uses blas 3.11.0-3.

The reports also confirm that there are in all likelihood no memory leaks in LTFAT. Personally, I am inclined to report this as an issue to Octave. What do you think? Can you reproduce my results from this comment? Did I overlook anything?

valgrind_arch.txt (for completeness)
valgrind_debian.txt (around line 5330)

allthatsounds commented 1 year ago

Okay, I did overlook something:

echo "L=160; tmpf=zeros(L,1);tmpf(1)=1;for n=1:L G(:,n)=fft(tmpf,[],1); tmpf=circshift(tmpf,1);end; (G*G')^(1/2)" | octave-cli

works, so the problem lies in "pkg load ltfat;".

I am not yet clear about the exact cause.

allthatsounds commented 1 year ago

Commit df068a2 fixes the issue for me. Could you please confirm?

rlaboiss commented 1 year ago

Commit df068a2 does not seem to contain any code changes, only changes to the documentation.

allthatsounds commented 1 year ago

Apologies, it was a series of commits and I just pointed to the last one. Commit 45c4edeeb, 'removed cd from ltfatstart', is the (relevant) starting point.

rlaboiss commented 1 year ago

Indeed, commit 45c4edeeb861dad7a86e9aab38a7cf55a0fdc60c fixes the reported bug for me. I am still observing other segmentation faults, but feel free to close the present issue.

allthatsounds commented 1 year ago

Thanks! As implied by the commit message, I merely avoided all directory changes during 'pkg load ltfat', because the associated Java calls in and by Octave seemed to fail and subsequently to interfere with the environment setup, which may in turn affect openblas execution and result in the segfault. This is more or less an educated guess on my part, though. As to why this happens on Debian specifically, I have no idea. Further segfaults may be rooted in similar issues.

As for the occurrence of segfaults before that commit:

Although we can work around this on our side, the underlying problem lies in using 'cd path/to/something/else'. I will report this to Octave.

rlaboiss commented 1 year ago

Thanks for the thorough explanation. Indeed, the problem does not come from ltfat. I hope that it will be fixed by the Octave maintainers.

As regards the Debian package octave-ltfat, which I maintain: during the package build, the script test_all_ltfat.m is exercised. It fails due to a segmentation fault when calling test_dgt, even though the source is patched with commit 45c4edeeb861dad7a86e9aab38a7cf55a0fdc60c. This is probably caused by the way the unit tests are exercised during the build of the Debian package. I am currently patching the upstream sources in order to exclude the call to test_dgt.

allthatsounds commented 1 year ago

Let me know if I can be of any help!

Reported upstream: https://savannah.gnu.org/bugs/index.php?64864