zhaokg / Rbeast

Bayesian Change-Point Detection and Time Series Decomposition

beast123 + MATLAB crashing ... Linux #13

Closed qlahcim closed 11 months ago

qlahcim commented 1 year ago

Ubuntu Linux 22.04 LTS + MATLAB R2023a upd3

>> rbeast_version

rbeastGitHubVersion =

    0.9461

Example:

      load('imageStack.mat')
      % A toy example of stacked time series images: unevenly spaced in time
      NDVI3D = imageStack.ndvi      % a 12x9x1066 3D cube
      TIME   = imageStack.datestr   % 1066 is the time series length
      metadata = [];
      metadata.time = [];
      metadata.time.dateStr = TIME
      metadata.time.strFmt  = 'LT05_018032_20110726.yyyy-mm-dd';
      metadata.deltaTime    = 1/12;  % aggregated at a monthly interval
      metadata.period       = 1.0;   % the period is 1.0 (year)
      extra = [];
      extra.dumpInputData    = true  % get a copy of the aggregated input
      extra.numThreadsPerCPU = 2;    % 2 threads per CPU core
      o = beast123(NDVI3D, metadata, [], [], extra)
      imagesc(o.sig2)
      imagesc(o.trend.ncpPr(:,:,1:3))
      printbeast(o, [2,4])           % print the result at row 2 and col 4
      plotbeast(o, 'index', [2,4])   % plot the result at row 2 and col 4

This always produces a MATLAB crash:

MATLAB Log File: /home/kva/matlab_crash_dump.1020259-1





      Segmentation violation detected at 2023-07-10 15:34:47 +0200

Configuration:
  Crash Decoding           : Disabled - No sandbox or build area path
  Crash Mode               : continue (default)
  Default Encoding         : UTF-8
  Deployed                 : false
  Desktop Environment      : X-Cinnamon
  GNU C Library            : 2.35 stable
  Graphics Driver          : Uninitialized hardware
  Graphics card 1          : 0x10de ( 0x10de ) 0x1cb1 Version 535.54.3.0 (0-0-0)
  Graphics card 2          : Not Started 0x8086 ( 0x8086 ) 0x3e98 Version 0.0.0.0 (0-0-0)
  Java Version             : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
  MATLAB Architecture      : glnxa64
  MATLAB Entitlement ID    : 6257193
  MATLAB Root              : /opt/MATLAB/R2023a
  MATLAB Version           : 9.14.0.2286388 (R2023a) Update 3
  OpenGL                   : hardware
  Operating System         : Linux Mint 21.1
  Process ID               : 1020259
  Processor ID             : x86 Family 6 Model 158 Stepping 13, GenuineIntel
  Session Key              : d5cc3121-ac3a-4dd2-bc38-dce0840e29fb
  Window System            : The X.Org Foundation (12101004), display :0

Fault Count: 1

Abnormal termination: Segmentation violation

Current Thread: 'MCR 0 interpret' id 140544990295616

Register State (from fault):
  RAX = 0000000000000000  RBX = 00007fd363c62c90
  RCX = 0000000000000000  RDX = 0000000000000000
  RSP = 00007fd32e3870a0  RBP = 00007fd32e387680
  RSI = 00007fd32e3871e0  RDI = 00007fd0be3f4640

   R8 = 0000000000000001   R9 = 00007fd1e0823000
  R10 = 00007fd1f8c6ca00  R11 = 4272b9ef08d57889
  R12 = 0000000000000010  R13 = 00007fd363e6cc80
  R14 = 000000000000001f  R15 = 00007fd32e3871e0

  RIP = 00007fd36c89659b  EFL = 0000000000010202

   CS = 0033   FS = 0000   GS = 0000

Stack Trace (from fault):
[ 0] 0x00007fd36c89659b /lib/x86_64-linux-gnu/libc.so.6+00615835
[ 1] 0x00007fd363c5a8be /home/kva/tmp/BEAST/work/Rbeast.mexa64+00370878
[ 2] 0x00007fd363c5ae2e /home/kva/tmp/BEAST/work/Rbeast.mexa64+00372270 mexFunction+00000014
[ 3] 0x00007fd3623d8f5f /opt/MATLAB/R2023a/bin/glnxa64/libmex.so+00954207
[ 4] 0x00007fd3623d8fd7 /opt/MATLAB/R2023a/bin/glnxa64/libmex.so+00954327
[ 5] 0x00007fd3623d9047 /opt/MATLAB/R2023a/bin/glnxa64/libmex.so+00954439
[ 6] 0x00007fd3623da5fa /opt/MATLAB/R2023a/bin/glnxa64/libmex.so+00959994
[ 7] 0x00007fd3623c51b0 /opt/MATLAB/R2023a/bin/glnxa64/libmex.so+00872880
[ 8] 0x00007fd362bf13dd /opt/MATLAB/R2023a/bin/glnxa64/libmwm_dispatcher.so+01528797 _ZN8Mfh_file20dispatch_file_commonEMS_FviPP11mxArray_tagiS2_EiS2iS2+00000173
[ 9] 0x00007fd362bf24cc /opt/MATLAB/R2023a/bin/glnxa64/libmwm_dispatcher.so+01533132
[ 10] 0x00007fd362bf28a1 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_dispatcher.so+01534113 _ZN8Mfh_file8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6detail17mxDestroydeleterEEiPPS1+00000033
[ 11] 0x00007fd361e5d95e /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+02480478
[ 12] 0x00007fd361e5dba6 /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+02481062
[ 13] 0x00007fd35100ed04 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+10546436
[ 14] 0x00007fd3510018e0 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+10492128
[ 15] 0x00007fd350f9d692 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+10081938
[ 16] 0x00007fd350ced250 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07262800
[ 17] 0x00007fd350cef544 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07271748
[ 18] 0x00007fd350ceca89 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07260809
[ 19] 0x00007fd350cfe3ff /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07332863
[ 20] 0x00007fd350cfee79 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07335545
[ 21] 0x00007fd350cec8a4 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07260324
[ 22] 0x00007fd350cec996 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+07260566
[ 23] 0x00007fd350e2961c /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+08558108
[ 24] 0x00007fd350e2d6b1 /opt/MATLAB/R2023a/bin/glnxa64/libmwm_lxe.so+08574641
[ 25] 0x00007fd361fdb1c8 /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+04043208
[ 26] 0x00007fd361ebd4ef /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+02872559
[ 27] 0x00007fd361ec4567 /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+02901351
[ 28] 0x00007fd361f826d5 /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+03679957
[ 29] 0x00007fd361f82b3e /opt/MATLAB/R2023a/bin/glnxa64/libmwlxemainservices.so+03681086
[ 30] 0x00007fd362ed7d12 /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+01019154 _ZN3iqm14UserEvalPlugin7executeEP15inWorkSpace_tag+00000754
[ 31] 0x00007fd26a474f4c /opt/MATLAB/R2023a/bin/glnxa64/libnativejmi.so+01011532 _ZN9nativejmi17JmiUserEvalPlugin7executeEP15inWorkSpace_tag+00000028
[ 32] 0x00007fd362eb1bd0 /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+00863184
[ 33] 0x00007fd362e7d64b /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+00648779
[ 34] 0x00007fd3624ad831 /opt/MATLAB/R2023a/bin/glnxa64/libmwbridge.so+00497713
[ 35] 0x00007fd3624adc43 /opt/MATLAB/R2023a/bin/glnxa64/libmwbridge.so+00498755
[ 36] 0x00007fd3624c90c2 /opt/MATLAB/R2023a/bin/glnxa64/libmwbridge.so+00610498 _Z22mnGetCommandLineBufferbRbN7mwboost8optionalIKP15inWorkSpace_tagEEbRKNS0_9function2IN6mlutil14cmddistributor17inExecutionStatusERKNSt7__cxx1112basic_stringIDsSt11char_traitsIDsESaIDsEEES4_EE+00000210
[ 37] 0x00007fd3624c9303 /opt/MATLAB/R2023a/bin/glnxa64/libmwbridge.so+00611075 _Z8mnParserv+00000435
[ 38] 0x00007fd362d4d1bf /opt/MATLAB/R2023a/bin/glnxa64/libmwmcr.so+00889279
[ 39] 0x00007fd36d935234 /opt/MATLAB/R2023a/bin/glnxa64/libmwmvm.so+03363380 _ZN14cmddistributor15PackagedTaskIIP10invokeFuncIN7mwboost8functionIFvvEEEEENS2_10shared_ptrINS2_6futureIDTclfpEEEEEERKT+00000068
[ 40] 0x00007fd36d9354e9 /opt/MATLAB/R2023a/bin/glnxa64/libmwmvm.so+03364073 _ZNSt17_Function_handlerIFN7mwboost3anyEvEZN14cmddistributor15PackagedTaskIIP10createFuncINS0_8functionIFvvEEEEESt8functionIS2_ET_EUlvE_E9_M_invokeERKSt9_Any_data+00000025
[ 41] 0x00007fd362ed1d8d /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+00994701 _ZN3iqm18PackagedTaskPlugin7executeEP15inWorkSpace_tag+00000093
[ 42] 0x00007fd362d48765 /opt/MATLAB/R2023a/bin/glnxa64/libmwmcr.so+00870245
[ 43] 0x00007fd362eb1bd0 /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+00863184
[ 44] 0x00007fd362e7bab2 /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+00641714
[ 45] 0x00007fd362e7c403 /opt/MATLAB/R2023a/bin/glnxa64/libmwiqm.so+00644099
[ 46] 0x00007fd362d37d9e /opt/MATLAB/R2023a/bin/glnxa64/libmwmcr.so+00802206
[ 47] 0x00007fd362d37995 /opt/MATLAB/R2023a/bin/glnxa64/libmwmcr.so+00801173
[ 48] 0x00007fd362d37bed /opt/MATLAB/R2023a/bin/glnxa64/libmwmcr.so+00801773
[ 49] 0x00007fd36cee0277 /opt/MATLAB/R2023a/bin/glnxa64/libmwboost_thread.so.1.78.0+00045687
[ 50] 0x00007fd36c894b43 /lib/x86_64-linux-gnu/libc.so.6+00609091
[ 51] 0x00007fd36c926a00 /lib/x86_64-linux-gnu/libc.so.6+01206784

This error was detected while a MEX-file was running. If the MEX-file is not an official MathWorks function, please examine its source code for errors. Please consult the External Interfaces Guide for information on debugging MEX-files.

zhaokg commented 1 year ago

Thanks a lot. Again, I changed the code a bit in order to get an Octave version. When I test-ran it on my Linux machine, there weren't any errors. I will take a deeper look and see what could possibly be wrong. Thanks again for the important feedback.

zhaokg commented 1 year ago

Given that the crash report refers to memory locations only, it is hard for me to decipher where exactly it could go wrong on your end. Here are a few possibilities: (1) My Rbeast.mexa64 was compiled under RedHat, which may not work well on Ubuntu; (2) the "Segmentation violation" hints that there was a corrupted memory pointer, and it is tough to tell whether it is due to my program logic or the system settings; and (3) I forgot to include the "-lpthread" flag when compiling the mex file; not sure if this matters.

Regardless, I re-compiled the mex file with the "-lpthread" flag (to support multi-threading). I am not sure whether it will work on your end. If it does not, here is another thing I prepared for you:

I uploaded a "rbeast_mex_compile.m" script. You can get it by running eval(webread('http://b.link/rbeast',weboptions('cert',''))). Then run rbeast_mex_compile to download all the C source code files from GitHub to your local folder and compile the mex library on your own machine. The script should generate a "Rbeast.mexa64" file if successful.

qlahcim commented 1 year ago

Still does not work on my system... see system.txt

and here is attached rbeast_mex_compile log file ... see rbeast_mex_compile.txt

qlahcim commented 1 year ago

See the attached link to a video: https://drive.google.com/file/d/12eC0n-QqLTPLv56vc0_DGszSdPzv6-r1/view?usp=drivesdk I hope it may be useful, too.

zhaokg commented 1 year ago

Dear Michal,

Once again, I really appreciate all your feedback and info. For the compilation failure, the problem is very clear:

abc_ide_util.c:(.text+0x410): multiple definition of `RemoveField'; /tmp/mex_1905550192430657_1189581/abc_ide_util_common.o:abc_ide_util_common.c:(.text+0x6c0): first defined here
/usr/bin/ld: /tmp/mex_1905550192430657_1189581/abc_ide_util.o: in function `CheckInterrupt':

abc_ide_util.c and abc_ide_util_common.c are essentially the same file. The rbeast_mex_compile.m script downloads only abc_ide_util_common.c, but if you have manually downloaded everything from the Source folder, the two files conflict at link time. My apologies for this oversight. I have now completely removed the old file abc_ide_util.c from the GitHub Source folder. If you clear out your local src folder and re-download the source files, I believe "rbeast_mex_compile.m" should pass the linking stage (fingers crossed that no new problem appears).
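As a rough illustration of the cleanup described above, the following sketch checks a local source folder for the stale file before re-running the compile script. The folder name `src` and its location under the current directory are assumptions for illustration; only the two file names come from the linker error.

```matlab
% Hypothetical check: does the local source folder still contain both
% abc_ide_util.c and abc_ide_util_common.c? (The folder location is an assumption.)
srcDir    = fullfile(pwd, 'src');                    % assumed location of the C sources
staleFile = fullfile(srcDir, 'abc_ide_util.c');
keptFile  = fullfile(srcDir, 'abc_ide_util_common.c');

hasStale = exist(staleFile, 'file') == 2;            % exist returns 2 for a regular file
hasKept  = exist(keptFile,  'file') == 2;

if hasStale && hasKept
    % Both files define the same symbols (e.g., RemoveField), so linking fails.
    fprintf('Found both files; deleting the stale %s\n', staleFile);
    delete(staleFile);
elseif hasStale
    fprintf('Only %s is present; re-download the sources before compiling.\n', staleFile);
else
    fprintf('No stale abc_ide_util.c found in %s\n', srcDir);
end
```

Clearing the whole folder and re-downloading, as suggested above, achieves the same result; the sketch only avoids a full re-download.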

zhaokg commented 1 year ago

As for the crash, I really appreciate the video (thanks for the effort to prepare it). Again, it seems to be an issue due to multithreading (debugging multithreading issues is extremely tedious and difficult). The majority of Matlab's C API is not thread-safe; I believe I had already taken good care of that and can't really think of anything else off the top of my head that could cause the problem.

I used the same C code for Matlab, Python, and R, tailored to all typical OS platforms, CPUs, and compilers as well as different software versions (e.g., Python versions, numpy versions, R versions, ...). That is why the BEAST algorithm is relatively simple but my implementation is pretty complex.

By default, when handling many time series concurrently, the program fires up a total of (number of CPU cores) * 2 threads. Your CPU has 16 logical cores, so it spawns 16*2 = 32 threads by default. Another possible reason for the crash is the overlapping of each thread's stack space. As a quick test, could you please not run that many threads and instead use a smaller, customized number of threads?

Use extra.numThreadsPerCPU = 1 to have only 1 thread per CPU core (the default is 2 in my program).
Use extra.numParThreads = a small number (e.g., 5) to limit the total number of concurrent threads.


        load('imageStack.mat')
        % A toy example of stacked time series images: unevenly spaced in time
        NDVI3D = imageStack.ndvi      % a 12x9x1066 3D cube
        TIME   = imageStack.datestr   % 1066 is the time series length
        metadata = [];
        metadata.time = [];
        metadata.time.dateStr = TIME
        metadata.time.strFmt  = 'LT05_018032_20110726.yyyy-mm-dd';
        metadata.deltaTime    = 1/12;  % aggregated at a monthly interval
        metadata.period       = 1.0;   % the period is 1.0 (year)
        extra = [];
        extra.dumpInputData    = true  % get a copy of the aggregated input
        extra.numThreadsPerCPU = 1;    % 1 thread per CPU core
        extra.numParThreads    = 5;    % at most 5 concurrent threads are used
        o = beast123(NDVI3D, metadata, [], [], extra)

Again, sorry for not being able to identify the problem immediately. I tried very hard on my end but couldn't reproduce the crash.

qlahcim commented 1 year ago

Thanks for the quick response. I will check out your recommendations tomorrow and let you know about my progress and results.

I am afraid that the problem is connected with the glibc library version. I ran into this problem some time ago, and it was connected with an older glibc version. After upgrading to a newer version, the problem disappeared.

You mentioned that you are using RedHat Linux. Could you compare your glibc version with the current version in Ubuntu 22.04.2 or Linux Mint 21.1?
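For comparing glibc versions across machines, a small sketch like the one below can be run from the MATLAB prompt on each system. It assumes a Linux host where `ldd` is on the PATH and relies on `ldd --version` printing the glibc release on its first line. The MATLAB crash log above also reports the value under "GNU C Library" (2.35 on the crashing machine).

```matlab
% Print the glibc version of the host system (Linux only).
% Assumption: 'ldd --version' reports the glibc release on its first line.
if isunix && ~ismac
    [status, out] = system('ldd --version');
    if status == 0
        firstLine = strtok(out, newline);   % keep only the first line of the output
        fprintf('System glibc: %s\n', strtrim(firstLine));
    else
        fprintf('Could not run ldd (exit status %d)\n', status);
    end
else
    fprintf('This check applies to Linux only.\n');
end
```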

qlahcim commented 1 year ago

Just an additional question: would it be possible for you to use a more user-friendly Linux distribution based on Debian or Ubuntu for development and testing? I suggest a virtualized (via VirtualBox, for example) Linux Mint 21.1 or Ubuntu 22.04, as two of the most widely used Debian-based distributions. I understand the licensing problems with an additional MATLAB installation, but this is probably the only way to handle this very specific problem.

zhaokg commented 1 year ago

Michal, thanks a lot for the suggestion. I have Ubuntu 22.04 installed on a VM and tested the current version of Rbeast.mexa64 (the one compiled under RedHat): it worked without any issues. Again, I wasn't able to reproduce the problem you had.

zhaokg commented 1 year ago

Michal, the mex file is just a shared library (.so). Here is its dependency information:

Output from readelf -d Rbeast.mexa64

Dynamic section at offset 0x6bd60 contains 35 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libmwservices.so]
 0x0000000000000001 (NEEDED)             Shared library: [libut.so]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libmx.so]
 0x0000000000000001 (NEEDED)             Shared library: [libmex.so]
 0x0000000000000001 (NEEDED)             Shared library: [libmat.so]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

output from ldd --verbose Rbeast.mexa64:

    linux-vdso.so.1 (0x00007ffc24fc0000)
    libmwservices.so => not found
    libut.so => not found
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6c5e50f000)
    libmx.so => not found
    libmex.so => not found
    libmat.so => not found
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6c5e119000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6c5de00000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6c5da00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6c5e525000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6c5e4ed000)

    Version information:
    ./Rbeast.mexa64:
        libpthread.so.0 (GLIBC_2.3.2) => /lib/x86_64-linux-gnu/libpthread.so.0
        libpthread.so.0 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libpthread.so.0
        libpthread.so.0 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libpthread.so.0
        libc.so.6 (GLIBC_2.17) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
        libm.so.6 (GLIBC_2.27) => /lib/x86_64-linux-gnu/libm.so.6
        libm.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libm.so.6
    /lib/x86_64-linux-gnu/libpthread.so.0:
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
    /lib/x86_64-linux-gnu/libm.so.6:
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
        libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_PRIVATE) => /lib/x86_64-linux-gnu/libc.so.6
    /lib/x86_64-linux-gnu/libstdc++.so.6:
        libm.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libm.so.6
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        libgcc_s.so.1 (GCC_4.2.0) => /lib/x86_64-linux-gnu/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.4) => /lib/x86_64-linux-gnu/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.3) => /lib/x86_64-linux-gnu/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.0) => /lib/x86_64-linux-gnu/libgcc_s.so.1
        libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.33) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.25) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.18) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.16) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.32) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.7) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.17) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.3) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.34) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
    /lib/x86_64-linux-gnu/libc.so.6:
        ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
    /lib/x86_64-linux-gnu/libgcc_s.so.1:
        libc.so.6 (GLIBC_2.35) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.34) => /lib/x86_64-linux-gnu/libc.so.6

zhaokg commented 1 year ago

Another question: may I ask what your purpose is in using the BEAST model? If it is just for exploring, I am happy to test-run it on your data to see its relevance. If your purpose is to use it for some real data analysis, I am also happy to run it on my end for you. That is probably quicker than trying to pinpoint the exact crash error, which seems to be an impossible task unless I can debug on your exact machine. (Also, I am good at programming but not good enough to be a professional software developer, given that my day job is as a professor in environmental sciences.) Regardless, if you provide more info, I am definitely happy to continue exploring the solution.

qlahcim commented 1 year ago

Good news: setting extra.numThreadsPerCPU = 1 (1 thread per CPU core) solves the problem. I can then use any number of threads from 1 to 16 via extra.numParThreads.

But when I choose extra.numParThreads > 16, MATLAB crashes again!

So the problem is definitely connected with the actual total number of threads.

Could you find a proper fix in your C++ code to make Rbeast more stable and robust?
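Until the root cause is known, the observation above suggests a simple guard: never request more parallel threads than the machine has logical cores. The sketch below is a workaround under stated assumptions, not a fix. The core count of 16 is taken from this thread and hard-coded, and the variable names `nLogicalCores` and `requestedThreads` are hypothetical.

```matlab
% Assumption: 16 logical cores, the value reported for this machine in the thread.
nLogicalCores    = 16;
requestedThreads = 32;   % e.g., the default of 2 threads per core on 16 cores

extra = [];
extra.numThreadsPerCPU = 1;                                     % 1 thread per CPU core
extra.numParThreads    = min(requestedThreads, nLogicalCores);  % never exceed the core count

fprintf('Using %d parallel threads\n', extra.numParThreads);
% extra can then be passed on as before:
%   o = beast123(NDVI3D, metadata, [], [], extra)
```

On another machine the hard-coded 16 would need to be replaced by that machine's logical core count.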

qlahcim commented 1 year ago

Which MATLAB version do you use, and on which exact Linux distribution and version? I am afraid that the latest versions of MATLAB changed something crucial under the hood regarding multithreaded processing, but I do not know whether that is relevant for external MEX files, too.

qlahcim commented 1 year ago

I would like to apply the BEAST MATLAB package to routine processing of 3-D neutron-flux signals measured at large nuclear power reactors. These signals have a 1 s sampling period. Processing should be applied periodically, every 1-6 hours, to 300 signals, i.e., a 3600 x 300 data array for each hour. The signals are sometimes very complicated due to irregular behavior: outliers, spurious outages (NaNs), and strongly non-Gaussian noise produced by several non-linear measurement artifacts or by discretization errors of A/D converters and sensors working in an extremely unfriendly environment.

So, at this stage, I would like to become more familiar with your code applied to my "very specific" time series.
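To make the data layout concrete, here is a minimal sketch of how such a 3600 x 300 array might be passed to beast123. The data are synthetic and the parameter choices are assumptions: a non-periodic model (`season = 'none'`), a time step of 1 second, and time running along the first dimension (`whichDimIsTime = 1`). The thread settings are the conservative ones discussed in this thread. The call is skipped if beast123 is not on the MATLAB path.

```matlab
% Synthetic stand-in for one hour of data: 3600 samples x 300 signals (assumed layout).
nSamples = 3600;
nSignals = 300;
Y = randn(nSamples, nSignals);
Y(100:120, 5) = NaN;               % simulate a sensor outage; NaN marks missing values

metadata = [];
metadata.whichDimIsTime = 1;       % time runs along the rows
metadata.season         = 'none';  % assumption: no periodic component in the signals
metadata.deltaTime      = 1;       % one sample per time unit (here: 1 second)

extra = [];
extra.numThreadsPerCPU = 1;        % conservative thread settings from this thread
extra.numParThreads    = 5;

if exist('beast123', 'file')
    o = beast123(Y, metadata, [], [], extra);
else
    disp('beast123 not found on the MATLAB path; skipping the run.');
end
```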

zhaokg commented 1 year ago

Again, it is hard to debug, given that I can't reproduce the error on my end. From the error information I have so far, it is unlikely to be a direct problem with Rbeast.mex; the error comes from an external library. Also, setting CPU affinity is an extremely tricky business. In my current implementation, I wrote my own pthread layer for Windows based on the Win32 API, and there the CPU affinity can be managed well (I mean, creating a thread for a chosen CPU core). On Linux, I used the default CPU affinity API; for macOS, the pthread library API differs quite a lot for thread affinity assignment; then for Solaris, it differs again...

The crash happened at

[ 49] 0x00007fd36cee0277 /opt/MATLAB/R2023a/bin/glnxa64/libmwboost_thread.so.1.78.0+00045687
[ 50] 0x00007fd36c894b43 /lib/x86_64-linux-gnu/libc.so.6+00609091
[ 51] 0x00007fd36c926a00 /lib/x86_64-linux-gnu/libc.so.6+01206784

Again, to me, this is an issue with a Matlab library. See the following reports:
https://www.mathworks.com/matlabcentral/answers/1893615-matlab-crashing-on-ubuntu-22-04-help
https://www.mathworks.com/matlabcentral/answers/1763065-rosbagwriter-crashes-on-creation?s_tid=prof_contriblnk

On the first post, don't blindly believe what the Matlab staff commented about the bad instruction. Given the similar errors reported and the observed behavior, my best bet is that this crash is a bug in Matlab itself and has no direct relationship with Rbeast.mex. To fix th

qlahcim commented 1 year ago

Ok... Thanks for the info! I appreciate your effort.

It really looks like a MATLAB problem. But what is really strange is that you are not able to reproduce this problem on a similar Linux distribution with the same version of MATLAB. A year ago I faced a very similar problem with another external mex file, and so far it is not clear what the cause was. In that case the crash appears only on specific combinations of Linux and MATLAB, which is really terrible. And MATLAB support just claims that the problem is definitely in the external mex code.

The most fatal errors are the unreproducible ones, where it is absolutely unclear what the real cause of the problem is.

At this stage I will proceed with testing BEAST on my data, using thread settings that guarantee proper functionality. But I am not sure when the problem will strike again, so I am afraid to use BEAST for routine processing. I cannot switch to R or Python, because MATLAB has been my main development tool for many years...

zhaokg commented 1 year ago

Thanks, Michal. Another reason why I think that this is a Matlab bug is that my C BEAST code is the SAME for R, Python, Octave, and Matlab (for each language, my code supports at least three OS platforms; in the case of R, it supports dozens of OS platforms -- R has a very strict policy in terms of uploading binary code to CRAN). Your case was the first time that it crashed.

Because I put quite a bit of energy into the multithreading part, I know how tricky it can be, because it involves lots of computer architecture and assembly-language code. Even the best developer may not find it easy to handle. Based on my experience, it is IMPOSSIBLE for Matlab's team to pinpoint the bug if they can't reproduce the error on their end.

zhaokg commented 1 year ago

As another option, if you think BEAST is really needed for your project, I am happy to take a further look and see what could be going wrong with Matlab's library: one route is for you to create a VM image and share it with me so I can reproduce the error on my end. (Again, even if we are lucky enough to figure out the reason, there is no way that we can fix it, given that it is on Matlab's end.)

My best guess is that there is a buggy if-else branch in their code that handles thread affinity. Basically, when you have a total of 16 threads (1 main thread + numParThreads=15), it runs fine on your system. But if you have one more (1 main thread + numParThreads=16), it crashes. And 16 is the number of CPU cores you have on the system.

qlahcim commented 1 year ago

The situation is slightly different: I am able to run beast123 with extra.numThreadsPerCPU = 1 (1 thread per CPU core) and extra.numParThreads set to any value from 1 to 16.

The problem starts at extra.numParThreads = 17, when beast123 reports:

WARNING: metadata$season is either missing or not given as a valid specifier string (e.g., none, harmonic, or
     dummy). A default season='harmonic' is assumed. 

INFO: To supress printing the parameers in beast123(),   set extra.printOptions = 0  
INFO: To supress printing the parameers in beast(),      set print.options = 0 
INFO: To supress printing the parameers in beast.irreg(),set print.options = 0 
INFO: To supress warning messages in beast123(),         set extra.quiet = 1  
INFO: To supress warning messages in beast(),            set quiet = 1 
INFO: To supress warning messages in beast.irreg(),      set quiet = 1 

%--------------------------------------------------%
%       Brief summary of Input Data                %
%--------------------------------------------------%
Data Dimension: [12x9x1066] - 108 signals of length 1066 each
IsOrdered     : No, unordered in time, to be sorted/ordered before running BEAST
IsRegular     : No, unevenly spaced at avg interval of  0.0352235 year = 0.422681 months = 12.8566 days
Preprocessing : Aggregate irregular data into a regular interval of 0.0833333 year = 1 months = 30.4167 days
HasSeasonCmpnt: true  | period = 1 year = 12 months = 365 days. The model 'Y=Trend+Season+Error' is fitted.
              : Num_of_DataPoints_per_Period = period/deltaTime = 1/0.0833333 = 12
HasOutlierCmpt: false | If true, Y=Trend+Season+Outlier+Error fitted instead of Y=Trend+Season+Error
Deseasonalize : false | If true, remove a global seasonal  cmpnt before running BEAST & add it back after BEAST
Detrend       : false | If true, remove a global trend component before running BEAST & add it back after BEAST
MissingValue  : NaN  flagged as missing values 
MaxMissingRate: if more than 75% of data is missing, BEAST will skip it.

%--------------------------------------------------%
%      OPTIONS used in the MCMC inference          %
%--------------------------------------------------%

%......Start of displaying 'MetaData' ......
metadata                = []         % metadata is used to interpret the input data
metadata.season         = 'harmonic' % fit a harmonic model to the periodic component
metadata.startTime      = 1984.25    % 1984-04-01
metadata.deltaTime      = 0.0833333  % 0.0833333 year(s) = 1 month(s) = 30.4167 day(s)
metadata.period         = 1          % 1 year(s) = 12 month(s) = 365 day(s) 
metadata.whichDimIsTime = 3
metadata.maxMissingRate = 0.75       % if more than 75% of data is missing, BEAST will skip it.
metadata.deseasonalize  = false      % If true,remove a global seasonal cmpnt before running BEAST & add it back later
metadata.detrend        = false      % If true,remove a global trend  cmpnt before running BEAST & add it back later
%........End of displaying MetaData ........

%......Start of displaying 'prior' ......
prior                   = []         % prior is the key model parameters of BEAST
prior.seasonMinOrder    = 1          % sorder.minmax[1]: min harmonic order alllowed
prior.seasonMaxOrder    = 5          % sorder.minmax[2]: max harmonic order alllowed
prior.seasonMinKnotNum  = 0          % scp.minmax[1]   : min num of seasonal chngpts
prior.seasonMaxKnotNum  = 5          % scp.minmax[2]   : max num of seasonal chngpts
prior.seasonMinSepDist  = 6          % sseg.min        : min seasonal segment length in terms of datapoints
prior.trendMinOrder     = 0          % torder.minmax[1]: min trend polynomial order alllowed
prior.trendMaxOrder     = 1          % torder.minmax[2]: max trend polynomial order alllowed
prior.trendMinKnotNum   = 0          % tcp.minmax[1]   : min num of chngpts in trend
prior.trendMaxKnotNum   = 10         % tcp.minmax[2]   : min num of chngpts in trend
prior.trendMinSepDist   = 6          % tseg.min        : min trend segment length in terms of datapoints
prior.K_MAX             = 82         % max number of terms in general linear model (useful only at small values)
prior.precValue         = 1.5        % useful mainly when precPriorType='constant'
prior.modelPriorType    = 1         
prior.precPriorType     = 'uniform'
%......End of displaying prior ......

%......Start of displaying 'mcmc' ......
mcmc                           = []         % mcmc is not BEAST parameters but MCMC sampler options
mcmc.seed                      = 0          % A nonzero seed to replicate among runs
mcmc.samples                   = 3000       % Number of samples saved per chain: the larger, the better
mcmc.thinningFactor            = 1          % Thinning the chain: the larger, the better 
mcmc.burnin                    = 150        % Number of inital samples discarded: the larger, the better
mcmc.chainNumber               = 3          % Nunber of chains: the larger, the better
mcmc.maxMoveStepSize           = 12         % Max step of jumping from current changepoint: No need to change
mcmc.trendResamplingOrderProb  = 0.1        % Proposal probability of sampling trend polynominal order 
mcmc.seasonResamplingOrderProb = 0.17       % Proposal probability of sampling seasoanl order 
mcmc.credIntervalAlphaLevel    = 0.95       % The alphal level for Credible Intervals
% Total number of models randomly visited is (burnin+sampples*thinFactor)*chainNumber=9450
%......End of displaying mcmc ......

%......Start of displaying 'extra' ......
extra                      = []         % extra is used to configure output/computing options
extra.dumpInputData        = true
extra.whichOutputDimIsTime = 3
extra.computeCredible      = false
extra.fastCIComputation    = true
extra.computeSeasonOrder   = false
extra.computeTrendOrder    = false
extra.computeSeasonChngpt  = true
extra.computeTrendChngpt   = true
extra.computeSeasonAmp     = false
extra.computeTrendSlope    = false
extra.tallyPosNegSeasonJump= false
extra.tallyPosNegTrendJump = false
extra.tallyIncDecTrendJump = false
extra.printProgressBar     = true
extra.printOptions         = true
extra.consoleWidth         = 110
extra.numThreadsPerCPU     = 1
extra.numParThreads        = 17
%......End of displaying extra ......

Parallel computing: thread#1  generated ... 
Parallel computing: thread#2  generated ... 
Parallel computing: thread#3  generated ... 
Parallel computing: thread#4  generated ... 
Parallel computing: thread#5  generated ... 
Parallel computing: thread#6  generated ... 
Parallel computing: thread#7  generated ... 
Parallel computing: thread#8  generated ... 
Parallel computing: thread#9  generated ... 
Parallel computing: thread#10 generated ... 
Parallel computing: thread#11 generated ... 
Parallel computing: thread#12 generated ... 
Parallel computing: thread#13 generated ... 
Parallel computing: thread#14 generated ... 
Parallel computing: thread#15 generated ... 
Parallel computing: thread#16 generated ... 
Parallel computing: thread#17 generated ... 
Rbeast: Waiting on 17 threads...
Press and hold CTR+C to interrupt and quit while running.
 100.0%done<Remaining00hrs00min00sec>[=======================================================================]

Finalizing ... 
Rbeast: Thread #0  finished ... 
Rbeast: Thread #1  finished ... 
Rbeast: Thread #2  finished ... 
Rbeast: Thread #3  finished ... 
Rbeast: Thread #4  finished ... 
Rbeast: Thread #5  finished ... 
Rbeast: Thread #6  finished ... 
Rbeast: Thread #7  finished ... 
Rbeast: Thread #8  finished ... 
Rbeast: Thread #9  finished ... 
Rbeast: Thread #10 finished ... 
Rbeast: Thread #11 finished ... 
Rbeast: Thread #12 finished ... 
Rbeast: Thread #13 finished ... 
Rbeast: Thread #14 finished ... 
Rbeast: Thread #15 finished ... 

There is one interesting line: Rbeast: Waiting on 17 threads...

And then MATLAB crashes ...

To me, it looks like the Rbeast mex file has a problem (on my system, at least) when it is asked for more threads than the 16 logical (not physical) CPUs available. My computer has only 8 physical cores with multithreading ON (2 threads per physical core!). Could this be a clue for you?
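The logical-versus-physical CPU count discussed here can be queried from C. The following is only a minimal sketch, not Rbeast's actual code: it assumes a Linux/glibc system where `sysconf(_SC_NPROCESSORS_ONLN)` reports the number of online logical CPUs (hyperthreads counted separately), and `logical_cpu_count` and `clamp_thread_count` are hypothetical helper names used for illustration.

```c
#include <unistd.h>   /* sysconf, _SC_NPROCESSORS_ONLN */

/* Number of logical CPUs currently online (hyperthreads count separately).
 * Falls back to 1 if the query fails. */
static long logical_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}

/* Hypothetical helper: limit a requested worker count to [1, available]. */
static int clamp_thread_count(int requested, long available)
{
    if (available < 1) available = 1;
    if (requested < 1) return 1;
    if ((long)requested > available) return (int)available;
    return requested;
}
```

On a machine with 8 physical cores and 2 hardware threads per core, `logical_cpu_count()` would be expected to return 16, so `clamp_thread_count(17, logical_cpu_count())` would cap the request at 16. Whether capping is the right policy is a design choice; oversubscribing threads is normally legal and should not crash by itself.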

zhaokg commented 1 year ago

Michal, I have changed my code and believe it should now run OK on your Linux system: I pinpointed where the error could occur and tried to find a fix. If you are still interested, give it a try and let me know the result. Thanks.

qlahcim commented 1 year ago

I am definitely very interested in testing and using your Beast code, but this week I am extremely busy, so please give me a week...

I can now confirm the same problem on 3 other machines with the latest MATLAB version R2023 and Linux Mint 21 or Ubuntu 22.04.

qlahcim commented 1 year ago

Just a quick response. The latest version of beast123 now works well on my machines! Thanks a lot for your help and effort.

I think it would be great to prepare the simplest possible test case to demonstrate this very specific problem (bug) to TMW support. If I understand correctly, the problem is very probably somewhere in the MATLAB MEX API (version R2023a), am I right? I understand that TMW is not very active regarding MEX API problems, but a bug report about this problem should still be very helpful.

zhaokg commented 1 year ago

Michal, thanks a lot. It is still hard for me to pinpoint the exact problem (i.e., which functions caused the error). One thing for sure is that it is due to inconsistencies in the pthread library across different operating systems. pthread is the library I use for multi-threading. My understanding is that the pthread library is Linux's own rather than TMW's. I got around the inconsistent behaviors by using a safer path when calling its functions.
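The thread does not say what the "safer path" looks like. One common defensive pattern with pthreads is sketched below as an assumption, not as the actual Rbeast fix: check every return code, count only the threads that were really created, and join exactly those. `worker_main` and `run_workers` are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Placeholder worker; a real worker would process its share of the signals. */
static void *worker_main(void *arg)
{
    int *done = (int *)arg;
    *done = 1;
    return NULL;
}

/* Start up to `requested` workers and wait for every one that actually
 * started. Returns the number of workers that ran to completion, or -1 if
 * the bookkeeping arrays could not be allocated. */
static int run_workers(int requested)
{
    if (requested < 1) requested = 1;

    pthread_t *tid = malloc((size_t)requested * sizeof *tid);
    int *done = calloc((size_t)requested, sizeof *done);
    if (tid == NULL || done == NULL) { free(tid); free(done); return -1; }

    int started = 0;
    for (int i = 0; i < requested; i++) {
        /* pthread_create returns 0 on success, an error number otherwise. */
        if (pthread_create(&tid[started], NULL, worker_main, &done[started]) != 0)
            break;                      /* stop creating; join what we have */
        started++;
    }

    int finished = 0;
    for (int i = 0; i < started; i++) {
        /* Join only threads that were really created. */
        if (pthread_join(tid[i], NULL) == 0 && done[i])
            finished++;
    }

    free(tid);
    free(done);
    return finished;
}
```

With this bookkeeping, the number of threads waited on always matches the number created, which avoids touching an uninitialized `pthread_t`. Optional steps such as setting thread affinity can be handled the same way: treat a nonzero return code as non-fatal and let the thread run unpinned.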

It seems that the original problem has been resolved. I will close this ticket soon unless you have anything else to share. Meanwhile, if you have any new questions, you can ask here or write to me directly at zhao.1423@osu.edu. Thanks again for the IMPORTANT feedback.