Open SH20025133 opened 1 year ago
Hi there, You definitely caught me. And the segmentation fault is expected: when I wrote the code, I implemented it with relatively short time series in mind. Typical sample sizes in the applications I am familiar with are less than a few thousand points. The rationale is that running a Bayesian algorithm on a very long time series should be very time-consuming (I guess it may take the BEAST algorithm hours to segment and decompose a time series 100k in length). With that said, in some parts of my code I used small integer types (e.g., 8-bit or 16-bit integers such as int8_t or int16_t). I expect these small integer types to overflow and cause seemingly random segmentation faults when the time series is long.
If you see some potential value in the algorithm and are able to share some sample data, I am happy to test-run it for you and, if needed, change and recompile the code to accommodate large sample sizes. Hopefully, this clarifies.
Thank you for the update. To my surprise, BEAST works quite fast even with large sample sizes like 1000k (in fact, the time it takes is comparable to non-Bayesian approaches like PELT). I have a system trace that marks timestamps for requests, on which I'd like to do changepoint analysis (it has roughly 3600k points).
I can send you the file over email if required for updating the code.
Yes, can you please send me the file over email at zhao.1423@osu.edu? Thanks a lot
I have reached out to you over email with the sample dataset.
The Python version of the package (and the R version as well) works fine with smaller sample sizes up to 10k, and even for some 100k-point series, but as I keep increasing the sample size, the above-mentioned error becomes more and more frequent, to the point that the code doesn't execute at all.
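Until the code is recompiled for large sample sizes, one possible workaround for a 3600k-point trace of request timestamps is to aggregate the raw events into a binned event-rate series before running changepoint detection. This is only a sketch under my own assumptions (the function name `bin_event_rate` and the one-second bin width are made up here, and whether a rate series is appropriate depends on what the analysis is meant to detect):

```python
import numpy as np

def bin_event_rate(timestamps, bin_width):
    """Aggregate raw event timestamps into a per-bin count series.

    Collapses a multi-million-event trace into a series short enough
    for BEAST to handle, at the cost of temporal resolution.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    t0, t1 = timestamps.min(), timestamps.max()
    # Bin edges cover the full span of the trace.
    edges = np.arange(t0, t1 + bin_width, bin_width)
    counts, _ = np.histogram(timestamps, bins=edges)
    return counts

# Example: 3.6M synthetic timestamps over an hour, collapsed into
# roughly 3600 one-second bins.
ts = np.sort(np.random.default_rng(0).uniform(0, 3600, size=3_600_000))
y = bin_event_rate(ts, bin_width=1.0)
```

The resulting `y` (a few thousand points) is within the sample sizes the author says the current binaries were designed for.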