zhaokg / Rbeast

Bayesian Change-Point Detection and Time Series Decomposition

Windows fatal exception: access violation - for a large number of points #20

Closed hoto91 closed 6 months ago

hoto91 commented 7 months ago

Hello, and thank you for this great package. I am using Rbeast 0.1.16 with Python 3.9.12, and I've noticed that the code below fails with: Windows fatal exception: access violation on Windows 10, and with: Progress: 0.0% done[>*************************************************************][1] 1706831 segmentation fault (core dumped) on Linux. This happens when the series is longer than approximately 5300 points; with fewer points it works properly. Here is the code:

import Rbeast as rb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#---------------------------------Data Generation-------------------------------#
def generate_dataset_with_noise(num_breakpoints, noise_level):
    data = []
    for i in range(num_breakpoints):
        if i == 0:
            segment_length = np.random.randint(5, 1000)
        else:
            segment_length = np.random.randint(3, 800)
        value = np.random.randint(1, 10)

        segment = np.full(segment_length, value)
        # Generate noise with the same length as the segment
        noise = np.random.normal(0, noise_level, segment_length)

        # Add the noise to the segment values
        segment_with_noise = segment + noise
        data.extend(segment_with_noise)
    return data

# Specify the number of breakpoints and noise level
num_breakpoints = 17
noise_level = 0.3

# Generate a dataset with noise added to the integer values
data = generate_dataset_with_noise(num_breakpoints, noise_level)
df = pd.Series(data)

#--------Rbeast--------------------#

o = rb.beast(df,
          start          = 1,
          deltat         = 1,
          season         = 'none',  # 'harmonic','dummy','svd','none'
          period         = float('nan'),
          scp_minmax     = [0, 10],
          sorder_minmax  = [0, 5],
          sseg_minlength = None,  # an integer
          tcp_minmax     = [0, 10],                                        
          torder_minmax  = [0, 1],
          tseg_minlength = None,  # an integer
          detrend        = False,
          deseasonalize  = False,
          mcmc_seed      = 0,
          mcmc_burbin    = 200,
          mcmc_chains    = 3,
          mcmc_thin      = 5,
          mcmc_samples   = 5000,                                                 
          precValue      = 1.5,
          precPriorType  = 'componentwise',  # 'componentwise','uniform','constant','orderwise'
          print_options  = True,
          print_progress = True,
          quiet          = False,
          hasOutlier     = False,
          ocp_max        = 10,
          gui            = False) 
dirt commented 7 months ago

Dear HOTO91,

First of all, thanks a lot for giving BEAST a try and reporting the issue. Sorry about the error. Your sample code was a great help, and I was able to pinpoint the bug: I used a 32-bit integer (int32) in a function, which overflows if the time series is too long. I fixed it by changing it to a 64-bit integer.
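
To illustrate the kind of failure described here: when sizes are multiplied in 32-bit arithmetic, the product silently wraps around once it exceeds 2,147,483,647, and an index or allocation size derived from it can then point outside valid memory. The sketch below uses NumPy to show the wraparound. The helper name and the sizes are made up for illustration and are not taken from the BEAST source, which does not say in this thread which quantity overflowed.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2_147_483_647


def element_count(n_rows, n_cols, dtype):
    """Multiply two sizes using fixed-width integers of the given dtype.

    Hypothetical helper for illustration only; it is not part of Rbeast.
    NumPy integer reductions use modular arithmetic, so an overflow wraps
    around silently instead of raising an error.
    """
    sizes = np.array([n_rows, n_cols], dtype=dtype)
    return int(np.prod(sizes, dtype=dtype))


# Made-up sizes whose true product (3.6e9) exceeds INT32_MAX.
wrapped = element_count(60_000, 60_000, np.int32)
correct = element_count(60_000, 60_000, np.int64)

print(wrapped)  # negative: the 32-bit product wrapped around
print(correct)  # 3600000000
```

Widening the integer type, as described in the fix above, removes the wraparound for any realistic series length.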

The overflow itself is a quick fix, but I am also adding a few extra features to accommodate others' requests. Please give me a day or two to release a new version.

Thanks again for your valuable feedback.

Kaiguang

hoto91 commented 7 months ago

Dear dirt,

Thank you for your quick response. Nice to hear that it was an easy fix. I will be waiting for the new version to come out.

zhaokg commented 6 months ago

Dear Hoto91,

I just posted a new version. Sorry for the long wait, if any. You can now install it via "pip install Rbeast==0.1.17".
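
To confirm which release is actually installed before re-running the test, a small check along these lines may help. It is only a sketch: the helper names are hypothetical, it assumes plain dotted numeric version strings such as 0.1.17, and it assumes, based on this thread, that 0.1.17 is the first release containing the fix.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain dotted version such as '0.1.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_overflow_fix(installed, fixed="0.1.17"):
    """Return True if the installed version is at least the fixed release."""
    return parse_version(installed) >= parse_version(fixed)


def installed_rbeast_version():
    """Look up the installed Rbeast version, or None if it is not installed."""
    try:
        return metadata.version("Rbeast")
    except metadata.PackageNotFoundError:
        return None


version = installed_rbeast_version()
if version is None:
    print("Rbeast is not installed")
else:
    print(version, "includes the fix:", has_overflow_fix(version))
```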

Below is a quick test using your sample data generator. I used 50 changepoints, which gives a time series of about 20000 points. With that many changepoints, be sure to set the maximum number of trend changepoints allowed (the upper bound in tcp_minmax) to a value larger than 50. I used 60; any value larger than 50 should give the same result.

data = generate_dataset_with_noise(num_breakpoints=50, noise_level=0.3)
df = pd.Series(data)
o = rb.beast(df, season='none', tcp_minmax=[0, 60])
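
For anyone who wants to repeat this test with the same data on every run, here is a seeded variant of the generator from the original report, together with a wrapper that sets the tcp_minmax upper bound above the number of simulated changepoints. The seed argument, the margin parameter, and the run_beast wrapper are additions for illustration; only the rb.beast call itself mirrors the snippet above.

```python
import numpy as np


def generate_dataset_with_noise(num_breakpoints, noise_level, seed=0):
    """Piecewise-constant series with Gaussian noise, as in the original report.

    The seed argument is an addition so that repeated runs give the same data.
    """
    rng = np.random.default_rng(seed)
    segments = []
    for i in range(num_breakpoints):
        # Same length ranges as the original generator.
        low, high = (5, 1000) if i == 0 else (3, 800)
        segment_length = rng.integers(low, high)
        value = rng.integers(1, 10)
        segments.append(value + rng.normal(0, noise_level, segment_length))
    return np.concatenate(segments)


def run_beast(series, num_breakpoints, margin=10):
    """Fit a trend-only model, allowing more changepoints than were simulated."""
    # Imported lazily so the generator above works without these packages.
    import pandas as pd
    import Rbeast as rb
    tcp_max = num_breakpoints + margin
    return rb.beast(pd.Series(series), season='none', tcp_minmax=[0, tcp_max])


num_breakpoints = 50
series = generate_dataset_with_noise(num_breakpoints, noise_level=0.3)
print(len(series))  # far more than the ~5300 points that triggered the crash
# o = run_beast(series, num_breakpoints)  # needs the fixed release of Rbeast
```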

hoto91 commented 6 months ago

Thank you for your effort. Seems to work fine.

dirt commented 6 months ago

Awesome! Thanks again for this valuable input, which really means a lot. Now since this specific problem seems to be resolved, I will close this ticket.

Kaiguang
