zhaokg / Rbeast

Bayesian Change-Point Detection and Time Series Decomposition

ERROR: r1 < r2; this should never happen and there must be something wrong! #26

Closed Karrol closed 3 months ago

Karrol commented 3 months ago

I am trying to find the best-fitting model, so I run a grid search over different scp_max and tcp_max combinations. When I set the parameters like o=rb.beast(ndvi_values[:], tcp_minmax=(0,tcp_max), scp_minmax=(0,scp_max), quiet=True, mcmc_seed=42), it works well. But when I set them like o=rb.beast(ndvi_values[:], tcp_minmax=(tcp_max,tcp_max), scp_minmax=(scp_max,scp_max), quiet=True, mcmc_seed=42), one of my time series throws the error r1 < r2 during the loop.

Dear author, I would like to ask why this problem occurs. I want to use a parameter search to find the fit with the lowest RMSE, and I tried two ways of setting the number of changepoints: version 1 sets a range, while version 2 specifies the number of changepoints directly. Could it be that my search range is too large for a time series that is not long enough? I got the error inside the loop of version 2.

Karrol commented 3 months ago

My code is as follows:

import pandas as pd
import Rbeast as rb              # Import the Rbeast package as rb
from datetime import datetime

def cal_o(signal, tcp_max, scp_max):
    ndvi_values = pd.Series(signal.values)
    ndvi_index  = pd.Series(signal.index)

    # Adjust the figure size
    # plt.figure(figsize=(12, 8))

    metadata                = rb.args()    # or 'lambda: None': just get an empty object; metadata is used to interpret the input data Y
    metadata.time           = 'hour'
    metadata.season         = 'harmonic'   # fit a harmonic model to the periodic component
    metadata.startTime      = 1            # unknown unit
    metadata.deltaTime      = 1            # unknown unit
    metadata.period         = 24           # unknown unit [guessed by autocorrelation]
    metadata.maxMissingRate = 0.75         # if more than 75% of the data is missing, BEAST will skip it
    metadata.deseasonalize  = False        # if True, remove a global seasonal cmpnt before running BEAST & add it back later
    metadata.detrend        = False        # if True, remove a global trend cmpnt before running BEAST & add it back later

    # Version 1: search a range for the number of changepoints
    # o = rb.beast(ndvi_values[:], tcp_minmax=(0, tcp_max), scp_minmax=(0, scp_max), quiet=True, mcmc_seed=42)
    # Version 2: fix the number of changepoints directly
    o = rb.beast(ndvi_values[:], tcp_minmax=(tcp_max, tcp_max), scp_minmax=(scp_max, scp_max), quiet=True, mcmc_seed=40)  # fixed random seed so that different parameter settings are comparable
    return o

def find_best_tcp_max(signal, search_tcp_max_range, search_scp_max_range):
    o_rmse_list = []
    o_R2_list   = []
    for i in range(1, search_tcp_max_range):
        for j in range(1, search_scp_max_range):
            o      = cal_o(signal, i, j)
            o_rmse = o["sig2"].RMSE
            o_R2   = o["sig2"].R2
            print(o["sig2"].R2)
            print(o["sig2"].RMSE)
            o_rmse_list.append([i, j, o_rmse])
            o_R2_list.append([i, j, o_R2])
    return o_rmse_list, o_R2_list

zhaokg commented 3 months ago

Dear Karrol,

Thanks a lot for giving BEAST a try and reporting the issue. The same problem was previously reported by another user; in that case, it was because the user was running on a Linux cluster with Intel AVX-512 CPUs, and I posted a new version, Rbeast v0.1.19, to fix the error for that particular computing environment.

Are you using the latest Python version of Rbeast? If not, you can install it via

pip uninstall Rbeast -y
pip install Rbeast==0.1.19

If you are indeed using the latest version, would you mind sharing more info with me, such as the machine type or Python version you are using? Ideally, if you can share a sample time series that reproduces the error, I will try to pinpoint and fix it on my end.

I am not sure about the nature of your problem. Typically, it is a no-brainer that a more complex model (i.e., one with more changepoints) will give a lower RMSE because it has more parameters, just as a linear model with more covariates will give a lower RMSE regardless. If your purpose is not merely to fit the time series but to get a more meaningful model, a more sensible way to pick the best hyperparameters (e.g., tcp_max) is to examine o.marg_lik: the higher the marginal likelihood, the better the model. (Note that the marginal likelihood reported by BEAST is a sample-based estimate, so it varies from one run to another even for the same setting.)
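
For example, here is a rough sketch of how your grid search could rank the (tcp_max, scp_max) candidates by marg_lik instead of RMSE (the function name below is just illustrative; it reuses your tcp_minmax/scp_minmax range style):

import Rbeast as rb

def find_best_by_marg_lik(signal, search_tcp_max_range, search_scp_max_range):
    # Illustrative only: keep the (tcp_max, scp_max) pair with the highest
    # marginal likelihood reported by BEAST (o.marg_lik).
    results = []
    for i in range(1, search_tcp_max_range):
        for j in range(1, search_scp_max_range):
            o = rb.beast(signal, tcp_minmax=(0, i), scp_minmax=(0, j),
                         quiet=True, mcmc_seed=42)
            results.append((i, j, o.marg_lik))
    # marg_lik is a sample-based estimate, so treat small differences as noise
    return max(results, key=lambda r: r[2])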

Best, Kaiguang

Karrol commented 3 months ago

Dear Kaiguang, thank you for your timely answer. I am trying to find meaningful changepoints, but I do not have enough prior knowledge to prejudge how many changepoints there are. I also worry about overfitting, but as a novice in this field I had no idea what evaluation metric would be better than RMSE, so your suggestion is very helpful! I do use the latest BEAST Python version. I will report more information about my device and Python environment:

Karrol

zhaokg commented 3 months ago

Thanks, Karrol. I didn't see any data attached. Can you please send it directly to my work email, zhao.1423@osu.edu?

Your CPU does support the AVX-512 instruction set, and by default BEAST uses some AVX-512 functions. Can you please add the cputype argument to avoid the use of AVX-512? As shown below, set cputype=0 or cputype=2 to see whether you still get the error message:

o = rb.beast(Nile, start=0, tcp_minmax=[0, 2], torder_minmax = [1, 1], season='none',   mcmc_seed=1, cputype=2)

Thanks a lot,

Kaiguang

zhaokg commented 3 months ago

Karrol,

I see what happened. You are setting too large a value for tcp_minmax. Here is a demo with the Nile time series: it passes if the number of changepoints is fixed at 23 but fails if it is set to 24.

import Rbeast as rb
(y,t)=rb.load_example('nile')
rb.beast(y, season='none', tcp_minmax=[23, 23]) #OK
rb.beast(y, season='none', tcp_minmax=[24, 24]) #Failed

The Nile vector has only 100 data points, and the minimum separation distance between neighboring changepoints is set to 3 by default, as shown below.

prior$trendMinSepDist   = 3          # tseg.min        : min trend segment length in terms of datapoints

That said, the shortest allowed segment is 3+1 = 4 data points long. After excluding the starting and ending segments (4 points each), the time series can accommodate at most (100-4-4)/4 = 23 changepoints, so setting the number to 24 will crash.
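
As a quick sanity check (just an illustrative snippet, not something built into the package), one can compute this upper bound before fixing tcp_minmax:

import Rbeast as rb
(y, t) = rb.load_example('nile')

n       = len(y)                         # 100 data points in the Nile series
min_sep = 3                              # trendMinSepDist: min separation between changepoints
seg_len = min_sep + 1                    # shortest allowed segment length, in data points
max_tcp = (n - 2 * seg_len) // seg_len   # exclude the starting and ending segments
print(max_tcp)                           # 23, so tcp_minmax=[23, 23] is the largest safe setting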

In the next release, I will make sure such edge cases are forced into the safe range. But the key is for you to use a reasonable value for the maximum number of changepoints allowed.

Hopefully, this makes sense. Kaiguang

Karrol commented 3 months ago

Kaiguang, yes, I think you are right. The error comes up consistently, and it follows from the equation you mentioned. This is really helpful.