pymc-devs / pymc-bart

https://www.pymc.io/projects/bart

OOS and pct_change coal mine disaster #56

Open waudinio27 opened 1 year ago

waudinio27 commented 1 year ago

Hello Osvaldo!

I have been trying out BART, and in my opinion it shines like a crown. It is a real PyMC jewel :-D

At the end of the notebook you say that one needs to detrend. All of this is hard for me, as I am not a very good programmer.

Could you show, for the coal mining disasters or some other example, how to make out-of-sample predictions? I do not mean a train/test split, which I can see is working, but true out-of-sample forecasts, for example 10 steps beyond the end of the coal mining disaster data, with no extra features.

Also, could you show how to make the series stationary, then reverse the process with an inverse transform and plot the final result, to make everything complete? Or would you remove the trend with a polynomial fit? This is done with LightGBM as well, which is very popular for that reason. It was dominant in the M5 competition.

https://towardsdatascience.com/xgboost-for-timeseries-lightgbm-is-a-bigger-boat-197864013e88

Would be extremely helpful!

juanitorduz commented 1 year ago

Hey! An easy way to detrend the series is to take first differences (see https://otexts.com/fpp3/stationarity.html ). To transform the series back, you can take a cumulative sum. Regarding forecasting with tree-based models, I guess you could prepare the data set via a reduction approach, that is, by creating a design matrix from the time series. Maybe you could use some tools from sktime (see here) and simply use the BART model as described in the notebook, where X is now the reduced (i.e. wrapped) time series. It is interesting that the article you shared leverages linear models in the nodes, which is related to https://github.com/pymc-devs/pymc-bart/issues/51
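The differencing and cumulative-sum round trip mentioned above can be sketched as follows. This is only a minimal illustration in plain Python: the helper names `difference` and `undifference` are made up for this example, and the numbers are invented rather than taken from the coal mining data. With NumPy, `np.diff` and `np.cumsum` do the same job.

```python
from itertools import accumulate


def difference(series):
    """First differences: d[t] = y[t + 1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(start_value, diffs):
    """Invert first differences with a cumulative sum from a known level."""
    return list(accumulate(diffs, initial=start_value))


# Invented counts, for illustration only (not the coal mining data).
y = [4, 5, 4, 1, 0, 4, 3]
d = difference(y)
restored = undifference(y[0], d)  # recovers y exactly

# Forecasted differences (hypothetical values) are mapped back to levels
# by accumulating from the last observed value.
future_diffs = [1, -1, 2]
future_levels = undifference(y[-1], future_diffs)[1:]
```

Note that the differenced series is one element shorter than the original, so the first observed value (or the last one, when forecasting) has to be kept in order to revert the transform.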

I hope this helps :)

waudinio27 commented 1 year ago

Hello Juan! Thank you for your reply. I can make the series stationary and apply an inverse transform with a cumulative sum, starting from the last known real value. But done just like this, I will lose a lot of structural information.

The PyMC team is fantastic, with people from Europe, South America and Asia, but forecasting with the library remains a big issue. More should be invested in UI and UX design and in easy-to-set-up examples. Otherwise, people who do not want to go too deep will stay with AutoARIMA, Facebook Prophet or LightGBM. I stayed away from PyMC for a while because of this, came back out of curiosity, and got excited when I saw BART and Structural AR. BART would be competitive with the trend alone, and even more so with seasonality added, but it will probably take time until this happens. If somebody wants to predict future river continuum or warehouse stock, the posterior is simply not enough.

I will think about your idea of the design matrix and the reduction approach as a workaround. I will need time to judge whether this could be a way forward, because I do not know how prone it is to overfitting.

Best regards and greetings Matthias

waudinio27 commented 1 year ago

Here is an easy package for detrending and reverting. I just saw it today and thought it would fit the discussion.

https://medium.com/towards-data-science/time-series-transformations-and-reverting-made-easy-f4f768c18f63

waudinio27 commented 1 year ago

Dear Juan, you were right from the start. The data has to be prepared as a design matrix, either with sktime or with a handmade function that builds the lags for the trend.
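A handmade lag function of the kind described here might look like the sketch below. The name `make_lag_matrix` and the choice of three lags are assumptions made for illustration, and the series is invented; sktime offers its own reduction tools, which are not reproduced here.

```python
def make_lag_matrix(series, n_lags):
    """Reduce a series to (X, y): each row of X holds the n_lags values
    that immediately precede the corresponding target in y."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


# Invented counts, for illustration only.
series = [4, 5, 4, 1, 0, 4, 3]
X, y = make_lag_matrix(series, n_lags=3)

for row, target in zip(X, y):
    print(row, "->", target)
```

Each row of `X` can then play the role of the covariates passed to the model, with `y` as the observed values. The first `n_lags` observations have no complete history and are dropped from the targets.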

I have adapted the coal mining model to use mutable data. The mutable data goes in at x_data; then one only has to insert x_test further down and it is ready to go:

    import pymc as pm
    import pymc_bart as pmb

    # x_data, y_data and RANDOM_SEED are defined as in the coal mining notebook
    with pm.Model() as modelcoal:
        # Y must be the same y_data used as observed below (was "ydata")
        μ_ = pmb.BART("μ_", pm.MutableData("X", x_data), Y=y_data, m=20)
        μ = pm.Deterministic("μ", pm.math.abs(μ_))
        y_pred = pm.Poisson("y_pred", mu=μ, observed=y_data)
        idata_coal = pm.sample(random_seed=RANDOM_SEED)

But I am not able to make the out-of-sample predictions after training on the whole data.

I want to do the same for the quantile regression as well, which is a great notebook with great ideas.
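One common way to get true multi-step forecasts from a lag-based model is the recursive strategy: predict one step, append the prediction to the lag window, and repeat. The sketch below shows only that loop. `recursive_forecast` and `mean_of_lags` are hypothetical names, and `mean_of_lags` is a trivial stand-in for a fitted model's point prediction, not BART itself. In the PyMC model, the one-step prediction would presumably come from updating the mutable `X` with `pm.set_data` and calling `pm.sample_posterior_predictive`, which is not shown here.

```python
def recursive_forecast(history, n_lags, steps, predict_one):
    """Forecast `steps` values ahead, feeding each prediction back in as a lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        # Slide the window: drop the oldest lag, append the new prediction.
        window = window[1:] + [y_hat]
    return forecasts


def mean_of_lags(lags):
    # Hypothetical stand-in for a fitted model's one-step point prediction.
    return sum(lags) / len(lags)


# Invented counts, for illustration only.
history = [4, 5, 4, 1, 0, 4, 3]
forecasts = recursive_forecast(history, n_lags=2, steps=3, predict_one=mean_of_lags)
```

Because every step after the first is conditioned on earlier predictions rather than on observed values, errors can compound as the horizon grows, so the uncertainty at 10 steps ahead is usually larger than a single one-step prediction suggests.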

Best regards

Matthias