Open matsuobasho opened 5 months ago
@matsuobasho Did you get any way of doing this?
@matsuobasho @OrionStar25 the TimeSeriesForecastingPipeline is intended for inference. For fine tuning you can follow the steps in Cell 4 of the above notebook, with the exception that the Trainer.evaluate call is only needed for evaluation.
Once you have a fine-tuned model, you can use it with the TimeSeriesForecastingPipeline, just like the pre-trained model can be used for zero-shot inference.
@wgifford thanks for the explanation. I would like to get the actual predictions on the test set - how would I do that? The fewshot_output object within the fewshot_finetune_eval function has just evaluation metrics.
@matsuobasho This is how I did it:
forecast_pipeline = TimeSeriesForecastingPipeline(
    model=finetune_forecast_trainer.model,
    timestamp_column=timestamp_column,
    id_columns=id_columns,
    target_columns=target_columns,
    freq="1h",
    feature_extractor=tsp,
    explode_forecasts=False,
    inverse_scale_outputs=True,
)
forecasts = forecast_pipeline(tsp.preprocess(test_data_df))
forecasts.head()
Thanks @wgifford . When I try to run the following:
from tsfm_public.toolkit.time_series_forecasting_pipeline import TimeSeriesForecastingPipeline
forecast_pipeline = TimeSeriesForecastingPipeline(
    model=finetune_forecast_trainer.model,
    timestamp_column=timestamp_column,
    id_columns=id_columns,
    target_columns=target_columns,
    freq="15min",
    feature_extractor=tsp,
    explode_forecasts=False,
)
forecasts = forecast_pipeline(tsp.preprocess(test_dataset))
as per your example, I get an error:
AttributeError: 'ForecastDFDataset' object has no attribute 'copy'
It occurs in the inner tsp_preprocess function. test_dataset is a ForecastDFDataset type object. I haven't pulled the last 2 commits (I had already run the fine-tuning), so maybe that's the issue and I have to retrain?
@matsuobasho See this: https://github.com/ibm-granite/granite-tsfm/issues/46
Thanks, @OrionStar25. I should have searched for that error, especially since I was the one who had encountered it before.
@wgifford ok, I got the gist of this process. Since fine-tuning works with a Dataset type object, but the pipeline works with pandas dataframes only, is there a straightforward way to convert the output of tsp.get_datasets to a dataframe? I know I can iterate through the ForecastDFDataset object and reconstitute it back into a dataframe, but it would be better either to convert it in a more efficient way or to use the same indices to create a test dataframe from the original input.
I checked the get_datasets function and the functions it calls, but I don't see that it sets a seed anywhere.
train_dataset, valid_dataset, test_dataset = tsp.get_datasets(
    df, split_config, fewshot_fraction=fewshot_fraction, fewshot_location="first"
)
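On the seed question: a split defined by fixed fractions or index ranges involves no randomness, so it needs no seed to be reproducible. Below is a minimal sketch of that idea, using a plain list of rows in place of a dataframe. The helpers `split_bounds` and `take_split` and the `lookback` parameter are hypothetical names for illustration only, not part of tsfm_public. Whether get_datasets computes its boundaries this way is an assumption to verify against the library source.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]
Bounds = Dict[str, Tuple[int, int]]


def split_bounds(n_rows: int, train: float, test: float) -> Bounds:
    """Return half-open [start, end) row ranges for train/valid/test.

    The ranges depend only on n_rows and the two fractions, so calling
    this twice with the same arguments always gives the same split.
    """
    if not (0 < train < 1 and 0 < test < 1 and train + test <= 1):
        raise ValueError("fractions must lie in (0, 1) and sum to at most 1")
    train_end = round(n_rows * train)
    test_start = n_rows - round(n_rows * test)
    return {
        "train": (0, train_end),
        "valid": (train_end, test_start),
        "test": (test_start, n_rows),
    }


def take_split(rows: Sequence[Row], bounds: Bounds, name: str, lookback: int = 0) -> List[Row]:
    """Slice one split out of rows.

    lookback (hypothetical) prepends that many earlier rows, so a model
    needing history has context for the first forecast in the split.
    """
    start, end = bounds[name]
    return list(rows[max(0, start - lookback):end])


# Stand-in for the original input dataframe: 100 time-ordered rows.
rows = [{"t": float(i), "y": float(2 * i)} for i in range(100)]
bounds = split_bounds(len(rows), train=0.5, test=0.25)
test_rows = take_split(rows, bounds, "test")
test_rows_with_context = take_split(rows, bounds, "test", lookback=10)
```

If the same boundaries are used both to build the training datasets and to slice the original dataframe, the test dataframe passed to the pipeline lines up with the test dataset used during fine-tuning.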
@matsuobasho @OrionStar25
I am actually not completely satisfied with get_datasets for a couple of reasons:
get_datasets but actually require dataframe output. I am considering creating two standalone functions: one that handles the dataframe creation process (takes preprocessor and split configuration as input), and another that uses that function and then creates the torch datasets.
TimeSeriesForecastingPipeline was meant to allow for simple use, i.e., given some chunk of time series data on which you wish to forecast, the user should not have to go through the process of creating torch datasets.
What do you think?
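The two-function design described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `prepare_frames`, `make_windows`, and `prepare_datasets` are hypothetical names, plain lists stand in for dataframes, and lists of (context, target) pairs stand in for torch datasets. None of this is the tsfm_public API.

```python
from typing import Dict, List, Sequence, Tuple

Window = Tuple[List[float], List[float]]


def prepare_frames(values: Sequence[float], train_end: int, test_start: int) -> Dict[str, List[float]]:
    """First function: produce the split data itself (usable for inference)."""
    return {
        "train": list(values[:train_end]),
        "valid": list(values[train_end:test_start]),
        "test": list(values[test_start:]),
    }


def make_windows(series: Sequence[float], context_length: int, prediction_length: int) -> List[Window]:
    """Turn one series into sliding (context, target) training examples."""
    span = context_length + prediction_length
    windows = []
    for start in range(len(series) - span + 1):
        context = list(series[start:start + context_length])
        target = list(series[start + context_length:start + span])
        windows.append((context, target))
    return windows


def prepare_datasets(
    values: Sequence[float],
    train_end: int,
    test_start: int,
    context_length: int,
    prediction_length: int,
) -> Dict[str, List[Window]]:
    """Second function: reuse prepare_frames, then build windowed datasets."""
    frames = prepare_frames(values, train_end, test_start)
    return {
        name: make_windows(part, context_length, prediction_length)
        for name, part in frames.items()
    }


values = [float(i) for i in range(20)]
frames = prepare_frames(values, train_end=12, test_start=16)
datasets = prepare_datasets(values, 12, 16, context_length=2, prediction_length=1)
```

Because the second function calls the first, the frames used for inference and the datasets used for training always come from the same split boundaries.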
@wgifford thanks for the reply and insights. Yes, your plan sounds very reasonable. That way, the result of the first function can be used as an input to TimeSeriesForecastingPipeline and then the result of the second function can be used for training. Feel free to close this issue as your guidance has solved the question I had, unless you'd like to add something else.
@matsuobasho Can you try the prepare_data_splits function here: https://github.com/ibm-granite/granite-tsfm/blob/879c707b082a7b2a9dbf994aec4e53f9e2dec808/tsfm_public/toolkit/time_series_preprocessor.py#L754 to see if it meets your needs?
Thanks!
@wgifford when I try
from tsfm_public.toolkit.time_series_preprocessor import TimeSeriesPreprocessor, prepare_data_splits
I get an ImportError on the prepare_data_splits import.
Also, once I get that to work, since I still run get_datasets for fine-tuning purposes, the actual splits from get_datasets and prepare_data_splits won't be the same, correct (i.e., there is no seed)? If so, that won't really work unless we incorporate a seed to get the same output across the 2 functions.
@matsuobasho I think the response here: https://github.com/ibm-granite/granite-tsfm/issues/46#issuecomment-2264249530 also addresses this issue?
I see the way to do few-shot finetune in this tutorial.
However, how would I do it with the TimeSeriesForecastingPipeline?