Is there a way to deal with categorical feature?

zhouhaoyi / Informer2020

The GitHub repository for the paper "Informer" accepted by AAAI 2021.

Apache License 2.0

Is there a way to deal with categorical feature? #191

Closed kja815 closed 3 years ago

kja815 commented 3 years ago

I want to train a model with scalar or categorical features, but I can't find a way to handle categorical features in Informer.

Is there a way to handle categorical features in Informer?

zhouhaoyi commented 3 years ago

Categorical features are important in time-series problems, and we may add support for them to our to-do list. If you have best practices, pull requests are highly welcome.

kja815 commented 3 years ago

@zhouhaoyi thank you for your answer. I think the ETTh dataset contains only a single time series. Can Informer be trained on multiple different time series that share the same features? For example, suppose there are time series (such as electricity consumption) for house1, ..., house100, all with the same features. Can a single Informer model be trained on this data?

cookieminions commented 3 years ago

Hi, if each of these time series has more than one feature, Informer cannot handle the data at the moment.

777udo commented 3 years ago

@cookieminions so the Informer is able to deal with data from multiple devices in one training set for the univariate case? Why is it that it doesn't work for multivariate data and are there requirements on how the input data containing multiple devices has to be ordered or preprocessed (multiple identical timestamps for several devices)?

cookieminions commented 3 years ago

Without its input layer, the Informer model expects input of shape [batch_size, seq_len, dimension]. If your data consists of multiple time series with multiple variables each, the input to the input layer may instead have shape [batch_size, seq_len, num_series, num_features]. To use Informer on multiple time series that each have more than one feature, you need to modify the input layer. A feasible solution is to use an embedding layer for each categorical feature, aggregate the embeddings, and then feed the result to Informer.
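The embedding idea in this reply could look roughly like the following. This is only a minimal sketch, not code from the Informer repository: the class name `CategoricalInputLayer`, its arguments, and the choice to aggregate by summation are all assumptions made for illustration, and how to wire the result in place of the repository's own input embedding is left open.

```python
import torch
import torch.nn as nn


class CategoricalInputLayer(nn.Module):
    """Hypothetical input layer (not part of Informer): embeds categorical
    columns and merges them with the numeric columns."""

    def __init__(self, cardinalities, num_numeric, d_model):
        super().__init__()
        # One embedding table per categorical feature (e.g. household id, weekday).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, d_model) for n in cardinalities]
        )
        # Numeric features are projected to the same width so everything can be summed.
        self.numeric_proj = nn.Linear(num_numeric, d_model)

    def forward(self, x_num, x_cat):
        # x_num: [batch_size, seq_len, num_numeric], float values
        # x_cat: [batch_size, seq_len, num_categorical], integer category codes
        out = self.numeric_proj(x_num)
        for i, emb in enumerate(self.embeddings):
            # Aggregation by summation is an assumption; concatenation would also work.
            out = out + emb(x_cat[..., i])
        return out  # [batch_size, seq_len, d_model]


# Toy example: 3 numeric features plus two categorical ones (100 and 7 categories).
layer = CategoricalInputLayer(cardinalities=[100, 7], num_numeric=3, d_model=16)
x_num = torch.randn(4, 96, 3)
x_cat = torch.stack(
    [torch.randint(0, 100, (4, 96)), torch.randint(0, 7, (4, 96))], dim=-1
)
out = layer(x_num, x_cat)
print(out.shape)  # torch.Size([4, 96, 16])
```

The output has the three-dimensional [batch_size, seq_len, dimension] shape mentioned above, so the categorical information is folded into the feature dimension before the sequence reaches the model.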

777udo commented 3 years ago

@cookieminions thanks for your reply. Suppose I use univariate data sets that have only timestamps and a single feature, but for multiple households, as @kja815 describes. Then there would be multiple identical timestamps referring to different households, yet each input sample for the encoder has to be a sequence from one distinct household. Can the model handle this if I just append the data of the different households into one big CSV file as input, e.g. Jan-Dec of household one, followed by Jan-Dec of household two, and so on?

cookieminions commented 3 years ago

From your description, can your data be organized as one big CSV containing a timestamp and all households (each household having only one feature), with columns date, household1, household2, ..., householdN? If my understanding is correct, you can feed the data into Informer directly, and the model will treat the multiple series as multiple variates.

If each household has more than one feature, so that the columns are date, household1_feat1, household1_feat2, ..., householdN_feat1, householdN_feat2, ..., householdN_featM, you need to aggregate the features of each household and feed data such as household1_embed, household2_embed, ..., householdN_embed into Informer, where householdN_embed is aggregated from householdN_feat1 to householdN_featM.

Please correct me if I am wrong.
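For the single-feature case, data that was appended household after household (one row per timestamp and household) would first need to be reshaped into the wide layout described above. The following is a small sketch using only the Python standard library; the sample data, the `long_to_wide` helper, and the `household` and `load` column names are made up for illustration, and it assumes timestamps sort correctly as strings.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format data: one row per (timestamp, household).
long_csv = """date,household,load
2020-01-01 00:00,household1,1.2
2020-01-01 00:00,household2,0.7
2020-01-01 01:00,household1,1.1
2020-01-01 01:00,household2,0.9
"""


def long_to_wide(text, time_col="date", id_col="household", value_col="load"):
    """Pivot long-format rows into one row per timestamp, one column per series."""
    rows_by_time = defaultdict(dict)
    series_ids = []  # keeps households in order of first appearance
    for row in csv.DictReader(io.StringIO(text)):
        rows_by_time[row[time_col]][row[id_col]] = row[value_col]
        if row[id_col] not in series_ids:
            series_ids.append(row[id_col])
    wide = [[time_col] + series_ids]
    # Assumes timestamps are formatted so that string order equals time order.
    for ts in sorted(rows_by_time):
        # A household with no reading at this timestamp gets an empty cell.
        wide.append([ts] + [rows_by_time[ts].get(s, "") for s in series_ids])
    return wide


wide_rows = long_to_wide(long_csv)
for r in wide_rows:
    print(",".join(r))
# date,household1,household2
# 2020-01-01 00:00,1.2,0.7
# 2020-01-01 01:00,1.1,0.9
```

Each household then becomes one column, which matches the date, household1, ..., householdN layout, so the series are treated as variates of a single multivariate sequence.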

Lisa-FFY commented 3 years ago

Excuse me, my data is a binary (two-class) problem. What should I do with my labels?

zhouhaoyi commented 3 years ago

Could you please provide more descriptions of your dataset?