mims-harvard / UniTS

A unified multi-task time series model.
https://zitniklab.hms.harvard.edu/projects/UniTS/
MIT License

download_data_all.sh cannot download all datasets used in data_provider/multi_task_pretrain.yaml #1

Closed PingNie1 closed 3 months ago

PingNie1 commented 4 months ago

Hi there,

Thank you very much for this brilliant work.

  1. After I run download_data_all.sh, I find that there are not enough datasets for the data_provider/multi_task_pretrain.yaml config. Could you please provide all the datasets we need to run data_provider/multi_task_pretrain.yaml? Thank you very much.
  2. I cannot import this code: [image]

Thank you very much.

gasvn commented 4 months ago

Hi, thank you for the feedback!

  1. Can you give the names of the datasets that are missing? That would help us fix this more easily.
  2. Change the import to: from exp.exp_pretrain import Exp_All_Task as Exp_All_Task_SSL

We will fix the issues you mentioned. Thanks

PingNie1 commented 4 months ago

Thank you for your prompt response.

Here are the datasets that I think were not downloaded:

  NN5_p112:
    task_name: pretrain_long_term_forecast
    dataset_name: nn5_daily_without_missing
    dataset: NN5
    data: gluonts
    root_path: ../dataset/gluonts
    seq_len: 224
    label_len: 0
    pred_len: 0
    features: M
    embed: timeF
    enc_in: 111
    dec_in: 111
    c_out: 111

  LTF_ECL_p96:
    task_name: pretrain_long_term_forecast
    dataset: ECL
    data: custom
    embed: timeF
    root_path: ../dataset/electricity/
    data_path: electricity.csv
    features: M
    seq_len: 192
    label_len: 48
    pred_len: 0
    enc_in: 321
    dec_in: 321
    c_out: 321

  LTF_ECL_p192:
    task_name: pretrain_long_term_forecast
    dataset: ECL
    data: custom
    embed: timeF
    root_path: ../dataset/electricity/
    data_path: electricity.csv
    features: M
    seq_len: 288
    label_len: 48
    pred_len: 0
    enc_in: 321
    dec_in: 321
    c_out: 321

  LTF_ECL_p336:
    task_name: pretrain_long_term_forecast
    dataset: ECL
    data: custom
    embed: timeF
    root_path: ../dataset/electricity/
    data_path: electricity.csv
    features: M
    seq_len: 432
    label_len: 48
    pred_len: 0
    enc_in: 321
    dec_in: 321
    c_out: 321

  LTF_ECL_p720:
    task_name: pretrain_long_term_forecast
    dataset: ECL
    data: custom
    embed: timeF
    root_path: ../dataset/electricity/
    data_path: electricity.csv
    features: M
    seq_len: 816
    label_len: 48
    pred_len: 0
    enc_in: 321
    dec_in: 321
    c_out: 321

  LTF_ETTh1_p96:
    task_name: pretrain_long_term_forecast
    dataset: ETTh1
    data: ETTh1
    embed: timeF
    root_path: ../dataset/ETT-small/
    data_path: ETTh1.csv
    features: M
    seq_len: 192
    label_len: 48
    pred_len: 0
    enc_in: 7
    dec_in: 7
    c_out: 7

  LTF_ETTh1_p192:
    task_name: pretrain_long_term_forecast
    dataset: ETTh1
    data: ETTh1
    embed: timeF
    root_path: ../dataset/ETT-small/
    data_path: ETTh1.csv
    features: M
    seq_len: 288
    label_len: 48
    pred_len: 0
    enc_in: 7
    dec_in: 7
    c_out: 7

  LTF_ETTh1_p336:
    task_name: pretrain_long_term_forecast
    dataset: ETTh1
    data: ETTh1
    embed: timeF
    root_path: ../dataset/ETT-small/
    data_path: ETTh1.csv
    features: M
    seq_len: 432
    label_len: 48
    pred_len: 0
    enc_in: 7
    dec_in: 7
    c_out: 7

  LTF_ETTh1_p720:
    task_name: pretrain_long_term_forecast
    dataset: ETTh1
    data: ETTh1
    embed: timeF
    root_path: ../dataset/ETT-small/
    data_path: ETTh1.csv
    features: M
    seq_len: 816
    label_len: 48
    pred_len: 0
    enc_in: 7
    dec_in: 7
    c_out: 7

  LTF_Exchange_p192:
    task_name: pretrain_long_term_forecast
    dataset: Exchange
    data: custom
    embed: timeF
    root_path: ../dataset/exchange_rate/
    data_path: exchange_rate.csv
    features: M
    seq_len: 288
    label_len: 48
    pred_len: 0
    enc_in: 8
    dec_in: 8
    c_out: 8

  LTF_Exchange_p336:
    task_name: pretrain_long_term_forecast
    dataset: Exchange
    data: custom
    embed: timeF
    root_path: ../dataset/exchange_rate/
    data_path: exchange_rate.csv
    features: M
    seq_len: 432
    label_len: 48
    pred_len: 0
    enc_in: 8
    dec_in: 8
    c_out: 8

  LTF_ILI_p60:
    task_name: pretrain_long_term_forecast
    dataset: ILI
    data: custom
    embed: timeF
    root_path: ../dataset/illness/
    data_path: national_illness.csv
    features: M
    seq_len: 96
    label_len: 18
    pred_len: 0
    enc_in: 7
    dec_in: 7
    c_out: 7

  LTF_Traffic_p96:
    task_name: pretrain_long_term_forecast
    dataset: Traffic
    data: custom
    embed: timeF
    root_path: ../dataset/traffic/
    data_path: traffic.csv
    features: M
    seq_len: 192
    label_len: 48
    pred_len: 0
    enc_in: 862
    dec_in: 862
    c_out: 862

  LTF_Traffic_p192:
    task_name: pretrain_long_term_forecast
    dataset: Traffic
    data: custom
    embed: timeF
    root_path: ../dataset/traffic/
    data_path: traffic.csv
    features: M
    seq_len: 288
    label_len: 48
    pred_len: 0
    enc_in: 862
    dec_in: 862
    c_out: 862

  LTF_Traffic_p336:
    task_name: pretrain_long_term_forecast
    dataset: Traffic
    data: custom
    embed: timeF
    root_path: ../dataset/traffic/
    data_path: traffic.csv
    features: M
    seq_len: 432
    label_len: 48
    pred_len: 0
    enc_in: 862
    dec_in: 862
    c_out: 862

  LTF_Traffic_p720:
    task_name: pretrain_long_term_forecast
    dataset: Traffic
    data: custom
    embed: timeF
    root_path: ../dataset/traffic/
    data_path: traffic.csv
    features: M
    seq_len: 816
    label_len: 48
    pred_len: 0
    enc_in: 862
    dec_in: 862
    c_out: 862

  LTF_Weather_p96:
    task_name: pretrain_long_term_forecast
    dataset: Weather
    data: custom
    embed: timeF
    root_path: ../dataset/weather/
    data_path: weather.csv
    features: M
    seq_len: 192
    label_len: 48
    pred_len: 0
    enc_in: 21
    dec_in: 21
    c_out: 21

  LTF_Weather_p192:
    task_name: pretrain_long_term_forecast
    dataset: Weather
    data: custom
    embed: timeF
    root_path: ../dataset/weather/
    data_path: weather.csv
    features: M
    seq_len: 288
    label_len: 48
    pred_len: 0
    enc_in: 21
    dec_in: 21
    c_out: 21

  LTF_Weather_p336:
    task_name: pretrain_long_term_forecast
    dataset: Weather
    data: custom
    embed: timeF
    root_path: ../dataset/weather/
    data_path: weather.csv
    features: M
    seq_len: 432
    label_len: 48
    pred_len: 0
    enc_in: 21
    dec_in: 21
    c_out: 21

  LTF_Weather_p720:
    task_name: pretrain_long_term_forecast
    dataset: Weather
    data: custom
    embed: timeF
    root_path: ../dataset/weather/
    data_path: weather.csv
    features: M
    seq_len: 816
    label_len: 48
    pred_len: 0
    enc_in: 21
    dec_in: 21
    c_out: 21

  CLS_Heartbeat:
    task_name: pretrain_classification
    dataset: Heartbeat
    data: UEA
    embed: timeF
    root_path: ../dataset/Heartbeat/
    seq_len: 405
    label_len: 0
    pred_len: 0
    enc_in: 61
    num_class: 2
    c_out: None

  CLS_JapaneseVowels:
    task_name: pretrain_classification
    dataset: JapaneseVowels
    data: UEA
    embed: timeF
    root_path: ../dataset/JapaneseVowels/
    seq_len: 29
    label_len: 0
    pred_len: 0
    enc_in: 12
    num_class: 9
    c_out: None

  CLS_PEMS-SF:
    task_name: pretrain_classification
    dataset: PEMS-SF
    data: UEA
    embed: timeF
    root_path: ../dataset/PEMS-SF/
    seq_len: 144
    label_len: 0
    pred_len: 0
    enc_in: 963
    num_class: 7
    c_out: None

  CLS_SelfRegulationSCP2:
    task_name: pretrain_classification
    dataset: SelfRegulationSCP2
    data: UEA
    embed: timeF
    root_path: ../dataset/SelfRegulationSCP2/
    seq_len: 1152
    label_len: 0
    pred_len: 0
    enc_in: 7
    num_class: 2
    c_out: None

  CLS_SpokenArabicDigits:
    task_name: pretrain_classification
    dataset: SpokenArabicDigits
    data: UEA
    embed: timeF
    root_path: ../dataset/SpokenArabicDigits/
    seq_len: 93
    label_len: 0
    pred_len: 0
    enc_in: 13
    num_class: 10
    c_out: None

  CLS_UWaveGestureLibrary:
    task_name: pretrain_classification
    dataset: UWaveGestureLibrary
    data: UEA
    embed: timeF
    root_path: ../dataset/UWaveGestureLibrary/
    seq_len: 315
    label_len: 0
    pred_len: 0
    enc_in: 3
    num_class: 8
    c_out: None

  CLS_FaceDetection:
    task_name: pretrain_classification
    dataset: FaceDetection
    data: UEA
    embed: timeF
    root_path: ../dataset/FaceDetection
    seq_len: 62
    label_len: 0
    pred_len: 0
    enc_in: 144
    num_class: 2
    c_out: None

PingNie1 commented 4 months ago

I think some of them are the same datasets with different settings.
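A minimal sketch of how this could be checked, assuming the YAML has already been parsed into a dict that maps task names to their settings (for example with PyYAML's yaml.safe_load). The helper names unique_dataset_paths and missing_paths are hypothetical and not part of the repository. Entries that share root_path and data_path collapse to a single path, so only the distinct files or folders are tested for existence:

```python
from pathlib import Path


def unique_dataset_paths(tasks):
    """Collapse task entries that point at the same files into one path each."""
    paths = set()
    for cfg in tasks.values():
        root = Path(cfg["root_path"])
        # Entries with a data_path refer to a single file; the others
        # (such as the UEA or gluonts entries) refer to a directory.
        paths.add(root / cfg["data_path"] if "data_path" in cfg else root)
    return sorted(paths)


def missing_paths(tasks, base_dir="."):
    """Return the distinct dataset paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [p for p in unique_dataset_paths(tasks) if not (base / p).exists()]
```

Since root_path in the config is relative (../dataset/...), base_dir would need to be the directory those paths are resolved against, which is an assumption to verify against how the training script is launched.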

PingNie1 commented 4 months ago

These are all the unzipped datasets after I ran the download script: [image]

teddykoker commented 4 months ago

Hi @erenup,

It appears all of the datasets you are missing are the ones extracted from TimesNet. Can you confirm that all_datasets.zip was downloaded successfully? This should have been downloaded and extracted with the following portion of the script:

# check for gdown https://github.com/wkentaro/gdown then install if necessary
if ! command -v gdown &> /dev/null
then
    echo "installing gdown, for downloading from google drive"
    pip install gdown
fi

# TimesNet data
# downloads all_datasets.zip and extracts into dataset/
if [ ! -f dataset/all_datasets.zip ]; then
    gdown "https://drive.google.com/file/d/1pmXvqWsfUeXWCMz5fqsP8WLKXR5jxY8z/view?usp=drive_link" --fuzzy -O dataset/all_datasets.zip
    unzip dataset/all_datasets.zip -d dataset/
    mv dataset/all_datasets/* dataset/
    rm -rf dataset/all_datasets
fi

However, gdown can sometimes fail to properly download from Google Drive. If this is the case, you can download all_datasets.zip manually from here and extract it into the dataset folder.
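One way to tell whether the download succeeded is to test the archive before extracting it. The sketch below uses only Python's standard zipfile module; the function name archive_is_intact is hypothetical, and the check assumes that a failed download leaves either no file, a truncated file, or a non-zip file behind:

```python
import zipfile
from pathlib import Path


def archive_is_intact(path):
    """Return True if path exists, is a zip file, and passes its CRC checks."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        return zf.testzip() is None
```

For example, archive_is_intact("dataset/all_datasets.zip") returning False would suggest deleting the file and downloading it again, since the script above skips the download whenever that file already exists.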

Aside from that, the NN5-* datasets should be downloaded automatically when you first run the script. Please let us know if you continue to have any issues!

PingNie1 commented 3 months ago

Hi @teddykoker, thank you for your reply. I manually downloaded the TimesNet data, but I cannot find the NN5-* data.

  NN5_p112:
    task_name: pretrain_long_term_forecast
    dataset_name: nn5_daily_without_missing
    dataset: NN5
    data: gluonts
    root_path: ../dataset/gluonts
    seq_len: 224
    label_len: 0
    pred_len: 0
    features: M
    embed: timeF
    enc_in: 111
    dec_in: 111
    c_out: 111

Could you please provide the link to download it? Thank you very much.

teddykoker commented 3 months ago

The NN5 data will be downloaded through gluonts the first time you run the pre-training code.

PingNie1 commented 3 months ago

Hi @teddykoker, thank you for your reply. When I try to run the pre-training code, it fails to get the gluonts datasets. Is there a way to download them manually? Thank you.

teddykoker commented 3 months ago

What is the error message? You can download the data from here; however, gluonts does some preprocessing and a train/test split upon download that you would then have to perform yourself, so I would recommend letting gluonts handle the download. For your convenience, here are the pre-processed and pre-split files, which you can unzip into dataset/gluonts: nn5_daily_without_missing.zip

PingNie1 commented 3 months ago

Thank you very much. The error message:

Download nn5_daily_dataset_without_missing_values.zip:: 0.00B [00:00, ?B/s] 

It seems I cannot connect to the link.

gasvn commented 3 months ago

I double-checked the link and it should be working. Can you please check your network config?
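A quick way to separate a broken link from a local network problem is to probe the URL from the same machine and environment that runs the training code. This is a minimal sketch using only the standard library; can_reach is a hypothetical helper, and the URL to test has to be supplied by the reader since it is not given in this thread:

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=10.0):
    """Return True if an HTTP(S) request to url gets any response at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 403 or 404),
        # so the network path itself works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or proxy problem.
        return False
```

If can_reach returns False for the download URL but True for other sites, a firewall, proxy, or DNS setting on the local network is a likely cause.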

PingNie1 commented 3 months ago

Thank you. It was an error in my network.