huawei-noah / vega

AutoML tools chain
http://www.noahlab.com.hk/opensource/vega/

AvazuDataset for AutoFIS #106

Closed TiaoziLiao closed 2 years ago

TiaoziLiao commented 3 years ago

How do you process the dataset? I ran examples/nas/fis/autogate_grda.yml, but it failed.

I downloaded the dataset from Kaggle and unzipped it, but the train and test sets are .csv files, not .npy files.

So, is there a script to preprocess the Avazu dataset?

kinglai commented 3 years ago

We follow the data preprocessing procedure in Ads-RecSys-Datasets, and then convert the data from HDF5 format to NumPy format. Thanks.

TiaoziLiao commented 3 years ago

I got the .h5 files for Avazu from Ads-RecSys-Datasets. Can you provide a script to convert them to .npy files?

TiaoziLiao commented 3 years ago

My file structure is as follows:

-- Avazu
   |-- hdf
   |   |-- test_input_part_0.h5
   |   |-- test_input_part_1.h5
   |   |-- test_input_part_2.h5
   |   |-- test_input_part_3.h5
   |   |-- test_input_part_4.h5
   |   |-- test_output_part_0.h5
   |   |-- test_output_part_1.h5
   |   |-- test_output_part_2.h5
   |   |-- test_output_part_3.h5
   |   |-- test_output_part_4.h5
   |   |-- train_input_part_0.h5
   |   |-- train_input_part_1.h5
   |   |-- train_input_part_10.h5
   |   |-- train_input_part_11.h5
   |   |-- train_input_part_5.h5
   |   |-- train_input_part_6.h5
   |   |-- train_input_part_7.h5
   |   |-- train_input_part_8.h5
   |   |-- train_input_part_9.h5
   |   |-- train_output_part_0.h5
   |   |-- train_output_part_11.h5
   |   |-- train_output_part_12.h5
   |   |-- train_output_part_13.h5
   |   |-- train_output_part_14.h5
   |   |-- train_output_part_15.h5
   |   |-- train_output_part_16.h5
   |   |-- train_output_part_2.h5
   |   |-- train_output_part_3.h5
   |   |-- train_output_part_4.h5
   |   |-- train_output_part_8.h5
   |   -- train_output_part_9.h5
   -- npy

kinglai commented 3 years ago
import numpy as np
import pandas as pd
data = pd.read_hdf(f_in_name).to_numpy()  # as_matrix() was removed in pandas 1.0
np.save(np_data_path, data)

Run it for all your h5 data files.
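
For a directory laid out like the one described above, the per-file conversion could be wrapped in a loop. The following is only a sketch under a few assumptions that are not confirmed in this thread: the function name `convert_hdf_dir`, the `read_frame` parameter, and the choice to give each output the same base name as its input are all hypothetical, and each .h5 file is assumed to hold a single table so that `pd.read_hdf` needs no `key`. It calls `to_numpy()` because `as_matrix()` was removed in pandas 1.0, and `pd.read_hdf` requires the optional PyTables package.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def convert_hdf_dir(hdf_dir, npy_dir, read_frame=pd.read_hdf):
    """Convert every *.h5 file in hdf_dir to a same-named *.npy file in npy_dir.

    read_frame is any callable mapping a file path to a DataFrame.
    The default, pd.read_hdf, needs the optional PyTables package and
    assumes each file stores a single table (otherwise pass a key).
    """
    hdf_dir, npy_dir = Path(hdf_dir), Path(npy_dir)
    npy_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for h5_path in sorted(hdf_dir.glob("*.h5")):
        # Load one part file and turn the DataFrame into a plain array.
        data = read_frame(h5_path).to_numpy()
        # Hypothetical naming: train_input_part_0.h5 -> train_input_part_0.npy
        npy_path = npy_dir / (h5_path.stem + ".npy")
        np.save(npy_path, data)
        written.append(npy_path)
    return written
```

With the layout from this thread it would be called as `convert_hdf_dir("Avazu/hdf", "Avazu/npy")`. Whether Vega expects these exact output file names is not stated here, so check the dataset settings in the example config before training.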

TiaoziLiao commented 3 years ago

Thanks for your reply, I will try it this way.