First, download and unpack the data as described here. You will see the new `data/` directory in the repository; it contains the datasets used in the paper.

Then, add your own dataset to the `data/` directory, following the format of the other datasets. Let's say your dataset's name is `iris`. Use `np.save` to create the directory `data/iris` with the following content (a minimal sketch of producing these files follows the list):
- `X_num_train.npy`, `X_num_val.npy`, `X_num_test.npy` (NumPy arrays of float32)
- `X_cat_train.npy`, `X_cat_val.npy`, `X_cat_test.npy` (NumPy arrays of strings)
- `y_train.npy`, `y_val.npy`, `y_test.npy` (NumPy arrays of float32 for regression, int64 for classification); for classification, the classes must be from `range(n_classes)`
- `info.json`: see this file for the other datasets to learn its content
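For example, here is a minimal sketch of how the `iris` files could be produced. The split sizes below are illustrative, and `info.json` is not generated here; copy one from an existing dataset and edit it by hand:

```python
import numpy as np
from pathlib import Path
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Illustrative only: load a toy dataset and make a train/val/test split.
data = load_iris()
X_num = data["data"].astype(np.float32)  # numerical features as float32
y = data["target"].astype(np.int64)      # classification labels from range(n_classes)

X_train, X_tmp, y_train, y_tmp = train_test_split(X_num, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

out = Path("data/iris")
out.mkdir(parents=True, exist_ok=True)
for name, array in [
    ("X_num_train", X_train), ("X_num_val", X_val), ("X_num_test", X_test),
    ("y_train", y_train), ("y_val", y_val), ("y_test", y_test),
]:
    np.save(out / f"{name}.npy", array)
# iris has no categorical features, so the X_cat_*.npy files are omitted here.
```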
Let's say you want to run the tuning & evaluation pipeline for MLP on your dataset. Then copy any existing config (for example, this one) and change the path inside the config to point to your dataset (`"data/iris"` instead of `"data/california"`).
Full script:
```bash
export CUDA_VISIBLE_DEVICES="0"
mkdir exp/mlp/iris
cp exp/mlp/california/0_tuning.toml exp/mlp/iris/0_tuning.toml
# <edit the new config as described above>
python bin/tune.py exp/mlp/iris/0_tuning.toml
python bin/evaluate.py exp/mlp/iris/0_tuning 15
python bin/ensemble.py exp/mlp/iris/0_evaluation
```
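For the "edit the new config" step, a small Python one-off can do the substitution, assuming the old dataset path appears in the copied config as the literal string `"data/california"` (verify the result before running the pipeline):

```python
from pathlib import Path

# Point the copied config at the new dataset.
cfg = Path("exp/mlp/iris/0_tuning.toml")
cfg.write_text(cfg.read_text().replace("data/california", "data/iris"))
```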
I don't understand the question :) You can use `bin/train4.py` as a starting point.
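In case it helps, a custom numerical-feature embedding is usually just a module that maps each scalar feature to a vector; the sketch below shows that general shape in PyTorch. It is illustrative only (the class name and shapes are assumptions, not this repository's API); see `bin/train4.py` for how embeddings are actually wired in.

```python
import torch
import torch.nn as nn

class LinearEmbedding(nn.Module):
    """Illustrative per-feature linear embedding.

    Maps (batch, n_features) -> (batch, n_features, d_embedding).
    A generic sketch, not this repository's actual module.
    """

    def __init__(self, n_features: int, d_embedding: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.empty(n_features, d_embedding))
        self.bias = nn.Parameter(torch.empty(n_features, d_embedding))
        nn.init.normal_(self.weight, std=0.02)
        nn.init.zeros_(self.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features); broadcast each scalar feature to a vector.
        return x[..., None] * self.weight + self.bias
```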
Feel free to reopen the issue if needed.
Should I convert the dataset into a CSV file, an Excel file, or some other format? Which lines or files should I change if I want to use a new dataset and new embedding algorithms for evaluation, while keeping the awesome hyper-parameter tuning mechanisms?