huckiyang / Voice2Series-Reprogramming

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification
Apache License 2.0

Voice2Series-Reprogramming

Voice2Series: Reprogramming / Prompting Acoustic Models for Time Series Classification

Environment

TensorFlow 2.2 (CUDA=10.0) and Kapre 0.2.0.

conda env create -f V2S.yml
pip install tensorflow-gpu==2.1.0
pip install kapre==0.2.0
pip install h5py==2.10.0
pip install pyts
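
After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of the repository: the names `PINS`, `installed_version`, and `check_pins` are hypothetical, and the pins are copied from the pip commands above. On interpreters older than Python 3.8 it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is available on older Pythons.
    from importlib_metadata import version, PackageNotFoundError

# Pins copied from the pip commands in this section.
PINS = {"tensorflow-gpu": "2.1.0", "kapre": "0.2.0", "h5py": "2.10.0"}


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `check_pins(PINS)` means every pinned package matches. The `lookup` argument exists only so the comparison can be exercised without touching the real environment.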

Training

Please check the paper for the actual validation details. Many thanks!

python v2s_main.py --dataset 0 --eps 20 --mod 2 --seg 18 --mapping 1
Epoch 14/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.4493 - accuracy: 0.9239 - val_loss: 0.4571 - val_accuracy: 0.9106
Epoch 15/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.4297 - accuracy: 0.9306 - val_loss: 0.4381 - val_accuracy: 0.9265
Epoch 16/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.4182 - accuracy: 0.9247 - val_loss: 0.4204 - val_accuracy: 0.9205
Epoch 17/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.3972 - accuracy: 0.9320 - val_loss: 0.4072 - val_accuracy: 0.9242
Epoch 18/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.3905 - accuracy: 0.9303 - val_loss: 0.4099 - val_accuracy: 0.9242
Epoch 19/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.3765 - accuracy: 0.9320 - val_loss: 0.3924 - val_accuracy: 0.9258
Epoch 20/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.3704 - accuracy: 0.9300 - val_loss: 0.3816 - val_accuracy: 0.9250
--- Train loss: 0.36046191089949786
- Train accuracy: 0.93113023
--- Test loss: 0.38329164963780027
- Test accuracy: 0.925
=== Best Val. Acc:  0.92651516  At Epoch of  14

python v2s_main.py --dataset 0 --eps 20 --mod 2 --seg 18 --mapping 18
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.8762 - accuracy: 0.9231 - val_loss: 0.8479 - val_accuracy: 0.9182
Epoch 12/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.8360 - accuracy: 0.9236 - val_loss: 0.8191 - val_accuracy: 0.9152
Epoch 13/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.7920 - accuracy: 0.9242 - val_loss: 0.7693 - val_accuracy: 0.9273
Epoch 14/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.7586 - accuracy: 0.9228 - val_loss: 0.7358 - val_accuracy: 0.9235
Epoch 15/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.7265 - accuracy: 0.9270 - val_loss: 0.7076 - val_accuracy: 0.9205
Epoch 16/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.6980 - accuracy: 0.9247 - val_loss: 0.6707 - val_accuracy: 0.9295
Epoch 17/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.6650 - accuracy: 0.9281 - val_loss: 0.6473 - val_accuracy: 0.9250
Epoch 18/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.6444 - accuracy: 0.9286 - val_loss: 0.6270 - val_accuracy: 0.9303
Epoch 19/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.6194 - accuracy: 0.9286 - val_loss: 0.6020 - val_accuracy: 0.9318
Epoch 20/20
3601/3601 [==============================] - 4s 1ms/sample - loss: 0.5964 - accuracy: 0.9275 - val_loss: 0.5813 - val_accuracy: 0.9227
--- Train loss: 0.5795955053139845
- Train accuracy: 0.93113023
--- Test loss: 0.5856682072986256
- Test accuracy: 0.92651516
=== Best Val. Acc:  0.9318182  At Epoch of  18
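
When comparing several runs, it can be handy to pull the best validation accuracy out of a saved log rather than reading it by eye. The sketch below is one way to do that for logs in the format shown above; the function name `best_val_accuracy` is hypothetical and the code is not taken from the repository. It reports the 1-based epoch from the `Epoch i/N` header. The script's own summary line reports epoch 14 for a run whose best validation accuracy appears under the `Epoch 15/20` header, which would be consistent with zero-based counting; that reading is an inference from the log above, not something the source states.

```python
import re

# Matches the validation accuracy in a Keras per-epoch summary line, e.g.
# "... - accuracy: 0.9306 - val_loss: 0.4381 - val_accuracy: 0.9265"
VAL_ACC = re.compile(r"val_accuracy:\s*([0-9]*\.?[0-9]+)")
# Matches the "Epoch 15/20" header that precedes each summary line.
EPOCH = re.compile(r"^Epoch\s+(\d+)/(\d+)")


def best_val_accuracy(log_text):
    """Return (epoch, val_accuracy) for the best epoch in a training log.

    The epoch is the 1-based number from the preceding "Epoch i/N" header,
    or None when that header is missing from the captured text.
    Returns None if the log contains no val_accuracy value at all.
    """
    best = None
    current_epoch = None
    for line in log_text.splitlines():
        header = EPOCH.match(line.strip())
        if header:
            current_epoch = int(header.group(1))
            continue
        found = VAL_ACC.search(line)
        if found:
            acc = float(found.group(1))
            if best is None or acc > best[1]:
                best = (current_epoch, acc)
            current_epoch = None  # a header applies to one summary line only
    return best
```

A typical use is `best_val_accuracy(open("train.log").read())`, where `train.log` is a hypothetical file holding the captured console output.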

Class Activation Mapping

python cam_v2s.py --dataset 5 --weight wNo5_map6-88-0.7662.h5 --mapping 6 --layer conv2d_1

Theoretical Discussion

FAQ

I recommend trying different label mapping numbers during training. For instance, you could use --mapping 7 for the ECG 5000 dataset. The dropout rate is also an important hyperparameter for tuning the test loss. You could try values between 0.2 and 0.5; for example, --dr 4 sets a dropout rate of 0.4.
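
The tuning advice above amounts to a small grid over `--mapping` and `--dr`. The following sketch only builds the command lines for such a grid; it does not run them. The helper name `build_commands` is hypothetical. It assumes that `--dr` takes an integer read as tenths (the source confirms only that 4 means 0.4) and that `--dr` can be combined with the flags from the training example. The dataset index for ECG 5000 is not given here, so it is left as a parameter.

```python
import itertools
import shlex


def build_commands(dataset, mappings, dropout_tenths, eps=20, mod=2, seg=18):
    """Build one v2s_main.py command line per (mapping, dropout) pair.

    dropout_tenths holds integers passed to --dr; the assumption is that
    a value n corresponds to a dropout rate of n / 10.
    """
    commands = []
    for mapping, dr in itertools.product(mappings, dropout_tenths):
        args = [
            "python", "v2s_main.py",
            "--dataset", str(dataset),
            "--eps", str(eps),
            "--mod", str(mod),
            "--seg", str(seg),
            "--mapping", str(mapping),
            "--dr", str(dr),
        ]
        # shlex.quote keeps each argument safe to paste into a shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Dropout 0.2 to 0.5 in steps of 0.1, two label mapping numbers.
    for cmd in build_commands(0, mappings=[1, 7], dropout_tenths=[2, 3, 4, 5]):
        print(cmd)
```

Printing the commands instead of launching them keeps the sketch side-effect free; the lines can be pasted into a shell or a job scheduler as needed.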

A V2S mask is provided as an option, but the training script does not use the mask in the forward pass. In our experiments, using or omitting the mask made only a small difference in performance. This does not conflict with the proposed theoretical analysis of learning target-domain adaptation.
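
To make the role of the mask concrete, input reprogramming can be written as x' = Pad(x) + M ⊙ θ, where x is the short target series, θ is a trainable perturbation with the length of the source model's input, and M is a binary mask that is 1 only where the padded input is not occupied by x. The sketch below is a plain-Python illustration of that formula, not the repository's layer: the function name `reprogram`, the `use_mask` switch, and the choice to place x at the start of the padded input are all assumptions.

```python
def reprogram(x, theta, use_mask=True):
    """Zero-pad a short series x to len(theta) and add the perturbation theta.

    With use_mask=True, theta is applied only at the padded positions
    (mask = 1 where x is absent, 0 where x is present).
    With use_mask=False, theta is added everywhere, including on top of x.
    """
    if len(x) > len(theta):
        raise ValueError("target series is longer than the source input")
    pad = len(theta) - len(x)
    # Pad(x): x at the start, zeros after (placement is an assumption).
    padded = list(x) + [0.0] * pad
    if use_mask:
        # M: 1 marks positions not occupied by x.
        mask = [0.0] * len(x) + [1.0] * pad
    else:
        mask = [1.0] * len(theta)
    return [p + m * t for p, m, t in zip(padded, mask, theta)]
```

The two settings differ only on the positions occupied by x, which gives some intuition for why the mask has a limited effect when x is much shorter than the source input.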

Yes, you are welcome. Please send an email to the author about potential collaboration.

Pre-trained models and training

cd weight
pip install gdown
gdown https://drive.google.com/uc?id=1mhqXZ8CANgHyepum7N4yrjiyIg6qaMe6

Additional Questions

Please open an issue here for discussion. Thank you!