huckiyang / Voice2Series-Reprogramming

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification
Apache License 2.0

The paper reports 100% prediction accuracy on the Ford-A dataset, but the V2S_main.py script yields only 93%. #4

Open lijipu1 opened 6 months ago

lijipu1 commented 6 months ago

Does main.py use the V2Sa or the V2Su network architecture? I noticed that the network structure in main.py differs from the one in the paper. Could this difference be the reason for the discrepancy in accuracy?

huckiyang commented 3 months ago

Hello lijipu1, thanks for reporting this evaluation issue.

Yes, the audio transformer model we actually used under TF 2 and kapre differs slightly from the model we released here. I also ran a hyperparameter search on the dropout rate applied to the prompt noise, which is another source of the difference. Due to version conflicts in TF 2, that setup is now harder to reproduce under Keras.
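To clarify what "dropout rate over the prompt noise" refers to: in Voice2Series-style reprogramming, a trainable perturbation (the prompt) is added to the input series before it enters the frozen acoustic model, and dropout can be applied to that perturbation during training. Below is a minimal NumPy sketch of that idea; the function name `reprogram` and all parameter values are illustrative, not taken from the released V2S_main.py.

```python
import numpy as np

rng = np.random.default_rng(0)

def reprogram(x, prompt, dropout_rate=0.2, training=True):
    """Add a trainable prompt (perturbation) to a time-series input.

    During training, inverted dropout is applied to the prompt itself;
    the dropout_rate here is the hyperparameter searched over.
    (Illustrative sketch, not the released V2S implementation.)
    """
    if training and dropout_rate > 0:
        keep = (rng.random(prompt.shape) >= dropout_rate).astype(prompt.dtype)
        prompt = prompt * keep / (1.0 - dropout_rate)  # rescale kept entries
    return x + prompt

# Toy example: a Ford-A-like univariate series of length 500.
x = np.zeros(500, dtype=np.float32)
prompt = rng.normal(scale=0.1, size=500).astype(np.float32)

x_train = reprogram(x, prompt, dropout_rate=0.2, training=True)  # noisy prompt
x_eval = reprogram(x, prompt, training=False)                    # full prompt
```

At evaluation time the full prompt is added, so different dropout rates found during the search change the learned prompt and hence the final accuracy.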

If you need a number for future research, please feel free to cite this issue and report 93 to 96% accuracy on Ford-A with the current codebase. A better V2S result could be attained by using an AST-based PyTorch backend [1]. Sorry for the confusion.

  [1] A later version of the speech reprogramming / waveform prompting code is available here: https://github.com/biboamy/music-repro

See also this issue reporting 94% accuracy: https://github.com/huckiyang/Voice2Series-Reprogramming/issues/2