rafaelvalle / asrgen

Attacking Speaker Recognition with Deep Generative Models
https://arxiv.org/pdf/1801.02384.pdf

Generate target samples #4

Closed Fansgithub2019 closed 5 years ago

Fansgithub2019 commented 5 years ago

Thank you for your contribution. I have some questions about the experiment and hope you can answer them. First question: in gan_synthesis.ipynb,

audio = load_wav_to_torch('data_16khz/zcathy/cathy.wav', SAMPLING_RATE)
audio /= MAX_WAV_VALUE
audio = audio[None, :]
reference_mel = taco_stft.mel_spectrogram(audio)[0]
print(reference_mel.min(), reference_mel.max())

mel -= mel.min()
mel = mel / mel.max()
mel = mel * reference_mel.max()
print(mel.min(), mel.max())

Is mel = mel * reference_mel.max() the step that matches the generated fake audio to the real audio? I don't quite understand how to use the trained G_NET to generate voiceprint audio that matches the target.

Second question: Is gan_attack.ipynb a targeted attack? The target ID you set is 0. Can this be changed to another ID?

Looking forward to your reply!

rafaelvalle commented 5 years ago

1a) That rescales the generated mel-spectrogram to the range of the target mel-spectrogram. 1b) samples = G_net(noise) generates fake samples.

2) Yes, it is a targeted attack.
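For readers following point 1a, the three rescaling lines quoted in the question can be sketched in isolation. This is only an illustration, not the notebook's code: it uses NumPy instead of torch, the helper name rescale_to_reference is hypothetical, and the random array and the value 2.5 are stand-ins for a generator output and for reference_mel.max().

```python
import numpy as np


def rescale_to_reference(mel, reference_max):
    """Min-max rescale `mel` into the range [0, reference_max].

    Hypothetical helper mirroring the three lines quoted in the question:
        mel -= mel.min(); mel = mel / mel.max(); mel = mel * reference_mel.max()
    """
    mel = np.asarray(mel, dtype=np.float64)
    shifted = mel - mel.min()      # smallest value becomes 0
    peak = shifted.max()
    if peak == 0:                  # constant input: avoid dividing by zero
        return np.zeros_like(shifted)
    # Normalize to [0, 1], then stretch so the peak equals reference_max.
    return shifted / peak * reference_max


# Stand-in data (assumptions, not values from the repository).
rng = np.random.default_rng(0)
fake_mel = rng.normal(size=(80, 64))   # plays the role of a generated mel-spectrogram
reference_max = 2.5                    # plays the role of reference_mel.max()

scaled = rescale_to_reference(fake_mel, reference_max)
print(scaled.min(), scaled.max())
```

Note that, as in the quoted snippet, only the maximum is matched to the reference; the minimum is mapped to 0 rather than to reference_mel.min().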

rafaelvalle commented 5 years ago

Closing due to inactivity.