Non-parallel means that during training the contents of the two voice inputs do not have to be the same. For example, input 1 could be "How are you doing?" and input 2 could be "Fine, thanks." Most GANs use non-parallel data during training. Although I load all the sentences, during training the voice inputs are randomly chosen and clipped from the original voice sources, so they are not aligned. There is nothing wrong with using more data, as long as you do not use validation/test data during training. Feel free to modify the settings to see whether this makes a difference.
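For illustration, here is a minimal sketch of that kind of non-parallel sampling, assuming each speaker's preprocessed utterances are stored as (features, frames) NumPy arrays; the names `feats_a`, `feats_b`, and `n_frames` are hypothetical, not taken from this repository.

```python
import numpy as np

def sample_non_parallel_pair(feats_a, feats_b, n_frames=128, rng=np.random):
    """Independently draw one utterance per speaker and clip a random
    segment from each, so the two inputs are never time-aligned.
    Assumes every utterance has at least n_frames frames."""
    utt_a = feats_a[rng.randint(len(feats_a))]
    utt_b = feats_b[rng.randint(len(feats_b))]
    start_a = rng.randint(utt_a.shape[1] - n_frames + 1)
    start_b = rng.randint(utt_b.shape[1] - n_frames + 1)
    return (utt_a[:, start_a:start_a + n_frames],
            utt_b[:, start_b:start_b + n_frames])
```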
It's just that they mention explicitly:
we divided training set into two subsets without overlap. The first half sentences were used for the source and the other 81 sentences were used for the target
Of course with your setting the sentences are shuffled, but over such a large number of iterations you could assume some overlap. Also, one of their strong points is that the model works under this limited-data condition. I may try changing this and give you my feedback once I have trained the model.
Sure. You can use the first 81 from person A and the second 81 from person B, or the second 81 from person A and the first 81 from person B, so that they do not overlap. However, I don't remember whether I have a mechanism that prevents using parallel data: if I randomly choose voice file 10 from person A and then randomly choose from person B's voice files excluding file 10, there is no chance of overlap at all. You may try adding this if my code does not have it, as sketched below; that way you can make use of all the training data.
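If that mechanism is missing, a minimal sketch of the exclusion could look like this; the function name is hypothetical, and `num_files` would be 162 for this corpus.

```python
import numpy as np

def sample_disjoint_indices(num_files, rng=np.random):
    """Draw a file index for speaker A, then draw speaker B's index
    from the remaining files, so the same sentence is never paired."""
    idx_a = rng.randint(num_files)      # any of A's files
    idx_b = rng.randint(num_files - 1)  # one fewer choice for B
    if idx_b >= idx_a:
        idx_b += 1                      # skip idx_a without biasing the draw
    return idx_a, idx_b
```

This keeps all files usable for both speakers while guaranteeing that a sampled pair never shares sentence content.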
Yes, I think it can be done this way also. I just wanted to replicate the paper exactly and see the quality of the results.
The training was fine, and the results were good even with half the data. So it's all OK!
Good to know.
In the paper, the authors state that the first 81 sentences are used for the source and the remaining 81 sentences for the target. In your data loading you load all of the sentences, which results in a different setup from the original paper; I guess that's why the results are so good. You should try implementing the split in the non-parallel setting, as in the sketch below, and check the results again.
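For reference, a minimal sketch of the paper's split, assuming each speaker's 162 training files are held in lists sorted by sentence index (the names `files_a` and `files_b` are hypothetical):

```python
def split_for_non_parallel(files_a, files_b):
    """Use the first 81 sentences of the source speaker and the last 81
    of the target speaker, so the two subsets share no sentence content."""
    source_files = files_a[:81]
    target_files = files_b[81:162]
    return source_files, target_files
```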