PS: Good job for a "noob"! :-)
NOTE: I modified the code to split my dataset (4422 files) according to 0.85/0.10/0.05 (train/val/test), so there were many more files in the validation and test datasets (467 and 263 files respectively) than just 2 files in each. RESULT: No change at all! It looks like 2 validation files are enough. It still seems a bit weird to me: what if there are no sibilants in the validation files? How can the network know that it is doing well with sibilants? Mystery...
It's possible that having just two files for validation is enough because those files may be representative of the overall distribution of sibilant sounds in your dataset. However, if you're still concerned about this, you can try increasing the number of validation files and see if it has any impact on your results. Additionally, you could try stratifying your validation dataset by ensuring that it contains a representative sample of sibilant and non-sibilant sounds, which may help in evaluating the performance of your model on sibilant sounds.
I guess you're right. Since it didn't make the slightest difference, I don't think I will investigate further in that direction; I'll stick with the original version. Question: how many iterations would you recommend for the training phase? I think that 15,000 is enough. I trained once for 96,000 iterations and didn't see a real improvement. Am I missing something?
The number of iterations required during the training phase may depend on various factors, such as the size and complexity of your model, the size and quality of your dataset, the available computational resources, etc. Increasing the number of iterations does not necessarily lead to better performance, and it's possible that the model's performance plateaus beyond a certain number of iterations.
In general, it's a good idea to monitor the performance of your model during the training phase and stop training when the performance no longer improves significantly. You can do this by regularly evaluating your model on the validation set or by monitoring the loss function on the training set.
Regarding your specific situation, if you have already trained your model for 96,000 iterations and did not see a significant improvement in performance, then 15,000 iterations may be sufficient. However, I would still recommend gradually increasing the number of iterations and keeping track of the performance to determine the optimal number of iterations for your specific case.
The above response is by ChatGPT. I am very sorry if I have confused you.
???? How could this happen ??? It is marked "Owner". Weird....
As many low-level issues started being posted in this repo, I decided to try chatgpt-github-app, and according to its docs this is the only way it works. As other users have been very unhappy with it, I have already stopped the bot. I am really sorry about that.
@allcontributors add sbersier idea, userTesting
@34j
I've put up a pull request to add @sbersier! :tada:
The validation set doesn't actually affect how the model trains at all: the model is set to evaluation mode and gradients are turned off while processing the validation set. It's mainly there to show how training is progressing by comparing conversion results on a consistent set of files. Adding more files to it will only slow down training and bloat the TensorBoard logs.
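For illustration, this is the usual PyTorch pattern being described (a generic sketch, not this repo's actual validation code; the model and loader here are placeholders):

```python
import torch

def run_validation(model, val_loader):
    """Generic PyTorch validation pass: weights are never updated here."""
    model.eval()                 # disable dropout / use running batch-norm stats
    with torch.no_grad():        # gradients are neither computed nor stored
        outputs = [model(batch) for batch in val_loader]
    model.train()                # restore training mode for the next step
    return outputs
```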
It would be nice to get some additional statistics during validation, like an L1 loss over the mel spectrograms, or something more sophisticated.
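A rough sketch of what such a metric could look like, assuming torchaudio is available (the mel parameters below are placeholders and would need to match the model's actual configuration):

```python
import torch
import torchaudio

# Hypothetical mel settings; not taken from this repo's config.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=44100, n_fft=2048, hop_length=512, n_mels=80
)

def mel_l1(reference: torch.Tensor, generated: torch.Tensor) -> torch.Tensor:
    """L1 distance between the log-mel spectrograms of two waveforms."""
    n = min(reference.shape[-1], generated.shape[-1])       # align lengths
    ref_mel = torch.log(mel_transform(reference[..., :n]) + 1e-5)
    gen_mel = torch.log(mel_transform(generated[..., :n]) + 1e-5)
    return torch.nn.functional.l1_loss(gen_mel, ref_mel)
```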
The test set is completely unused at the moment and should probably be removed, seeing as it's also unused in the original repository. https://github.com/svc-develop-team/so-vits-svc/commit/2854013a8a480e5437eeb63af65569b7567e2c36
First, I would like to say that the results are very, very, very impressive! Congrats!
Nevertheless, I have three questions:
1) I noticed in preprocess_flist_config.py (lines 51-53) that the split between train/valid/test files is done in a way that takes only two files each for the validation dataset and the test dataset.
For the test files, that's OK (even if it wouldn't hurt to have a bit more). But is it OK to have only two validation files? Something like an 80%/10%/10% split for train/val/test seems more "conventional". Again, the number of files in the test dataset is not very important, but having something like 10% of the total data looks pretty normal to me. So, why so few validation files? I ask because I have the feeling that once the network is satisfied with the two (!) validation files it selected, it has no real incentive to make any further progress. So, would it be a good thing to split the dataset according to a fixed ratio (say 0.85/0.10/0.05)?
2) Wouldn't it be better to shuffle the paths before making the split? (See the sketch after this list for what 1) and 2) combined could look like.)
3) Wouldn't it be a good idea to add a Fréchet audio distance estimate to the logs, in order to get a better estimate of convergence (for example, as in https://github.com/gudgud96/frechet-audio-distance)? I'm not saying the Fréchet distance should be used for training (it would probably slow the whole thing down), but estimating convergence in a generative adversarial network is always difficult, and a perceptual estimate of the distance between the original and the generated sample would be nice.
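To make questions 1) and 2) concrete, here is a minimal sketch of a shuffled, ratio-based split; the ratios and function name are illustrative and not taken from preprocess_flist_config.py:

```python
import random

def split_paths(paths, train=0.85, val=0.10, seed=1234):
    """Shuffle the file paths, then split them into train/val/test by fixed ratios."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)      # fixed seed keeps the split reproducible
    n_train = int(len(paths) * train)
    n_val = int(len(paths) * val)
    train_set = paths[:n_train]
    val_set = paths[n_train:n_train + n_val]
    test_set = paths[n_train + n_val:]      # remaining ~5% goes to the test set
    return train_set, val_set, test_set
```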
Anyhow, again, congrats! Very good job!
Best regards