Closed JCU777 closed 2 years ago
Sorry for not giving you a good answer earlier. Could you provide information about your environment?
conda env export > my_env.yml
Or you may try creating a virtual environment without the .yml option. I recommend first installing TF and the other dependencies without faiss-gpu, then trying the training. That's the safer option.
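A minimal sketch of that manual setup (the environment name and package versions are assumptions, taken from the CUDA 11.0 / cuDNN 8.0 / TF 2.4 configuration reported later in this thread, not from the repo itself):

```shell
# Create a fresh environment instead of using the provided .yml
conda create -n audio-fp python=3.8 -y
conda activate audio-fp

# Install TensorFlow and the other dependencies first, WITHOUT faiss-gpu,
# and verify that training runs. (tensorflow==2.4 matches the CUDA 11.0 /
# cuDNN 8.0 setup mentioned below; adjust to your CUDA toolkit.)
pip install tensorflow==2.4

# Only after training works, add faiss-gpu separately.
conda install -c pytorch faiss-gpu -y
```

Installing faiss-gpu last makes it easy to tell whether a training failure comes from the TF/CUDA stack or from the faiss dependency.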
I had the same problem when I ran it on an RTX 3090 with batch size 640, but I don't know why; it suddenly seems to work now. My configuration is CUDA 11.0, cuDNN 8.0, TensorFlow 2.4. I hope this helps.
But the current results show an abnormally high accuracy: the first epoch already reaches 83%, and I don't know why. I've been busy lately, so I can't figure out the reason for the time being.
@Novicei
> But the current results show an abnormally high accuracy: the first epoch already reaches 83%, and I don't know why. I've been busy lately, so I can't figure out the reason for the time being.
Is the 83% for validation or an actual test? It's totally normal as validation accuracy for 1-second input, because the validation set is a database of only a few hundred 30-second songs. Also, this is not related to issue #18.
It's for validation. I followed the 620_lamb file you posted for training, but looking at the accuracy you posted, the first epoch starts at 65%. #15
@Novicei It is possible that the validation accuracy is low on the first epoch with a larger batch size. However, on the 100th (or more) epoch, bsz=640 will get better validation accuracy than bsz=120. The slow training of bsz=640 suggests that the new scheduler for learning rate and temperature will be useful. This topic has not been discussed further in the paper.
@mimbres I don't understand what you mean. I used your 640 configuration file for the experiment, but the validation accuracy of the output is 83% for 1 s at the first epoch, 100% for 5 s and above, and about 93% for 1 s at the seventh epoch. This doesn't match your results in #15; I don't know what's wrong. Maybe it's my environment.
@Novicei Sorry, I misunderstood your question last time.
As you mentioned, the mini-test validation accuracy (~94%) of the current repo with the 640 configuration is higher than the one I reported in #15 (~83%). Let me share a new 640_lamb_ep400 result.
It is also noticeable that the val_loss scale is different from #15. To explain: we can get a different scale of val_loss and val_acc depending on which validation set is used in mini-search-validation(). The value of max_n_samples and the size of the validation set have changed since the pre-release experiment mentioned in #15. max_n_samples is currently set to 3,000 for quick validation; you can set it up to 25,000 as needed.
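To illustrate why the reported accuracy depends on the size of the search set, here is a small self-contained simulation (a toy nearest-neighbor retrieval over random embeddings, not the repo's code; all names and parameters are made up for the example). With the same query noise, top-1 retrieval accuracy drops as the database grows, so a 3,000-item validation search will generally report higher accuracy than a 25,000-item one:

```python
import numpy as np

def retrieval_accuracy(db_size, dim=32, noise=0.2, n_queries=500, seed=0):
    """Top-1 accuracy of nearest-neighbor retrieval with noisy queries
    against a database of db_size random unit-norm embeddings."""
    rng = np.random.default_rng(seed)
    db = rng.normal(size=(db_size, dim))
    db /= np.linalg.norm(db, axis=1, keepdims=True)
    truth = rng.integers(0, db_size, size=n_queries)
    # Each query is a noisy copy of one database item.
    q = db[truth] + noise * rng.normal(size=(n_queries, dim))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    top1 = (q @ db.T).argmax(axis=1)  # cosine similarity search
    return float((top1 == truth).mean())

acc_small = retrieval_accuracy(db_size=300)   # small validation search set
acc_large = retrieval_accuracy(db_size=3000)  # 10x larger search set
print(acc_small, acc_large)  # accuracy drops as the database grows
```

The larger database simply contains more distractors competing with the true match, which is why changing max_n_samples changes the scale of val_acc without the model itself being any better or worse.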
Thanks for reporting this!
I followed the steps in the README to configure the environment (creating a virtual environment via the .yml file) and downloaded Dataset-mini v1.1 to ../ . But the loss computed when running run.py for training is NaN. While debugging, I found that after the data passes through the front_conv layer of the FingerPrinter model, the values of the resulting tensor are all 0 or NaN. What's wrong, and why is this happening?
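A generic way to localize this kind of problem is to scan intermediate activations in forward order for the first non-finite tensor (a sketch, not the repo's code; `first_nonfinite_layer` and the layer names below are hypothetical). In TensorFlow 2.x you can also call tf.debugging.enable_check_numerics() before training to get an error at the first op that produces NaN/Inf:

```python
import numpy as np

def first_nonfinite_layer(activations):
    """Given (layer_name, array) pairs in forward order, return the name
    of the first layer whose output contains NaN or Inf, else None."""
    for name, a in activations:
        if not np.all(np.isfinite(np.asarray(a))):
            return name
    return None

# Example with dummy activations: front_conv already produces NaN here,
# which points at its inputs, weights, or the CUDA/cuDNN build rather
# than at anything later in the network.
acts = [
    ("input", np.ones((2, 4))),
    ("front_conv", np.full((2, 4), np.nan)),
    ("div_enc", np.zeros((2, 4))),
]
print(first_nonfinite_layer(acts))  # -> front_conv
```

If the very first conv layer already yields all zeros or NaN on an RTX 30xx card, a CUDA/cuDNN/TensorFlow version mismatch is a plausible culprit, which is consistent with the environment discussion earlier in this thread.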