Hello. I have a dataset in a folder containing songs split into 5-second audio clips, each with a matching textual description, like this:
ls ./dataset
1.wav
1.txt
2.wav
2.txt
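In case it helps, the layout above can be read as (clip, description) pairs. This is only a minimal sketch, assuming every N.wav has a matching N.txt with the same stem; the `list_pairs` helper name is hypothetical and not part of any library:

```python
from pathlib import Path


def list_pairs(dataset_dir):
    """Return (wav_path, description) tuples for each N.wav with a matching N.txt.

    Assumption: a clip and its description share the same file stem.
    Clips without a description file are skipped.
    """
    pairs = []
    for wav in sorted(Path(dataset_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            description = txt.read_text(encoding="utf-8").strip()
            pairs.append((wav, description))
    return pairs


if __name__ == "__main__":
    # Print each clip name next to its description.
    for wav, description in list_pairs("./dataset"):
        print(wav.name, "->", description)
```

Converting each clip to a 512x512 MEL image and the fine-tuning step itself are not covered here, since those depend on the model.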
Can you please show me how to fine-tune your model on my dataset for 512x512 images/MELs (with code/notebook, if possible) in Google Colab on A100 or L4?