Closed utkarsh-tyagi closed 2 years ago
Thank you for your question. Could you please tell me the shape of the input feature you feed into the model? I am not sure whether you preprocess
the raw audio properly.
In this repo, we provide MATLAB code for preprocessing. If you prefer to use Python, please refer to preprocess.py
in our new repo for our recent work: https://github.com/yzyouzhang/Empirical-Channel-CM. Thanks.
torch.Size([60, 750])
Please find attached my notebook file, which shows what I am doing.
How can I contact you? Could you share your email ID? notebook_air_2.zip
@yzyouzhang Also, I am not able to train the model from scratch; it gives me the error below. Batch size = 2, GPU: RTX 2060 6 GB.
RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 4.49 GiB already allocated; 0 bytes free; 4.52 GiB reserved in total by PyTorch)
The feature size should be [B, 1, 60, 750].
I checked your code. You have ResNet(1, 16) when you set up the model. It should be ResNet(3, 256) if you want to use my pretrained model.
In my case, I use a GTX 1080 Ti 11 GB and can run with batch size 64, so I think batch size 2 should be totally OK for your device. Have you made sure there are no other processes occupying the GPU memory?
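One way to narrow this down is to look at what PyTorch itself holds in the current process. The sketch below is only an illustration: gpu_memory_report is a hypothetical helper name, not part of this repo. It reports PyTorch's own allocations in the current process, which is also what the "already allocated" figure in the error message refers to. Memory held by other processes will not show up here and would need an external tool such as nvidia-smi.

```python
import torch


def gpu_memory_report(device_index=0):
    """Return memory figures in MiB for one GPU, or None when CUDA is unavailable.

    Hypothetical helper for illustration; it only sees this process's PyTorch usage.
    """
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    props = torch.cuda.get_device_properties(device_index)
    return {
        "total_mib": props.total_memory / mib,
        # Memory currently occupied by tensors in this process.
        "allocated_mib": torch.cuda.memory_allocated(device_index) / mib,
        # Memory held by PyTorch's caching allocator (allocated plus cached).
        "reserved_mib": torch.cuda.memory_reserved(device_index) / mib,
    }


print(gpu_memory_report())
```

If the allocated figure is already large before the first training step, the memory is probably being held by tensors created earlier in the same notebook session rather than by another process.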
@yzyouzhang How can I convert my feature size (60, 750) to [B, 1, 60, 750]?
What do B and 1 mean?
@yzyouzhang I checked again; no other process is occupying my GPU memory.
Could you please share your email or Zoom ID so we can discuss this issue further?
@yzyouzhang What threshold score have you set for real and fake samples? My code: score = F.softmax(lfcc_outputs)[:, 0]
B is the batch size, 1 is the number of channels for CNN.
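As a minimal sketch of that conversion, assuming the feature is a PyTorch tensor of shape (60, 750) as reported earlier in this thread, the two extra dimensions can be added with unsqueeze. The random tensors below are stand-ins for real LFCC features.

```python
import torch

# Stand-in for one utterance's feature: 60 coefficients x 750 frames.
feat = torch.randn(60, 750)

# Add a channel dimension, then a batch dimension: (60, 750) -> (1, 1, 60, 750).
single = feat.unsqueeze(0).unsqueeze(0)

# For several utterances of the same shape, stack them first so that B > 1.
feats = [torch.randn(60, 750) for _ in range(4)]
batch = torch.stack(feats).unsqueeze(1)  # (4, 60, 750) -> (4, 1, 60, 750)

print(single.shape)
print(batch.shape)
```

Here B is 1 for the single utterance and 4 for the stacked batch; the channel dimension stays at 1 in both cases.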
We do not need a threshold to calculate EER. If you want to classify samples into two classes, you can choose a value between the r1 and r2 of the OCSoftmax.
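A rough sketch of the two-class decision follows. Everything named here is an assumption for illustration: the margin values R_REAL and R_FAKE are placeholders for whatever r1 and r2 your model was trained with, the classify helper is hypothetical, and the sketch assumes higher scores mean bona fide speech and that the scores are on the same scale as the margins.

```python
# Placeholder margins for illustration only; substitute the r1/r2 used in training.
R_REAL = 0.9  # assumed margin for bona fide (real) samples
R_FAKE = 0.2  # assumed margin for spoofed (fake) samples


def classify(scores, threshold):
    """Label each score 'real' if it exceeds the threshold, otherwise 'fake'."""
    return ["real" if s > threshold else "fake" for s in scores]


# The midpoint is one simple choice; any value between the two margins is a candidate.
threshold = (R_REAL + R_FAKE) / 2
labels = classify([0.95, 0.1, 0.6], threshold)
print(threshold, labels)
```

Where exactly to place the threshold between the two margins depends on how you want to trade false acceptances against false rejections, so it may be worth tuning on a development set.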
Please contact yzyouzhang@gmail.com for further zoom discussions. Thanks.
ok thanks :)
I applied your pretrained model to some audio files, but it gives me an error:
RuntimeError: Calculated padded input size per channel: (3 x 752). Kernel size: (9 x 3). Kernel size can't be greater than actual input size
Do you know what the reason is?