yzyouzhang / AIR-ASVspoof

Official implementation of the SPL paper "One-class Learning Towards Synthetic Voice Spoofing Detection"
MIT License
103 stars 28 forks

Error while applying your model #24

Closed utkarsh-tyagi closed 2 years ago

utkarsh-tyagi commented 2 years ago

I applied your pretrained model to some audio files, but it gives me this error:

RuntimeError: Calculated padded input size per channel: (3 x 752). Kernel size: (9 x 3). Kernel size can't be greater than actual input size

Do you know what the reason is?

yzyouzhang commented 2 years ago

Thank you for your question. Could you please let me know the shape of the input feature you feed into the model? I am not sure whether you preprocessed the raw audio properly.

yzyouzhang commented 2 years ago

In this repo, we provide MATLAB code for preprocessing. If you prefer to use Python, please refer to preprocess.py in the repo for our more recent work: https://github.com/yzyouzhang/Empirical-Channel-CM. Thanks.

utkarsh-tyagi commented 2 years ago

torch.Size([60, 750])

Please find attached my notebook, which shows what I am doing.

How can I contact you? Do you have an email ID?

notebook_air_2.zip

utkarsh-tyagi commented 2 years ago

@yzyouzhang Also, I am not able to train the model from scratch. With batch size = 2 on an RTX 2060 6 GB GPU, it gives me this error:

RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 4.49 GiB already allocated; 0 bytes free; 4.52 GiB reserved in total by PyTorch)

yzyouzhang commented 2 years ago

> torch.Size([60, 750])
>
> Please find attached my notebook, which shows what I am doing.
>
> How can I contact you? Do you have an email ID? notebook_air_2.zip

The feature size should be [B, 1, 60, 750]. I checked your code. You have ResNet(1, 16) when you set up the model. It should be ResNet(3, 256) if you want to use my pretrained model.

yzyouzhang commented 2 years ago

> @yzyouzhang Also, I am not able to train the model from scratch. With batch size = 2 on an RTX 2060 6 GB GPU, it gives me this error:
>
> RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 4.49 GiB already allocated; 0 bytes free; 4.52 GiB reserved in total by PyTorch)

In my case, I use a GTX 1080 Ti with 11 GB of memory, and it can fit a batch size of 64. So I think batch size 2 should be totally OK for your device. Have you made sure there are no other processes occupying the GPU memory?
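One way to check the question above from inside Python is to ask the CUDA driver how much device memory is free before the model is loaded. The snippet below is a minimal sketch, not part of the repository: `gpu_memory_report` is a hypothetical helper name, and it assumes a PyTorch version that provides `torch.cuda.mem_get_info`. The reported free memory is device-wide, so a value far below the total before any model is created suggests that another process (or a stale notebook kernel) is holding memory.

```python
import torch


def gpu_memory_report(device=0):
    """Return (free, total) device memory in GiB, or None if CUDA is unavailable.

    Hypothetical helper for illustration; not part of AIR-ASVspoof.
    """
    if not torch.cuda.is_available():
        return None
    # mem_get_info returns device-wide (free, total) memory in bytes.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


report = gpu_memory_report()
if report is None:
    print("CUDA is not available")
else:
    free, total = report
    print(f"free {free:.2f} GiB of {total:.2f} GiB")
```

Running this in a fresh Python process, before training starts, gives the clearest reading.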

utkarsh-tyagi commented 2 years ago

@yzyouzhang How can I convert my feature of size (60, 750) to [B, 1, 60, 750]?

What do B and 1 mean?

utkarsh-tyagi commented 2 years ago

@yzyouzhang I checked again; no other process is occupying my GPU memory.

Could you please share your email or Zoom ID so that we can stay in contact and solve this issue?

utkarsh-tyagi commented 2 years ago

@yzyouzhang What threshold score have you set for real and fake samples?

score = F.softmax(lfcc_outputs)[:, 0]
score

yzyouzhang commented 2 years ago

> @yzyouzhang How can I convert my feature of size (60, 750) to [B, 1, 60, 750]?
>
> What do B and 1 mean?

B is the batch size, and 1 is the number of input channels for the CNN.
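The reshaping described here can be sketched as follows. This is a minimal illustration rather than code from the repository: the random tensors stand in for real LFCC features of size (60, 750), and the model itself is not constructed because its import path is not shown in this thread.

```python
import torch

# Stand-in for one utterance's feature matrix: 60 coefficients x 750 frames.
feat = torch.randn(60, 750)

# Add a channel axis and then a batch axis: (60, 750) -> (1, 1, 60, 750).
batch = feat.unsqueeze(0).unsqueeze(0)
print(batch.shape)

# Several utterances of the same size can be stacked into one batch,
# then given a channel axis: (4, 60, 750) -> (4, 1, 60, 750).
feats = [torch.randn(60, 750) for _ in range(4)]
batch4 = torch.stack(feats).unsqueeze(1)
print(batch4.shape)
```

Here B is 1 for a single file and 4 for the stacked example; the second dimension stays 1 in both cases.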

yzyouzhang commented 2 years ago

> @yzyouzhang What threshold score have you set for real and fake samples?
>
> score = F.softmax(lfcc_outputs)[:, 0]
> score

We do not need a threshold to calculate the EER. If you want to classify samples into two classes, you can choose a value between r1 and r2 of the OCSoftmax.
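A hard decision based on this advice could look like the sketch below. Several things here are assumptions rather than statements from this thread: r1 and r2 are taken to be the two OC-Softmax margins, the values 0.9 and 0.2 are assumed defaults that should be replaced by whatever the model was trained with, the midpoint is only one possible choice of threshold, and scores are assumed to be oriented so that higher means more likely bona fide (flip the sign first if your scoring code returns the opposite orientation). `classify` is a hypothetical helper name.

```python
# Assumed margin values; replace with the ones used to train the model.
R_REAL = 0.9  # assumed margin for bona fide samples
R_FAKE = 0.2  # assumed margin for spoofed samples


def classify(score, threshold=(R_REAL + R_FAKE) / 2):
    """Label a score as 'bonafide' if it reaches the threshold, else 'spoof'.

    Hypothetical helper; assumes higher scores mean more bona fide.
    """
    return "bonafide" if score >= threshold else "spoof"


# Illustrative scores, not real model outputs.
scores = [0.95, 0.10, 0.60, 0.30]
labels = [classify(s) for s in scores]
print(labels)
```

Any threshold strictly between the two margins fits the advice above; a threshold tuned on a development set would be a reasonable alternative to the midpoint.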

yzyouzhang commented 2 years ago

> @yzyouzhang I checked again; no other process is occupying my GPU memory.
>
> Could you please share your email or Zoom ID so that we can stay in contact and solve this issue?

Please contact yzyouzhang@gmail.com to arrange further discussion over Zoom. Thanks.

utkarsh-tyagi commented 2 years ago

ok thanks :)