svc-develop-team / so-vits-svc

SoftVC VITS Singing Voice Conversion
GNU Affero General Public License v3.0

[mps] issue with Apple silicon compatibility #170

Open magic-akari opened 1 year ago

magic-akari commented 1 year ago

OS version

Darwin arm64

GPU

mps

Python version

Python 3.8.16

PyTorch version

2.0.0

Branch of sovits

4.0 (Default)

Dataset source (Used to judge the dataset quality)

N/A

Where the problem occurs or what command you executed

inference

Situation description

Related code:

https://github.com/svc-develop-team/so-vits-svc/blob/0298cd448fc699732e29d8951c96bc02dcc347ce/vdecoder/nsf_hifigan/models.py#L144-L146

https://github.com/svc-develop-team/so-vits-svc/blob/0298cd448fc699732e29d8951c96bc02dcc347ce/vdecoder/nsf_hifigan/models.py#L159-L162

There are some casts to double in the source code. Are they required?

Some double-precision operations are not implemented on MPS devices.

I think float is enough, but I am not sure. I have modified the code and tested it locally, and it works well.

Is there a significant loss of precision when moving the torch.cumsum operation from double to float?
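For reference, here is a minimal sketch (not code from the repository) of the kind of precision check being asked about: it runs the same normalized-phase cumsum once in float64 and once in float32 and measures how far the resulting sine excitation drifts. The constant 440 Hz F0, the 10-second length, and the 44100 Hz sampling rate are illustrative assumptions.

```python
import math
import torch

# Hypothetical sanity check (not repository code): measure the drift introduced
# by accumulating the normalized phase in float32 instead of float64.
sampling_rate = 44100
n_samples = sampling_rate * 10                    # 10 s of samples, for illustration
f0 = torch.full((1, n_samples), 440.0)            # constant 440 Hz F0 (assumption)

rad_values = (f0 / sampling_rate) % 1             # per-sample phase increment in cycles

phase64 = torch.cumsum(rad_values.double(), dim=1)
phase32 = torch.cumsum(rad_values.float(), dim=1)

# Compare the sine excitation rather than the raw phase, so wrap-around at
# integer cycle counts does not inflate the measured error.
sines64 = torch.sin(2 * math.pi * phase64)
sines32 = torch.sin(2 * math.pi * phase32.double())
print(f"max |sin| difference over 10 s: {(sines64 - sines32).abs().max().item():.3e}")
```

Whether the measured drift matters audibly is exactly the question above; note that the float64 branch has to be run on CPU or CUDA, since MPS has no double kernels.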

CC: @ylzz1997

Log

N/A

Supplementary description

No response

ylzz1997 commented 1 year ago

In theory, it doesn't matter. F0 normalized to 0-1 only needs to be accurate to about 1e-7, which float just covers.

But the cast from float to double before the cumsum operation was written by the author of NSF-HiFiGAN. To find out what effect it actually has, you could raise an issue under the NSF-HiFiGAN project.
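If the double cast turns out to be unnecessary on MPS, one possible workaround is to keep float64 on backends that support it and fall back to float32 only on MPS. This is a sketch under that assumption, not the repository's code; the helper name and signature are hypothetical.

```python
import torch

def accumulate_phase(rad_values: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: MPS has no float64 kernels, so the cumsum that the
    # linked lines run in double is done in float32 there and in float64 elsewhere.
    acc_dtype = torch.float32 if rad_values.device.type == "mps" else torch.float64
    tmp_over_one = torch.cumsum(rad_values.to(acc_dtype), dim=1) % 1
    return tmp_over_one.to(rad_values.dtype)
```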