microsoft / CLAP

Learning audio concepts from natural language supervision
MIT License

Distinct default sampling rate in CLAP(version='2023') #38

Open kamilakesbi opened 1 month ago

kamilakesbi commented 1 month ago

The 2023 version of the CLAP model processes audio sampled at 44100 Hz, according to this configuration file.

However, when we initialise the model, the htsat module still uses this other config file to define some of its parameters, as we can see here. In that config file, the sampling rate is set to 32000. As a result, the LogmelFilterBank is initialised with a sampling rate of 32000 rather than 44100.
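To make the concern concrete, here is a minimal sketch of what a sampling-rate mismatch means for frequency bins in general. It does not call CLAP or its LogmelFilterBank; the helper names `bin_center_hz` and `apparent_frequency` and the FFT size `N_FFT = 1024` are assumptions chosen for illustration only. Whether this actually affects the model depends on how it was trained, which I have not verified.

```python
# Illustrative only: plain arithmetic, no CLAP code involved.
ACTUAL_SR = 44100   # rate the 2023 config says the audio is sampled at
ASSUMED_SR = 32000  # rate the htsat config passes to the filterbank
N_FFT = 1024        # hypothetical FFT size, not taken from the repo


def bin_center_hz(bin_index: int, n_fft: int, sample_rate: int) -> float:
    """Centre frequency of an FFT bin for a given sampling rate."""
    return bin_index * sample_rate / n_fft


def apparent_frequency(true_hz: float, actual_sr: int, assumed_sr: int) -> float:
    """Frequency a component appears to have when audio sampled at
    actual_sr is interpreted as if it were sampled at assumed_sr."""
    return true_hz * assumed_sr / actual_sr


# The same FFT bin maps to different frequencies under the two rates.
print(bin_center_hz(100, N_FFT, ACTUAL_SR))   # what bin 100 really contains
print(bin_center_hz(100, N_FFT, ASSUMED_SR))  # what the filterbank assumes

# A 1 kHz tone in 44100 Hz audio is treated as a lower frequency.
print(apparent_frequency(1000.0, ACTUAL_SR, ASSUMED_SR))
```

Under these assumptions, every mel filter would be centred on a frequency about 1.38 times (44100 / 32000) higher than the one it was designed for.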

Is this behaviour expected?

Thanks!