Which config can reproduce the results in the paper?

kkoutini / PaSST

Efficient Training of Audio Transformers with Patchout

Apache License 2.0

Open diggerdu opened 6 months ago

kkoutini commented 6 months ago

Hi. For training on AudioSet, see this section. The following parameters control how many time/frequency frames are dropped in patchout.

Unstructured:

models.net.u_patchout=400

Structured:

models.net.s_patchout_f=4 
models.net.s_patchout_t=400
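To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It is not the PaSST implementation: the function names, the 12 x 100 patch grid, and the drop counts are all hypothetical values chosen for illustration, and they are not the settings quoted above. It assumes that structured patchout drops whole frequency rows and time columns of the patch grid, while unstructured patchout drops individual patches at random.

```python
import random


def structured_patchout(n_freq, n_time, drop_f, drop_t, rng):
    """Drop whole frequency rows and time columns; return the kept (f, t) positions."""
    keep_f = sorted(rng.sample(range(n_freq), n_freq - drop_f))
    keep_t = sorted(rng.sample(range(n_time), n_time - drop_t))
    return [(f, t) for f in keep_f for t in keep_t]


def unstructured_patchout(n_freq, n_time, drop_n, rng):
    """Drop drop_n individual patches at random; return the kept (f, t) positions."""
    positions = [(f, t) for f in range(n_freq) for t in range(n_time)]
    return sorted(rng.sample(positions, len(positions) - drop_n))


rng = random.Random(0)

# Hypothetical grid: 12 frequency patches x 100 time patches = 1200 patches.
kept_s = structured_patchout(12, 100, drop_f=4, drop_t=40, rng=rng)
kept_u = unstructured_patchout(12, 100, drop_n=400, rng=rng)

print(len(kept_s))  # (12 - 4) * (100 - 40) = 480 patches kept
print(len(kept_u))  # 1200 - 400 = 800 patches kept
```

In both cases the transformer sees a shorter sequence during training, which is where the speed-up would come from; the real grid size depends on the input length, patch size, and strides used in the repo.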

For fine-tuning, check the folder for each dataset, for example fsd50k.