MekkCyber opened this issue 2 months ago
Hi @MekkCyber, yeah, when bitblas encounters a kernel configuration for the first time, it performs the compilation and stores the result in a database, located by default at `~/.cache/bitblas`. The next time it encounters the same configuration, it retrieves the precompiled library directly from the database, bypassing the tuning process.
As a result, tuning only occurs the first time a specific model (and each of its distinct layer shapes) is encountered :)
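For reference, here is a minimal sketch of what that looks like in practice, assuming the `bitblas.MatmulConfig` / `bitblas.Matmul` usage from the README and that tuning is enabled by default; the exact config fields may differ for your setup:

```python
import time
import bitblas

# Describe a single GEMM shape. The first construction triggers tuning/compilation
# and writes the result to the local database (default: ~/.cache/bitblas).
config = bitblas.MatmulConfig(
    M=1,                 # batch / sequence dimension
    N=4096,              # output features
    K=4096,              # input features
    A_dtype="float16",
    W_dtype="int4",
    accum_dtype="float16",
    out_dtype="float16",
    layout="nt",
    with_bias=False,
)

start = time.time()
matmul = bitblas.Matmul(config=config)        # first time: tunes and compiles
print(f"first build:  {time.time() - start:.1f}s")

start = time.time()
matmul_again = bitblas.Matmul(config=config)  # same shape: served from the cache
print(f"cached build: {time.time() - start:.1f}s")
```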
We’re also considering bypassing tuning entirely by shipping saved compilation results for different hardware setups, but this is challenging and may take some time to design and implement :)
Thanks a lot @LeiWang1999, much clearer now
Hello @LeiWang1999
I am trying to use the BitNet modeling code in another project with the bitblas kernels. When I load the model and replace its linear layers with BitBlas Linear layers, the `_get_or_create_bitblas_operator` function takes a long time to execute and compile kernels for each weight shape: for a model with 32 layers, a hidden size of 4096, and an intermediate size of 14336, it takes ~8 min. Is this intended behaviour? Thank you for your help
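For context, the replacement step looks roughly like the sketch below. `make_bitblas_linear` is a hypothetical factory standing in for whatever builds the BitBLAS-backed linear module; each distinct weight shape it sees is what triggers a kernel compilation.

```python
import torch.nn as nn

def replace_linear_layers(model: nn.Module, make_bitblas_linear):
    """Recursively swap every nn.Linear for a BitBLAS-backed linear.

    `make_bitblas_linear` (hypothetical) builds the replacement module from an
    existing nn.Linear; the first time each (in_features, out_features) shape
    is encountered, the underlying bitblas operator is compiled/tuned.
    """
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, make_bitblas_linear(child))
        else:
            replace_linear_layers(child, make_bitblas_linear)
    return model
```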