Hi @lifeiteng
Thanks a lot for reporting this. Would you mind sharing reproduction instructions? I.e., is it just the changes to OpenBLAS that you mentioned? I want to benchmark it on our servers.
Please keep in mind that FbgemmF16 is designed to save on the bandwidth used for loading the pre-packed B matrix (the weight matrix during inference). Computation still happens in fp32, after the fp16 values are converted to fp32 in the inner kernel. Also, FbgemmF16 is currently tuned for server-class CPUs (think bigger caches). A usage sketch of this pre-packed path follows below.
Thanks Daya
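For context, here is a minimal sketch of the pre-packed FP16 path described above, based on a reading of FBGEMM's `FbgemmFP16.h` header. The `PackedGemmMatrixFP16` constructor and `cblas_gemm_compute` signatures are assumptions about the API and may differ across FBGEMM versions; the matrix sizes are placeholders, not the ones from this report.

```cpp
#include <vector>
#include "fbgemm/FbgemmFP16.h"  // assumed header location; verify in your FBGEMM checkout

int main() {
  using namespace fbgemm;
  constexpr int m = 64, k = 512, n = 1024;  // illustrative sizes only

  std::vector<float> A(m * k, 1.0f);  // activations, fp32
  std::vector<float> B(k * n, 0.5f);  // weights, fp32
  std::vector<float> C(m * n, 0.0f);  // output, fp32

  // One-time (offline) cost: convert B to fp16 and lay it out in the blocked
  // format the inner kernel expects. This halves the bandwidth needed to
  // stream the weight matrix at inference time.
  PackedGemmMatrixFP16 Bp(
      matrix_op_t::NoTranspose, k, n, /*alpha=*/1.0f, B.data());

  // Per-inference cost: C = A * Bp. The fp16 weights are widened back to fp32
  // inside the kernel, so accumulation still happens in fp32.
  cblas_gemm_compute(
      matrix_op_t::NoTranspose, m, A.data(), Bp, /*beta=*/0.0f, C.data());
  return 0;
}
```

The point of the design is that the packing step is paid once per weight matrix, while the per-call GEMM only streams half as many bytes for B compared to an fp32 weight matrix.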
Yes, just comment out the two lines as mentioned.
Closing due to no recent activity.
Environment:
Change to OpenBLAS:
Pre-pack:
Output formatted like the Fbgemm test results (cblas_sgemm, no transpose):
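The benchmark numbers themselves were not captured here. For reference, a minimal sketch of the kind of no-transpose `cblas_sgemm` call used as the OpenBLAS baseline in such a comparison (sizes are illustrative assumptions, not the original measurement configuration):

```cpp
#include <vector>
#include <cblas.h>  // OpenBLAS CBLAS interface

int main() {
  const int m = 64, k = 512, n = 1024;  // placeholder sizes
  std::vector<float> A(m * k, 1.0f), B(k * n, 0.5f), C(m * n, 0.0f);

  // Plain fp32 GEMM, C = A * B, no transpose on either operand.
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              m, n, k,
              /*alpha=*/1.0f, A.data(), /*lda=*/k, B.data(), /*ldb=*/n,
              /*beta=*/0.0f, C.data(), /*ldc=*/n);
  return 0;
}
```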