Currently, the following two single-layer MLPs perform worse than GC V1.

dtype | batch size | hidden list | GC V1 | 8c55a0544 remove brgemm read lock | GC V1 / 8c55a0544
-- | -- | -- | -- | -- | --
bf16 | 128 | 1024x1024 | 0.0286 | 0.0828 | 34.52%
bf16 | 128 | 1024x512 | 0.0204 | 0.0670 | 30.45%
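
As a sanity check on the percentage column, here is a minimal sketch that recomputes it from the two measurement columns. It assumes the percentage is the GC V1 figure divided by the 8c55a0544 figure; the source does not label that column, and the units of the measurements are not stated. The names `rows` and `ratio_percent` are illustrative only.

```python
# Values copied from the table above; units are not stated in the source.
rows = [
    # (hidden list, GC V1, 8c55a0544)
    ("1024x1024", 0.0286, 0.0828),
    ("1024x512", 0.0204, 0.0670),
]


def ratio_percent(gc_v1, current):
    """Return the GC V1 figure as a percentage of the current figure (assumed definition)."""
    return 100.0 * gc_v1 / current


ratios = {hidden: ratio_percent(gc_v1, current) for hidden, gc_v1, current in rows}

for hidden, pct in ratios.items():
    print(f"bf16, batch 128, hidden {hidden}: {pct:.2f}%")
```

Under that assumption the recomputed values agree with the reported percentages to within a few hundredths of a percentage point. The small difference on the 1024x1024 row is consistent with the table showing rounded measurements.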