Open xdBronch opened 11 months ago
Workaround: use the more powerful zig build-exe CLI and you can pass -O3 directly with -cflags:
-cflags [flags] -- Set extra flags for the next positional C source files
Not sure if I'm doing something incorrectly, but zig build-exe run.c -cflags -Ofast -lm -march=native -- -lc runs at 10 tokens/s; adding -OReleaseFast after the cflags brings it back up to 82.
There's an environment variable to print the clang command: ZIG_VERBOSE_CC=1
Edit: your positional argument is before the flags rather than after. From the help text: "Set extra flags for the next positional C source files".
Ah yeah, that's my bad. Putting the C file after the flags runs at 40 tokens/s, then adding -OReleaseFast runs at ~415. This is obviously a massive improvement but still a bit behind. I'm also a little confused as to why I need to use both C and Zig optimization flags to get good results?
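For anyone following along, here is a rough sketch of the corrected ordering, with the C file placed after the "-cflags ... --" group. The flags and run.c come from this thread; the helper name zig_build_cmd is made up for illustration, and -lm is left out on the assumption that -lc is enough to pull in the math functions (add it back if linking fails).

```shell
#!/bin/sh
# Sketch only: zig_build_cmd is a hypothetical helper, not part of zig.
# It prints a zig build-exe command line where the C source comes AFTER
# the "-cflags ... --" group, so those flags apply to it.
zig_build_cmd() {
    src="$1"
    printf '%s\n' "zig build-exe -OReleaseFast -cflags -Ofast -march=native -- $src -lc"
}

cmd=$(zig_build_cmd run.c)
echo "$cmd"

# Only attempt the build when zig and the source file are actually present.
# ZIG_VERBOSE_CC=1 makes zig print the clang command it runs.
if command -v zig >/dev/null 2>&1 && [ -f run.c ]; then
    ZIG_VERBOSE_CC=1 $cmd
fi
```

Printing the command first makes it easy to eyeball that run.c sits after the "--" before anything is compiled.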
You should start by comparing the clang command lines, and make sure the clang versions are the same.
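A possible way to do that comparison, as a sketch: dump the command line each driver produces and pull out only the optimization-level flags. The helper opt_flags is hypothetical; the exact format of the verbose output is not guaranteed, and zig may print nothing if the object is already cached.

```shell
#!/bin/sh
# Sketch only: opt_flags is a hypothetical helper.
# Reads a compiler command line on stdin and prints every -O<level>
# argument, one per line, with surrounding double quotes removed.
opt_flags() {
    tr ' ' '\n' | grep -E '^"?-O[0-9A-Za-z]*"?$' | tr -d '"'
}

# Small self-contained demonstration on a literal command line.
printf '%s\n' 'clang -O3 -march=native -c run.c -o run.o' | opt_flags

# Live comparison, only when both tools and the source file are present.
if command -v zig >/dev/null 2>&1 && command -v clang >/dev/null 2>&1 && [ -f run.c ]; then
    echo "zig cc:"
    ZIG_VERBOSE_CC=1 zig cc -Ofast -c run.c -o run_zig.o 2>&1 | opt_flags
    echo "clang:"
    # -### prints the commands clang would run without running them.
    clang -### -Ofast -c run.c 2>&1 | opt_flags
    # Versions should match for the comparison to mean much.
    zig cc --version
    clang --version
fi
```

If the two lists differ, that is a strong hint about where the speed gap comes from.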
Found that this can be worked around fairly easily (although maybe a bit hacky) by just telling zig to append the optimization level to clang's argv if it's Ofast.
Maybe instead of a workaround like this, zig itself could gain flags that match the behavior of Ofast? We have @setFloatMode on the zig side but there's nothing to enable it globally. I think adding something like that would both fix this problem and speed up existing zig code; sounds like a win-win imo. Please lmk if there's a reason this can't be an option.
"ah yeah thats my bad. putting the C file after the flags runs at 40 tokens/s then adding -OReleaseFast runs at ~415. this is a obviously a massive improvement but still a bit behind. im also a little confused as to why i need to use C and Zig optimization flags to get good results?"
try -Xclang -Ofast
Zig Version
0.12.0-dev.3+9c05810be
Steps to Reproduce and Observed Behavior
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
then make runfast and ./run stories15M.bin. This by default will use GCC to compile but can be overridden in the makefile. Here are some data points on my computer.
All of these are being passed -Ofast from the runfast config. If you instead use make run, it uses -O3, which causes GCC and Clang to drop to the same level as Zig CC, ~85 or so tokens/s. This seems to indicate that Zig CC is not properly accepting -Ofast and is instead defaulting to a lower level of optimization.
I asked about this in the zig discord and was told that for zig cc "-O3 is translated into ReleaseFast, which is O2". If this is the case, I believe it's valid when interacting directly with Zig code if there's a good reason for it, but zig cc is often advertised as a better way to compile C, a drop-in replacement. I understand that the majority of programs will not benefit this much from Ofast, and some may even slow down, but for it to be an actual replacement it needs to be able to at least match the competition.
Expected Behavior
zig cc should make code as fast as other compilers
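The comparison from the steps to reproduce could be scripted roughly like this. It is a sketch that assumes the Makefile honours a CC override given on the make command line; build_cmd is a hypothetical helper, and nothing is built unless the Makefile, the model file, and the compiler are all present.

```shell
#!/bin/sh
# Sketch only: build_cmd is a hypothetical helper.
# Prints the make invocation for one compiler ($1) and make target ($2).
build_cmd() {
    printf 'make %s CC="%s"\n' "$2" "$1"
}

for cc in gcc clang "zig cc"; do
    echo "== $cc: $(build_cmd "$cc" runfast)"
    # ${cc%% *} is the first word of the compiler command ("zig" for "zig cc").
    if [ -f Makefile ] && [ -f stories15M.bin ] && command -v "${cc%% *}" >/dev/null 2>&1; then
        # Assumption: the Makefile uses $(CC), so this override takes effect.
        make runfast CC="$cc" && ./run stories15M.bin
    fi
done
```

Swapping runfast for run in the loop gives the -O3 numbers for the same three compilers.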