sergeykorablin closed this 1 year ago
Oh no, what did I do? I don't feel like I made any crazy changes, and things work on my Linux box and MacBook just fine. Looking...
Update: valgrind seems happy too. How strange.
@sergeykorablin Can you please try
make rundebug
valgrind --leak-check=full ./run out/model.bin -n 5
What do you mean by "with new trained models"?
I just pulled the changes, built, and it is running OK
I don't see any changes that should segfault. But I'm assuming you're on a Mac, judging from your repos, and I'm checking on Windows, so that's probably meaningless.
@sergeykorablin Can you please try
make rundebug
valgrind --leak-check=full ./run out/model.bin -n 5
➜ llama2.c git:(master) ✗ make rundebug
gcc -g -o run run.c -lm
➜ llama2.c git:(master) ✗ valgrind --leak-check=full ./run out/model.bin -n 5
==127945== Memcheck, a memory error detector
==127945== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==127945== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==127945== Command: ./run out/model.bin -n 5
==127945==
<s>
==127945== Invalid read of size 4
==127945==    at 0x401BCB: matmul (run.c:202)
==127945==    by 0x40251B: transformer (run.c:308)
==127945==    by 0x403310: main (run.c:549)
==127945==  Address 0xaea9000 is in a rwx anonymous segment
==127945==
voy Ulrich(` enters Pont
achieved tok/s: 0.925069
==127945==
==127945== HEAP SUMMARY:
==127945== in use at exit: 0 bytes in 0 blocks
==127945== total heap usage: 32,019 allocs, 32,019 frees, 32,251,897 bytes allocated
==127945==
==127945== All heap blocks were freed -- no leaks are possible
==127945==
==127945== For lists of detected and suppressed errors, rerun with: -s
==127945== ERROR SUMMARY: 80 errors from 1 contexts (suppressed: 0 from 0)
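One plausible reading of the log above is that matmul read a weight that lies outside the buffer holding the checkpoint, which would happen if the file is shorter than its header implies. The helper below is a minimal sketch of a range check that could be called before a matrix multiply. The name floats_in_buffer and its parameters are hypothetical and are not part of run.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of run.c.
 * Returns 1 if the `count` floats starting at `ptr` lie entirely inside the
 * buffer of `total` floats starting at `base`, and 0 otherwise. */
int floats_in_buffer(const float *base, size_t total,
                     const float *ptr, size_t count) {
    /* Compare as integers so pointers outside the buffer can be tested. */
    uintptr_t b = (uintptr_t)base;
    uintptr_t p = (uintptr_t)ptr;
    if (p < b) return 0;                          /* starts before the buffer */

    size_t offset_bytes = (size_t)(p - b);
    if (offset_bytes % sizeof(float) != 0) return 0;  /* misaligned start */

    size_t offset = offset_bytes / sizeof(float);
    if (offset > total) return 0;                 /* starts past the end */

    /* Written this way to avoid overflow in offset + count. */
    return count <= total - offset;
}
```

For a weight matrix of n * d floats at pointer w, a debug build could assert floats_in_buffer(data, n_floats, w, (size_t)n * d) before multiplying, where data and n_floats describe the loaded checkpoint.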
What do you mean by "with new trained models"? I just pulled the changes, built, and it is running OK
I have a few models that I trained a day ago, plus the downloaded stories110M.bin; they all work fine. Models trained from scratch right now all cause a segfault.
This is most likely an issue with your custom weight .bin file. Was the .bin file saved correctly, or is it corrupted?
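A quick way to test that suspicion is to read the checkpoint header and see whether the values look sane. The sketch below assumes the file begins with seven 32-bit ints in the order shown; that layout, the Header type, and both function names are assumptions for illustration, so compare them with the Config struct in your own copy of run.c first.

```c
#include <stdio.h>

/* ASSUMPTION: the checkpoint starts with these seven ints, in this order.
 * Check this against the Config struct in your copy of run.c. */
typedef struct {
    int dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len;
} Header;

/* Hypothetical helper: reads the header and the total file size in bytes.
 * Returns 0 on success and -1 on any I/O failure. */
int read_header(const char *path, Header *h, long *file_size) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = fread(h, sizeof(Header), 1, f) == 1;
    if (ok && fseek(f, 0, SEEK_END) == 0) {
        *file_size = ftell(f);
        ok = *file_size >= 0;
    } else {
        ok = 0;
    }
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: returns 1 if the header values look sane, else 0. */
int header_is_plausible(const Header *h) {
    if (h->dim <= 0 || h->hidden_dim <= 0 || h->n_layers <= 0) return 0;
    if (h->n_heads <= 0 || h->n_kv_heads <= 0 || h->seq_len <= 0) return 0;
    /* Only zero is rejected here, in case the sign is used as a flag. */
    if (h->vocab_size == 0) return 0;
    /* Each attention head needs a whole number of channels. */
    if (h->dim % h->n_heads != 0) return 0;
    return 1;
}
```

Running both helpers on an old working model and on a newly trained one, then comparing the printed fields and file sizes, should show whether the new export differs in shape or is simply truncated.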
I reinstalled llama2.c and the Python venv, and now it works without problems... strange
you scared me! :)
After the last git pull && make, ./run crashes with newly trained models and works fine with old ones.