viznut / vzgpt

Viznut's C-only GPT-2 implementation

Compile woes & stability issues #1

Open FlyingFathead opened 3 years ago

FlyingFathead commented 3 years ago

Ahoy there!

First of all, congratulations on the interesting concept.

Second of all, well, where do I start? I must apologize in advance: C is not my native language, so I have no clue what to make of some of the compile warnings that I got...

The Makefile could use a make-over, pardon the pun. Unless the -pthread flag is manually added to the link line in the Makefile, the build fails on every Linux-based setup that I tried to compile vzgpt on.

In other words, the following happens if the -pthread flag is not added to the Makefile (i.e. with the Makefile as shipped):

clang -O2 main.o tokens.o glyphgen.o model.o ui_sdl.o ui_tty.o -o vzgpt `sdl-config --libs --cflags` -lSDL_image -lm
/usr/bin/ld: model.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/usr/bin/ld: /lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:6: vzgpt] Error 1

Had the same problem on a Linux system that was running glibc 2.3.3.

Just a heads-up to anyone actually attempting to compile and run this: you need to add -pthread at the end of the vzgpt link command inside the Makefile for the thing to build with default settings in the first place, like so:

vzgpt: main.o tokens.o glyphgen.o model.o ui_sdl.o ui_tty.o
        $(CC) -O2 main.o tokens.o glyphgen.o model.o ui_sdl.o ui_tty.o -o vzgpt `sdl-config --libs --cflags` -lSDL_image -lm -pthread

(Notice the -pthread flag at the end of the line. Add that to the Makefile on that line, save and make.)

Speaking of make, here are the warnings I got during the compile on a non-AVX/AVX2 x86-64 system with the -pthread flag added (Ubuntu clang version 12.0.0-3ubuntu1~21.04.1 / Target: x86_64-pc-linux-gnu / Thread model: posix).

Any idea on these?

clang -O2 -c main.c -D__MAIN__
main.c:60:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
main.c:65:15: warning: implicit declaration of function 'tokenize' is invalid in C99 [-Wimplicit-function-declaration]
  int answtok=tokenize(answer);
              ^
main.c:66:12: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
  int here=tokenize_to_context(prompt,0);
           ^
main.c:181:8: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
  here=tokenize_to_context(prompt,0);
       ^
main.c:250:6: warning: implicit declaration of function 'loadtokens' is invalid in C99 [-Wimplicit-function-declaration]
  rc=loadtokens(modelpath);
     ^
main.c:254:8: warning: implicit declaration of function 'loadpalette' is invalid in C99 [-Wimplicit-function-declaration]
    rc=loadpalette(modelpath);
       ^
main.c:258:16: warning: implicit declaration of function 'tokenize' is invalid in C99 [-Wimplicit-function-declaration]
    emptytoken=tokenize("<|endoftext|>");
               ^
main.c:265:3: warning: implicit declaration of function 'loadmodel' is invalid in C99 [-Wimplicit-function-declaration]
  loadmodel(modelpath);
  ^
main.c:363:3: warning: implicit declaration of function 'ttyui' is invalid in C99 [-Wimplicit-function-declaration]
  ttyui();
  ^
main.c:401:22: warning: more '%' conversions than data arguments [-Wformat-insufficient-args]
            "Usage: %s <options> [modelpath]\n"
                    ~^
main.c:466:31: warning: implicit declaration of function 'ui_init' is invalid in C99 [-Wimplicit-function-declaration]
  if(wannastartui || !prompt) ui_init();
                              ^
main.c:472:24: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
  if(prompt) promptlgt=tokenize_to_context(prompt,0);
                       ^
main.c:478:5: warning: implicit declaration of function 'ui_run' is invalid in C99 [-Wimplicit-function-declaration]
    ui_run();
    ^
main.c:483:17: warning: incompatible function pointer types passing 'int (int)' to parameter of type '__sighandler_t' (aka 'void (*)(int)') [-Wincompatible-function-pointer-types]
  signal(SIGINT,handlesignal);
                ^~~~~~~~~~~~
/usr/include/signal.h:88:57: note: passing argument to parameter '__handler' here
extern __sighandler_t signal (int __sig, __sighandler_t __handler)
                                                        ^
14 warnings generated.
clang -O3 -c tokens.c
tokens.c:103:10: warning: incompatible pointer types returning 'wte_t **' (aka 'short **') from a function with result type 'wte_t *' (aka 'short *'); dereference with * [-Wincompatible-pointer-types]
  return userwte+WVSIZE*(token-nummodeltokens);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         *(                                   )
tokens.c:111:10: warning: incompatible pointer types returning 'wte_t **' (aka 'short **') from a function with result type 'wte_t *' (aka 'short *'); dereference with * [-Wincompatible-pointer-types]
  return userwte+WVSIZE*(token-nummodeltokens);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         *(                                   )
2 warnings generated.
clang -O3 -c glyphgen.c
clang -O3 -funsafe-math-optimizations -c model.c
model.c:66:7: warning: assigning to 'wte_t *' (aka 'short *') from 'pkdflt *' (aka 'unsigned short *') converts between pointers to integer types with different sign [-Wpointer-sign]
  wtet=(pkdflt*)readfile("wtet.raw",&sz,path); // igpt-only
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:67:6: warning: assigning to 'wte_t *' (aka 'short *') from 'pkdflt *' (aka 'unsigned short *') converts between pointers to integer types with different sign [-Wpointer-sign]
  sos=(pkdflt*)readfile("sos.raw",&sz,path); // igpt-only
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:122:30: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
          layers[i].mlp_cfc_w=(float*)readfile(fn,&sz,path);
                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:126:32: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
          layers[i].mlp_cproj_w=(float*)readfile(fn,&sz,path);
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:134:33: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
          layers[i].attn_cproj_w=(float*)readfile(fn,&sz,path);
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:139:35: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
            layers[i].attn_cproj_w=(float*)readfile(fn,&sz,path);
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:159:33: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
          layers[i].attn_cattn_w=(float*)readfile(fn,&sz,path);
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
model.c:185:19: warning: incompatible pointer types passing 'pkdflt *' (aka 'unsigned short *') to parameter of type 'float *' [-Wincompatible-pointer-types]
        transpose(layers[i].attn_cattn_w,WVSIZE*3,WVSIZE);
                  ^~~~~~~~~~~~~~~~~~~~~~
model.c:5:23: note: passing argument to parameter 'm' here
float*transpose(float*m,int w,int h)
                      ^
model.c:184:30: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
      layers[i].attn_cattn_w =
                             ^
model.c:188:19: warning: incompatible pointer types passing 'pkdflt *' (aka 'unsigned short *') to parameter of type 'float *' [-Wincompatible-pointer-types]
        transpose(layers[i].attn_cproj_w,WVSIZE,WVSIZE);
                  ^~~~~~~~~~~~~~~~~~~~~~
model.c:5:23: note: passing argument to parameter 'm' here
float*transpose(float*m,int w,int h)
                      ^
model.c:187:30: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
      layers[i].attn_cproj_w =
                             ^
model.c:191:17: warning: incompatible pointer types passing 'pkdflt *' (aka 'unsigned short *') to parameter of type 'float *' [-Wincompatible-pointer-types]
      transpose(layers[i].mlp_cfc_w,WVSIZE*4,WVSIZE);
                ^~~~~~~~~~~~~~~~~~~
model.c:5:23: note: passing argument to parameter 'm' here
float*transpose(float*m,int w,int h)
                      ^
model.c:190:25: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
    layers[i].mlp_cfc_w =
                        ^
model.c:193:17: warning: incompatible pointer types passing 'pkdflt *' (aka 'unsigned short *') to parameter of type 'float *' [-Wincompatible-pointer-types]
      transpose(layers[i].mlp_cproj_w,WVSIZE,WVSIZE*4);
                ^~~~~~~~~~~~~~~~~~~~~
model.c:5:23: note: passing argument to parameter 'm' here
float*transpose(float*m,int w,int h)
                      ^
model.c:192:27: warning: incompatible pointer types assigning to 'pkdflt *' (aka 'unsigned short *') from 'float *' [-Wincompatible-pointer-types]
    layers[i].mlp_cproj_w =
                          ^
model.c:361:24: warning: passing 'volatile pthread_barrier_t *' to parameter of type 'pthread_barrier_t *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  pthread_barrier_wait(&thrglob.barrier);
                       ^~~~~~~~~~~~~~~~
/usr/include/pthread.h:1123:53: note: passing argument to parameter '__barrier' here
extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
                                                    ^
model.c:407:46: warning: data argument not used by format string [-Wformat-extra-args]
  if(verbose>=3) fprintf(stderr,"heads...\n",layeridx);
                                ~~~~~~~~~~~~ ^
model.c:446:48: warning: data argument not used by format string [-Wformat-extra-args]
  if(verbose>=3) fprintf(stderr,"project...\n",layeridx);
                                ~~~~~~~~~~~~~~ ^
model.c:466:44: warning: data argument not used by format string [-Wformat-extra-args]
  if(verbose>=3) fprintf(stderr,"mlp...\n",layeridx);
                                ~~~~~~~~~~ ^
model.c:551:26: warning: passing 'volatile pthread_barrier_t *' to parameter of type 'pthread_barrier_t *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    pthread_barrier_init(&thrglob.barrier,NULL,numthreads);
                         ^~~~~~~~~~~~~~~~
/usr/include/pthread.h:1113:64: note: passing argument to parameter '__barrier' here
extern int pthread_barrier_init (pthread_barrier_t *__restrict __barrier,
                                                               ^
model.c:555:22: warning: passing 'volatile pthread_t *' (aka 'volatile unsigned long *') to parameter of type 'pthread_t *' (aka 'unsigned long *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
      pthread_create(&thrglob.t[i],NULL,perthread,&thread_args[i]);
                     ^~~~~~~~~~~~~
/usr/include/pthread.h:200:50: note: passing argument to parameter '__newthread' here
extern int pthread_create (pthread_t *__restrict __newthread,
                                                 ^
model.c:558:29: warning: passing 'volatile pthread_barrier_t *' to parameter of type 'pthread_barrier_t *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    pthread_barrier_destroy(&thrglob.barrier);
                            ^~~~~~~~~~~~~~~~
/usr/include/pthread.h:1119:56: note: passing argument to parameter '__barrier' here
extern int pthread_barrier_destroy (pthread_barrier_t *__barrier)
                                                       ^
model.c:582:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
23 warnings generated.
clang -O2 -c ui_sdl.c
ui_sdl.c:303:7: warning: implicit declaration of function 'renderwordvec_pkd' is invalid in C99 [-Wimplicit-function-declaration]
      renderwordvec_pkd(getwv(context[i]),x,y,zoom);
      ^
ui_sdl.c:357:36: warning: incompatible pointer types passing 'wte_t *' (aka 'short *') to parameter of type 'float *' [-Wincompatible-pointer-types]
    if(!smallchange) renderwordvec(getwv(t),x0+6,y,32);
                                   ^~~~~~~~
./common.h:160:26: note: passing argument to parameter 'wv0' here
void renderwordvec(float*wv0,int x0,int y0,int dim);
                         ^
ui_sdl.c:373:16: warning: passing 'unsigned char [1024]' to parameter of type 'const char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]
  int s=strlen(userinput);
               ^~~~~~~~~
/usr/include/string.h:391:35: note: passing argument to parameter '__s' here
extern size_t strlen (const char *__s)
                                  ^
ui_sdl.c:381:16: warning: passing 'unsigned char [1024]' to parameter of type 'const char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]
  int s=strlen(userinput)-1;
               ^~~~~~~~~
/usr/include/string.h:391:35: note: passing argument to parameter '__s' here
extern size_t strlen (const char *__s)
                                  ^
ui_sdl.c:421:14: warning: passing 'unsigned char [1024]' to parameter of type 'char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]
  rendertext(userinput,0,8+16+2,0xffffff,0x333333,3);
             ^~~~~~~~~
ui_sdl.c:86:22: note: passing argument to parameter 's' here
void rendertext(char*s,int x,int y,int fg,int bg,int flags)
                     ^
ui_sdl.c:459:10: warning: initializing 'pkdflt *' (aka 'unsigned short *') with an expression of type 'wte_t *' (aka 'short *') converts between pointers to integer types with different sign [-Wpointer-sign]
  pkdflt*src=getwv(token);
         ^   ~~~~~~~~~~~~
ui_sdl.c:476:20: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    fprintf(stderr,tokenstrings[context[i]]);
                   ^~~~~~~~~~~~~~~~~~~~~~~~
ui_sdl.c:476:20: note: treat the string as an argument to avoid this
    fprintf(stderr,tokenstrings[context[i]]);
                   ^
                   "%s", 
ui_sdl.c:644:9: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
        tokenize_to_context(userinput,cursor_slot);
        ^
ui_sdl.c:646:9: warning: implicit declaration of function 'nametoken' is invalid in C99 [-Wimplicit-function-declaration]
        nametoken(context[cursor_slot],userinput);
        ^
ui_sdl.c:719:18: warning: implicit declaration of function 'allocusertoken' is invalid in C99 [-Wimplicit-function-declaration]
        else tok=allocusertoken(currwv,NULL);
                 ^
10 warnings generated.
clang -O2 -c ui_tty.c
ui_tty.c:71:18: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
        currslot=tokenize_to_context(addition,currslot+1)-1;
                 ^
ui_tty.c:79:16: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
      currslot=tokenize_to_context(addition,currslot+1)-1;
               ^
ui_tty.c:86:18: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
        currslot=tokenize_to_context(addition,currslot+1)-1;
                 ^
ui_tty.c:94:26: warning: implicit declaration of function 'tokenize_to_context' is invalid in C99 [-Wimplicit-function-declaration]
  if(*addition) currslot=tokenize_to_context(addition,currslot+1)-1;
                         ^
4 warnings generated.
clang -O2 main.o tokens.o glyphgen.o model.o ui_sdl.o ui_tty.o -o vzgpt `sdl-config --libs --cflags` -lSDL_image -lm -pthread

The build did finish without an error though, which brings me to my penultimate question... How do I get vzgpt not to crash on startup? :-D

The checkpoint conversion with dumpckpt.py from my pre-existing GPT-2 model checkpoint goes OK, but upon launching the program with the data set, I end up getting a segfault+coredump during the SDL UI startup after everything else has been loaded up successfully.

(On that test setup, I'm using Ubuntu 21.04's default desktop WM, GNOME 3.38.4.)

During startup and after loading up the translated model successfully, there's a brief flash where the program tries to draw a GUI window and the program crashes at that exact point. This is what I got in the verbose mode:

load font...
fetched file font.dat, lgt=196608
start ui
Segmentation fault (core dumped)

There doesn't appear to be a command line switch to force it to run inside a terminal? (I understood that I should edit config.h to disable SDL; any hints on how I should do that?)

I've tried running the compiled program in userspace modes that are both GPU-enabled (CUDA v11.2) and CPU-only (i.e. different Conda environments for TensorFlow).

I would assume it has something to do with the (SDL-related?) compile warnings (my LibSDL is 2.0.14). TensorFlow itself isn't the most stable contraption either tbh, as you might have noticed if you've worked with TF/PyTorch based ML/NN stuff. ;)

All help kindly appreciated, and I'm sorry about my silly inquiries in advance -- I hope you can forgive my lack of knowledge on C when it comes to this.

I wish you a nice summer (and Halloween, and Christmas, in case you don't read this by then!) All the best with your unique work. :o)

viznut commented 3 years ago

Sorry for a late response! You may perhaps want to try disabling the thread support altogether (edit config.h and either undef or comment out #define HAVE_THREADS). You may even try to disable the SDL support there.

FlyingFathead commented 3 years ago

Thanks a lot for the answer; no worries for it being late, there's no such thing on this field :o)

Is the threading feature btw in any way related to TensorFlow's AVX/AVX2 (Advanced Vector Extensions) utilization? That might indeed be a problem on some of the machines I've tried it out on. I'll try the tip you gave and see how it goes. Thank you once more. Good luck.