otabuzzman / llm.java

A Java port of Andrej Karpathy's llm.c.
MIT License

Single task graph implementation slow and error-prone #3

Open otabuzzman opened 4 hours ago

otabuzzman commented 4 hours ago

Run on PTX device (requires workarounds for issues #1 and #2):

python %TORNADO_SDK%\bin\tornado ^
--jvm="-Dgpt2.device=2:0 -DUseVectorAPI=true -Dtornado.device.memory=2GB" ^
--classpath bin com.otabuzzman.llmj.TestGpt2
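For anyone reproducing this: the `-D` options inside `--jvm` are ordinary JVM system properties, so `gpt2.device=2:0` reaches the program as a string. Below is a minimal sketch of how such a value could be read and split into two indices. The class `DeviceSelector`, the record `DeviceId`, the reading of `2:0` as `<backend>:<device>`, and the `0:0` fallback are all assumptions for illustration, not code taken from llm.java.

```java
final class DeviceSelector {
    /** Hypothetical holder for a "<backend>:<device>" pair, e.g. "2:0". */
    record DeviceId(int backend, int device) {}

    /** Splits a value such as "2:0" into its two integer indices. */
    static DeviceId parse(String value) {
        String[] parts = value.trim().split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "expected <backend>:<device>, got '" + value + "'");
        }
        return new DeviceId(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /** Reads -Dgpt2.device=...; the "0:0" default is an assumption. */
    static DeviceId fromSystemProperty() {
        return parse(System.getProperty("gpt2.device", "0:0"));
    }

    public static void main(String[] args) {
        // Run with: java -Dgpt2.device=2:0 DeviceSelector
        System.out.println(fromSystemProperty());
    }
}
```

Failing early on a malformed value keeps a typo in the launch command from silently selecting a different device.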

Output:

WARNING: Using incubator modules: jdk.incubator.vector
[GPT-2]
max_seq_len: 1024
vocab_size: 50257
padded_vocab_size: 50304
num_layers: 12
num_heads:12
channels: 768
num_parameters: 124475904
[State]
batch_size: 4
seq_len: 64
num_activations: 73347840
forward pass took 106669 ms
initial matmul_backward took 16948 ms
backward pass took 64786 ms
-43,431618, -28,848349
MISMATCH AT INDEX 0,0: -43,431618 -28,848349
NOT OK (LOGITS), max_diff = 1,458327e+01
LOSS MISMATCH: 8,492410 5,270009
dwte
OK 0,000082 -0,002320
OK -0,000768 0,002072
OK 0,000537 0,003717
OK -0,001315 0,001307
OK 0,000722 0,000632
TENSOR NOT OK, maxdiff = 1,144282e+01
dwpe
OK 0,003658 -0,005110
OK -0,001113 -0,000012
OK -0,002855 -0,003262
OK 0,002991 0,009909
OK 0,000941 0,002145
TENSOR NOT OK, maxdiff = 8,601452e-01
dln1w
NOT OK 0,061486 -0,007523
NOT OK -0,036861 0,008643
NOT OK -0,072082 0,005029
NOT OK -0,096351 -0,011095
OK -0,011581 -0,001664
TENSOR NOT OK, maxdiff = 3,810957e+00
dln1b
NOT OK -0,119165 -0,038458
NOT OK -0,128096 -0,030600
NOT OK -0,164230 0,010223
NOT OK -0,156378 0,080176
NOT OK -0,026975 -0,060901
TENSOR NOT OK, maxdiff = 9,030868e-01
dqkvw
OK -0,000410 -0,000031
OK 0,000800 -0,000025
OK -0,001813 -0,000064
OK -0,000403 0,000074
OK -0,000395 0,000020
TENSOR NOT OK, maxdiff = 2,433470e-01
dqkvb
OK -0,011257 -0,000411
OK -0,011257 -0,000412
OK -0,011257 0,000113
OK -0,011257 -0,000565
OK -0,011257 0,000570
TENSOR NOT OK, maxdiff = 1,249138e-01
dattprojw
OK -0,001489 0,000080
OK -0,001489 -0,000005
OK -0,001489 -0,000019
OK -0,001489 0,000004
OK -0,001489 0,000031
TENSOR NOT OK, maxdiff = 8,986928e-02
dattprojb
OK -0,000000 0,000470
OK 0,000000 -0,009979
OK 0,000000 -0,001804
NOT OK 0,000000 0,037584
NOT OK -0,000000 -0,031239
TENSOR NOT OK, maxdiff = 7,141188e-02
dln2w
OK -0,000000 -0,018312
OK 0,000000 0,004813
OK 0,000000 0,008091
OK 0,000000 -0,001470
OK 0,000000 -0,002737
TENSOR NOT OK, maxdiff = 1,230198e+00
dln2b
NOT OK 0,000000 -0,026368
OK -0,000000 -0,016695
OK 0,000000 0,001074
NOT OK -0,000000 0,034711
NOT OK 0,000000 -0,028584
TENSOR NOT OK, maxdiff = 1,768748e-01
dfcw
OK -0,000000 0,000440
OK -0,000000 -0,000000
OK 0,000000 -0,000154
OK -0,000000 -0,000165
OK -0,000000 0,000405
TENSOR NOT OK, maxdiff = 1,670877e-01
dfcb
OK 0,000000 0,003293
OK 0,000000 0,002043
OK 0,000000 -0,001386
OK 0,000000 0,000386
OK 0,000000 0,001604
TENSOR NOT OK, maxdiff = 8,318771e-02
dfcprojw
OK -0,002369 0,000681
OK -0,002369 0,000073
OK -0,002369 -0,000416
OK -0,002369 -0,000061
OK -0,002369 -0,000604
TENSOR NOT OK, maxdiff = 9,027594e-02
dfcprojb
OK -0,000000 0,003584
OK 0,000000 -0,007158
OK 0,000000 -0,001964
OK 0,000000 0,001462
OK -0,000000 0,001217
TENSOR NOT OK, maxdiff = 3,119996e-02
dlnfw
OK 0,000081 -0,000022
OK -0,000513 0,000811
OK 0,001299 0,001161
OK 0,000605 -0,002957
OK 0,000211 0,001145
TENSOR NOT OK, maxdiff = 3,421277e+00
dlnfb
OK -0,005951 -0,011101
OK 0,014036 0,008007
OK -0,022385 -0,004769
OK 0,005362 -0,002113
OK -0,010341 -0,005905
TENSOR NOT OK, maxdiff = 2,150902e-01
step 0: loss 8,492410 (took 172866 ms) OK = false
...
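
A note on reading the output above: each tensor section prints five rows and then one verdict for the whole tensor, which is how a section such as dwte can show five OK rows and still end in TENSOR NOT OK. The sketch below is one way such a check could be written, modeled only on the layout of the log. The class name `TensorCheck`, the neutral parameter names `a` and `b`, and the 2e-2 tolerance (picked because it is consistent with the OK / NOT OK rows shown) are assumptions, not the code in TestGpt2. It formats with `Locale.ROOT`, so numbers print with a decimal point rather than the decimal comma seen in the log.

```java
import java.util.Locale;

final class TensorCheck {
    // Assumed tolerance; consistent with the rows in the log, not confirmed by the source.
    static final float TOLERANCE = 2e-2f;
    // Only the first few elements are printed; all elements are compared.
    static final int PRINT_UPTO = 5;

    /** Largest absolute element-wise difference over the whole tensor. */
    static float maxDiff(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float max = 0f;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** Prints the first rows plus a tensor-wide verdict; returns true if all elements match. */
    static boolean check(String label, float[] a, float[] b) {
        float maxDiff = maxDiff(a, b); // also validates the lengths
        System.out.println(label);
        for (int i = 0; i < Math.min(PRINT_UPTO, a.length); i++) {
            boolean rowOk = Math.abs(a[i] - b[i]) <= TOLERANCE;
            System.out.println(String.format(Locale.ROOT, "%s %f %f",
                    rowOk ? "OK" : "NOT OK", a[i], b[i]));
        }
        boolean ok = maxDiff <= TOLERANCE;
        System.out.println(String.format(Locale.ROOT, "TENSOR %s, maxdiff = %e",
                ok ? "OK" : "NOT OK", maxDiff));
        return ok;
    }

    public static void main(String[] args) {
        // The first five elements agree within tolerance; the seventh does not.
        float[] a = {0.0f, 0.001f, 0.002f, 0.003f, 0.004f, 0.005f, 5.0f};
        float[] b = new float[a.length];
        check("demo", a, b);
    }
}
```

Run on the demo arrays, this prints five OK rows followed by TENSOR NOT OK, the same pattern as the dwte section: the printed rows are only a sample, while the verdict covers every element.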