hikettei / Caten

[wip] Deep Learning Compiler based on Polyhedral Compiler and Light-weight IRs based on Optimizing Pattern Matcher
https://hikettei.github.io/Caten/

Excessive Operations in AST Output with SBCL 2.4.10 #161

Open abourramouss opened 2 weeks ago

abourramouss commented 2 weeks ago

As we discussed, SBCL 2.4.0 generates 17 ops for the following snippet:

(caten (!matmul (make-tensor `(a b)) (make-tensor `(b c))))

whereas SBCL 2.4.10 generates many more ops for the same snippet:

list of ops created

hikettei commented 2 weeks ago

Can you still reproduce this on the latest master?

This is my result (master + SBCL 2.4.0):

TEST> (setf (ctx:getenv :JIT) 1)
TEST> (caten (!matmul (make-tensor `(a b)) (make-tensor `(b c))))

#S(AVM
   :GRAPH 
Graph[seen=NIL, outputs=NIL] {
    <Node[BUFFER] ALLOCATE(NID21664407) : val_25 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID21664409) : val_26 <- (val_25) where :value=C>
    <Node[BUFFER] ALLOCATE(NID21664399) : val_27 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID21664401) : val_28 <- (val_27) where :value=C>
    <Node[BUFFER] ALLOCATE(NID21664395) : val_29 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID21664397) : val_30 <- (val_29) where :value=B>
    <ALLOCATE : val_31 <- (shape=(val_30, val_28), stride=(val_26, 1)) where :nrank=2 :dtype=FLOAT32 :_type_relay=NIL :_read_views=NIL :_output_type=NIL>
    <Node[BUFFER] ALLOCATE(NID21664388) : val_40 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID21664390) : val_41 <- (val_40) where :value=B>
    <Node[BUFFER] ALLOCATE(NID21664376) : val_42 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID21664378) : val_43 <- (val_42) where :value=B>
    <Node[BUFFER] ALLOCATE(NID21664372) : val_44 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID21664374) : val_45 <- (val_44) where :value=A>
    <ALLOCATE : val_46 <- (shape=(val_45, val_43), stride=(val_41, 1)) where :nrank=2 :dtype=FLOAT32 :_type_relay=NIL :_read_views=NIL :_output_type=NIL>
    <ALLOCATE : val_84 <- (shape=(A, C, 1), stride=(C, 1, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
    <Node[JIT] JIT_KERNEL(NID21676823) : val_85 <- (val_84, B, C, A, val_46, val_31) where :output-buffer-n=1 :kernel-info=<CLANG[FUSED_SUMNODE_MATMUL21670761]> :dtypes=(FLOAT32
                                                                                      INT32
                                                                                      INT32
                                                                                      INT32
                                                                                      FLOAT32
                                                                                      FLOAT32)>
    <Node[SPECIAL/VM] PAUSE/BACKWARD(NID21670678) : val_86 <- (val_85)>
}

   :NAME :MAIN21664370
   :FW-OUTPUTS (|val_86|)
   :BW-OUTPUTS NIL
   :ID2TENSOR #<HASH-TABLE :TEST EQL :COUNT 1 {70B80CD6E3}>
   :TAPE-LENGTH 17
   :PC 0
   :VARIABLES #<HASH-TABLE :TEST EQL :COUNT 0 {70B9056A53}>
   :DUMPED NIL)
TEST> 

abourramouss commented 2 weeks ago

Current output:

   :GRAPH 
Graph[seen=NIL, outputs=(STC331_1)] {
    <Node[BUFFER] ALLOCATE(NID1988) : SID1987 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1990) : LID1989 <- (SID1987) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1980) : SID1979 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1982) : LID1981 <- (SID1979) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1976) : SID1975 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1978) : LID1977 <- (SID1975) where :value=A>
    <Node[BUFFER] ALLOCATE(NID2656) : SID2655 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2658) : LID2657 <- (SID2655) where :value=C>
    <Node[BUFFER] ALLOCATE(NID2640) : SID2639 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2642) : LID2641 <- (SID2639) where :value=C>
    <Node[BUFFER] ALLOCATE(NID2636) : SID2635 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2638) : LID2637 <- (SID2635) where :value=A>
    <Node[BUFFER] ALLOCATE(NID1874) : SID1873 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1876) : LID1875 <- (SID1873) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1604) : SID1603 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1606) : LID1605 <- (SID1603) where :value=B>
    <Node[BUFFER] ALLOCATE(NID1608) : SID1607 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1610) : LID1609 <- (SID1607) where :value=C>
    <Node[BINARYOPS] MUL(NID1666) : BID1665 <- (LID1609, LID1605)>
    <Node[BUFFER] ALLOCATE(NID1858) : SID1857 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1860) : LID1859 <- (SID1857) where :value=B>
    <Node[BUFFER] ALLOCATE(NID1854) : SID1853 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1856) : LID1855 <- (SID1853) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1850) : SID1849 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1852) : LID1851 <- (SID1849) where :value=A>
    <Node[BUFFER] ALLOCATE(NID365) : SID364 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID367) : LID366 <- (SID364) where :value=C>
    <Node[BUFFER] ALLOCATE(NID357) : SID356 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID359) : LID358 <- (SID356) where :value=C>
    <Node[BUFFER] ALLOCATE(NID353) : SID352 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID355) : LID354 <- (SID352) where :value=B>
    <ALLOCATE : TID330 <- (shape=(LID354, LID358), stride=(LID366, 1)) where :nrank=2 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
    <VIEW : TID1848 <- (TID330, shape=(LID1851, LID1855, LID1859), views=((0, LID1851, 1, T), (0, LID1855, 1, NIL), (0, LID1859, 1, NIL)), stride=(BID1665, 1, LID1875), permute=(0 2 1))>
    <Node[BUFFER] ALLOCATE(NID1482) : SID1481 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1484) : LID1483 <- (SID1481) where :value=B>
    <Node[BUFFER] ALLOCATE(NID1466) : SID1465 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1468) : LID1467 <- (SID1465) where :value=B>
    <Node[BUFFER] ALLOCATE(NID1462) : SID1461 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1464) : LID1463 <- (SID1461) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1458) : SID1457 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1460) : LID1459 <- (SID1457) where :value=A>
    <Node[BUFFER] ALLOCATE(NID346) : SID345 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID348) : LID347 <- (SID345) where :value=B>
    <Node[BUFFER] ALLOCATE(NID338) : SID337 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID340) : LID339 <- (SID337) where :value=B>
    <Node[BUFFER] ALLOCATE(NID334) : SID333 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID336) : LID335 <- (SID333) where :value=A>
    <ALLOCATE : TID329 <- (shape=(LID335, LID339), stride=(LID347, 1)) where :nrank=2 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
    <VIEW : TID1456 <- (TID329, shape=(LID1459, LID1463, LID1467), views=((0, LID1459, 1, NIL), (0, LID1463, 1, T), (0, LID1467, 1, NIL)), stride=(LID1483, LID1483, 1), permute=(0 1 2))>
    <Node[BUFFER] ALLOCATE(NID1314) : SID1313 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1316) : LID1315 <- (SID1313) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1308) : SID1307 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1310) : LID1309 <- (SID1307) where :value=B>
    <Node[BINARYOPS] MUL(NID1318) : BID1317 <- (LID1309, LID1315)>
    <Node[BUFFER] ALLOCATE(NID1300) : SID1299 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID1302) : LID1301 <- (SID1299) where :value=B>
    <Node[BUFFER] ALLOCATE(NID1296) : SID1295 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID1298) : LID1297 <- (SID1295) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1292) : SID1291 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID1294) : LID1293 <- (SID1291) where :value=A>
    <ALLOCATE : TID699 <- (shape=(LID1293, LID1297, LID1301), stride=(BID1317, LID1309, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
    <Node[BINARYOPS] MOVE(NID1577) : BID1576 <- (TID699, TID1456)>
    <Node[BINARYOPS] MUL(NID1970) : BID1969 <- (BID1576, TID1848)>
    <Node[BUFFER] ALLOCATE(NID2531) : SID2530 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2533) : LID2532 <- (SID2530) where :value=C>
    <Node[BUFFER] ALLOCATE(NID2523) : SID2522 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2525) : LID2524 <- (SID2522) where :value=B>
    <Node[BUFFER] ALLOCATE(NID2519) : SID2518 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2521) : LID2520 <- (SID2518) where :value=C>
    <Node[BUFFER] ALLOCATE(NID2515) : SID2514 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2517) : LID2516 <- (SID2514) where :value=A>
    <Node[BUFFER] ALLOCATE(NID2505) : SID2504 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID2507) : LID2506 <- (SID2504) where :value=C>
    <Node[BUFFER] ALLOCATE(NID2487) : SID2486 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID2489) : LID2488 <- (SID2486) where :value=C>
    <Node[BUFFER] ALLOCATE(NID2483) : SID2482 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID2485) : LID2484 <- (SID2482) where :value=A>
    <ALLOCATE : TID2106 <- (shape=(LID2484, LID2488, 1), stride=(LID2506, 1, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
    <Node[BUFFER] LOAD(NID2512) : LID2511 <- (TID2106) where :value=0.0>
    <VIEW : TID2513 <- (LID2511, shape=(LID2516, LID2520, LID2524), views=((0, LID2516, 1, NIL), (0, LID2520, 1, NIL), (0, LID2524, 1, T)), stride=(LID2532, 1, 1), permute=(0 1 2))>
    <Node[BINARYOPS] ADD(NID2633) : BID2632 <- (TID2513, BID1969) where :reduction=T>
    <VIEW : VID1066 <- (BID2632, shape=(LID2637, LID2641, 1), views=((0, LID2637, 1, NIL), (0, LID2641, 1, NIL), (0, 1, 1, T)), stride=(LID2657, 1, 1), permute=(0 1 2))>
    <Node[BUFFER] ALLOCATE(NID1285) : SID1284 <- () where :nrank=0 :dtype=INT32>
    <Node[BUFFER] LOAD(NID1287) : LID1286 <- (SID1284) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1267) : SID1266 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID1269) : LID1268 <- (SID1266) where :value=C>
    <Node[BUFFER] ALLOCATE(NID1263) : SID1262 <- () where :nrank=0 :dtype=UINT32>
    <Node[BUFFER] LOAD(NID1265) : LID1264 <- (SID1262) where :value=A>
    <ALLOCATE : TID1259 <- (shape=(LID1264, LID1268, 1), stride=(LID1286, 1, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
    <Node[BINARYOPS] MOVE(NID1973) : BID1972 <- (TID1259, VID1066)>
    <VIEW : STC331 <- (BID1972, shape=(LID1977, LID1981), views=((0, LID1977, 1, NIL), (0, LID1981, 1, NIL)), stride=(LID1989, 1), permute=(0 1))>
    <Node[SPECIAL/VM] PAUSE/BACKWARD(NID6667) : STC331_1 <- (STC331)>
}

   :NAME :MAIN332
   :FW-OUTPUTS (STC331_1)
   :BW-OUTPUTS NIL
   :ID2TENSOR #<HASH-TABLE :TEST EQL :COUNT 1 {1001A62A93}>
   :TAPE-LENGTH 92
   :PC 0
   :VARIABLES #<HASH-TABLE :TEST EQL :COUNT 2 {10046CBD63}>
   :DUMPED NIL) 
CL-USER>
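
In both dumps above, :TAPE-LENGTH (17 and 92) matches the number of node lines printed in the graph, so a per-op tally of the two dumps is a quick way to see where the extra ops come from. The following is a minimal sketch in plain Common Lisp that works on the printed text only. It does not use any Caten API; `node-op-name`, `tally-ops`, and `*sample*` are hypothetical names introduced here for illustration, and the line format is assumed from the dumps in this thread.

```lisp
;; Sketch: tally op kinds in a printed graph dump (plain text, no Caten API).
;; Assumed line shapes, taken from the dumps in this thread:
;;   <Node[BUFFER] ALLOCATE(NID1) : SID1 <- () where ...>
;;   <VIEW : TID1 <- (...)>

(defun node-op-name (line)
  "Return the op name found on LINE, or NIL if LINE is not a node line."
  (let ((s (string-trim '(#\Space #\Tab) line)))
    (when (and (plusp (length s)) (char= (char s 0) #\<))
      (let* (;; MISMATCH returns 6 when S starts with \"<Node[\" and is longer.
             (tagged (eql (mismatch "<Node[" s) 6))
             (close (and tagged (position #\] s)))
             ;; Tagged lines: the op name starts after \"] \".
             ;; Untagged lines: it starts right after \"<\".
             (start (if tagged
                        (and close
                             (position-if-not (lambda (c) (char= c #\Space))
                                              s :start (1+ close)))
                        1))
             ;; The op name ends at the first \"(\", space, or \":\".
             (end (and start
                       (position-if (lambda (c) (member c '(#\( #\Space #\:)))
                                    s :start start))))
        (when (and start end (> end start))
          (subseq s start end))))))

(defun tally-ops (dump)
  "Return an alist of (op-name . count) for the string DUMP, largest count first."
  (let ((counts (make-hash-table :test #'equal)))
    (with-input-from-string (in dump)
      (loop for line = (read-line in nil nil)
            while line
            do (let ((op (node-op-name line)))
                 (when op (incf (gethash op counts 0))))))
    (let ((result '()))
      (maphash (lambda (k v) (push (cons k v) result)) counts)
      (sort result #'> :key #'cdr))))

;; A tiny hand-written sample in the same format (not real compiler output).
(defparameter *sample*
  (format nil "~{~a~%~}"
          '("Graph[seen=NIL, outputs=NIL] {"
            "    <Node[BUFFER] ALLOCATE(NID1) : SID1 <- () where :nrank=0 :dtype=INT32>"
            "    <Node[BUFFER] LOAD(NID2) : LID2 <- (SID1) where :value=C>"
            "    <ALLOCATE : TID3 <- (shape=(LID2), stride=(1)) where :nrank=1>"
            "}")))

;; (tally-ops *sample*) ; => (("ALLOCATE" . 2) ("LOAD" . 1))
```

Pasting each dump into a string and comparing the two tallies should show which op kinds account for the difference; the counts in each tally should sum to the :TAPE-LENGTH reported for that dump.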