Open abourramouss opened 2 weeks ago
Can you still reproduce this on the latest master?
This is my result (master + sbcl/2.4.0):
TEST> (setf (ctx:getenv :JIT) 1)
TEST> (caten (!matmul (make-tensor `(a b)) (make-tensor `(b c))))
#S(AVM
:GRAPH
Graph[seen=NIL, outputs=NIL] {
<Node[BUFFER] ALLOCATE(NID21664407) : val_25 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID21664409) : val_26 <- (val_25) where :value=C>
<Node[BUFFER] ALLOCATE(NID21664399) : val_27 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID21664401) : val_28 <- (val_27) where :value=C>
<Node[BUFFER] ALLOCATE(NID21664395) : val_29 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID21664397) : val_30 <- (val_29) where :value=B>
<ALLOCATE : val_31 <- (shape=(val_30, val_28), stride=(val_26, 1)) where :nrank=2 :dtype=FLOAT32 :_type_relay=NIL :_read_views=NIL :_output_type=NIL>
<Node[BUFFER] ALLOCATE(NID21664388) : val_40 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID21664390) : val_41 <- (val_40) where :value=B>
<Node[BUFFER] ALLOCATE(NID21664376) : val_42 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID21664378) : val_43 <- (val_42) where :value=B>
<Node[BUFFER] ALLOCATE(NID21664372) : val_44 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID21664374) : val_45 <- (val_44) where :value=A>
<ALLOCATE : val_46 <- (shape=(val_45, val_43), stride=(val_41, 1)) where :nrank=2 :dtype=FLOAT32 :_type_relay=NIL :_read_views=NIL :_output_type=NIL>
<ALLOCATE : val_84 <- (shape=(A, C, 1), stride=(C, 1, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
<Node[JIT] JIT_KERNEL(NID21676823) : val_85 <- (val_84, B, C, A, val_46, val_31) where :output-buffer-n=1 :kernel-info=<CLANG[FUSED_SUMNODE_MATMUL21670761]> :dtypes=(FLOAT32
INT32
INT32
INT32
FLOAT32
FLOAT32)>
<Node[SPECIAL/VM] PAUSE/BACKWARD(NID21670678) : val_86 <- (val_85)>
}
:NAME :MAIN21664370
:FW-OUTPUTS (|val_86|)
:BW-OUTPUTS NIL
:ID2TENSOR #<HASH-TABLE :TEST EQL :COUNT 1 {70B80CD6E3}>
:TAPE-LENGTH 17
:PC 0
:VARIABLES #<HASH-TABLE :TEST EQL :COUNT 0 {70B9056A53}>
:DUMPED NIL)
TEST>
Current output:
:GRAPH
Graph[seen=NIL, outputs=(STC331_1)] {
<Node[BUFFER] ALLOCATE(NID1988) : SID1987 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1990) : LID1989 <- (SID1987) where :value=C>
<Node[BUFFER] ALLOCATE(NID1980) : SID1979 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1982) : LID1981 <- (SID1979) where :value=C>
<Node[BUFFER] ALLOCATE(NID1976) : SID1975 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1978) : LID1977 <- (SID1975) where :value=A>
<Node[BUFFER] ALLOCATE(NID2656) : SID2655 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2658) : LID2657 <- (SID2655) where :value=C>
<Node[BUFFER] ALLOCATE(NID2640) : SID2639 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2642) : LID2641 <- (SID2639) where :value=C>
<Node[BUFFER] ALLOCATE(NID2636) : SID2635 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2638) : LID2637 <- (SID2635) where :value=A>
<Node[BUFFER] ALLOCATE(NID1874) : SID1873 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1876) : LID1875 <- (SID1873) where :value=C>
<Node[BUFFER] ALLOCATE(NID1604) : SID1603 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1606) : LID1605 <- (SID1603) where :value=B>
<Node[BUFFER] ALLOCATE(NID1608) : SID1607 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1610) : LID1609 <- (SID1607) where :value=C>
<Node[BINARYOPS] MUL(NID1666) : BID1665 <- (LID1609, LID1605)>
<Node[BUFFER] ALLOCATE(NID1858) : SID1857 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1860) : LID1859 <- (SID1857) where :value=B>
<Node[BUFFER] ALLOCATE(NID1854) : SID1853 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1856) : LID1855 <- (SID1853) where :value=C>
<Node[BUFFER] ALLOCATE(NID1850) : SID1849 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1852) : LID1851 <- (SID1849) where :value=A>
<Node[BUFFER] ALLOCATE(NID365) : SID364 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID367) : LID366 <- (SID364) where :value=C>
<Node[BUFFER] ALLOCATE(NID357) : SID356 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID359) : LID358 <- (SID356) where :value=C>
<Node[BUFFER] ALLOCATE(NID353) : SID352 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID355) : LID354 <- (SID352) where :value=B>
<ALLOCATE : TID330 <- (shape=(LID354, LID358), stride=(LID366, 1)) where :nrank=2 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
<VIEW : TID1848 <- (TID330, shape=(LID1851, LID1855, LID1859), views=((0, LID1851, 1, T), (0, LID1855, 1, NIL), (0, LID1859, 1, NIL)), stride=(BID1665, 1, LID1875), permute=(0 2 1))>
<Node[BUFFER] ALLOCATE(NID1482) : SID1481 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1484) : LID1483 <- (SID1481) where :value=B>
<Node[BUFFER] ALLOCATE(NID1466) : SID1465 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1468) : LID1467 <- (SID1465) where :value=B>
<Node[BUFFER] ALLOCATE(NID1462) : SID1461 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1464) : LID1463 <- (SID1461) where :value=C>
<Node[BUFFER] ALLOCATE(NID1458) : SID1457 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1460) : LID1459 <- (SID1457) where :value=A>
<Node[BUFFER] ALLOCATE(NID346) : SID345 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID348) : LID347 <- (SID345) where :value=B>
<Node[BUFFER] ALLOCATE(NID338) : SID337 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID340) : LID339 <- (SID337) where :value=B>
<Node[BUFFER] ALLOCATE(NID334) : SID333 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID336) : LID335 <- (SID333) where :value=A>
<ALLOCATE : TID329 <- (shape=(LID335, LID339), stride=(LID347, 1)) where :nrank=2 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
<VIEW : TID1456 <- (TID329, shape=(LID1459, LID1463, LID1467), views=((0, LID1459, 1, NIL), (0, LID1463, 1, T), (0, LID1467, 1, NIL)), stride=(LID1483, LID1483, 1), permute=(0 1 2))>
<Node[BUFFER] ALLOCATE(NID1314) : SID1313 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1316) : LID1315 <- (SID1313) where :value=C>
<Node[BUFFER] ALLOCATE(NID1308) : SID1307 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1310) : LID1309 <- (SID1307) where :value=B>
<Node[BINARYOPS] MUL(NID1318) : BID1317 <- (LID1309, LID1315)>
<Node[BUFFER] ALLOCATE(NID1300) : SID1299 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID1302) : LID1301 <- (SID1299) where :value=B>
<Node[BUFFER] ALLOCATE(NID1296) : SID1295 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID1298) : LID1297 <- (SID1295) where :value=C>
<Node[BUFFER] ALLOCATE(NID1292) : SID1291 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID1294) : LID1293 <- (SID1291) where :value=A>
<ALLOCATE : TID699 <- (shape=(LID1293, LID1297, LID1301), stride=(BID1317, LID1309, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
<Node[BINARYOPS] MOVE(NID1577) : BID1576 <- (TID699, TID1456)>
<Node[BINARYOPS] MUL(NID1970) : BID1969 <- (BID1576, TID1848)>
<Node[BUFFER] ALLOCATE(NID2531) : SID2530 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2533) : LID2532 <- (SID2530) where :value=C>
<Node[BUFFER] ALLOCATE(NID2523) : SID2522 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2525) : LID2524 <- (SID2522) where :value=B>
<Node[BUFFER] ALLOCATE(NID2519) : SID2518 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2521) : LID2520 <- (SID2518) where :value=C>
<Node[BUFFER] ALLOCATE(NID2515) : SID2514 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2517) : LID2516 <- (SID2514) where :value=A>
<Node[BUFFER] ALLOCATE(NID2505) : SID2504 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID2507) : LID2506 <- (SID2504) where :value=C>
<Node[BUFFER] ALLOCATE(NID2487) : SID2486 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID2489) : LID2488 <- (SID2486) where :value=C>
<Node[BUFFER] ALLOCATE(NID2483) : SID2482 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID2485) : LID2484 <- (SID2482) where :value=A>
<ALLOCATE : TID2106 <- (shape=(LID2484, LID2488, 1), stride=(LID2506, 1, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
<Node[BUFFER] LOAD(NID2512) : LID2511 <- (TID2106) where :value=0.0>
<VIEW : TID2513 <- (LID2511, shape=(LID2516, LID2520, LID2524), views=((0, LID2516, 1, NIL), (0, LID2520, 1, NIL), (0, LID2524, 1, T)), stride=(LID2532, 1, 1), permute=(0 1 2))>
<Node[BINARYOPS] ADD(NID2633) : BID2632 <- (TID2513, BID1969) where :reduction=T>
<VIEW : VID1066 <- (BID2632, shape=(LID2637, LID2641, 1), views=((0, LID2637, 1, NIL), (0, LID2641, 1, NIL), (0, 1, 1, T)), stride=(LID2657, 1, 1), permute=(0 1 2))>
<Node[BUFFER] ALLOCATE(NID1285) : SID1284 <- () where :nrank=0 :dtype=INT32>
<Node[BUFFER] LOAD(NID1287) : LID1286 <- (SID1284) where :value=C>
<Node[BUFFER] ALLOCATE(NID1267) : SID1266 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID1269) : LID1268 <- (SID1266) where :value=C>
<Node[BUFFER] ALLOCATE(NID1263) : SID1262 <- () where :nrank=0 :dtype=UINT32>
<Node[BUFFER] LOAD(NID1265) : LID1264 <- (SID1262) where :value=A>
<ALLOCATE : TID1259 <- (shape=(LID1264, LID1268, 1), stride=(LID1286, 1, 1)) where :nrank=3 :dtype=FLOAT32 :_read_views=NIL :_output_type=NIL>
<Node[BINARYOPS] MOVE(NID1973) : BID1972 <- (TID1259, VID1066)>
<VIEW : STC331 <- (BID1972, shape=(LID1977, LID1981), views=((0, LID1977, 1, NIL), (0, LID1981, 1, NIL)), stride=(LID1989, 1), permute=(0 1))>
<Node[SPECIAL/VM] PAUSE/BACKWARD(NID6667) : STC331_1 <- (STC331)>
}
:NAME :MAIN332
:FW-OUTPUTS (STC331_1)
:BW-OUTPUTS NIL
:ID2TENSOR #<HASH-TABLE :TEST EQL :COUNT 1 {1001A62A93}>
:TAPE-LENGTH 92
:PC 0
:VARIABLES #<HASH-TABLE :TEST EQL :COUNT 2 {10046CBD63}>
:DUMPED NIL)
CL-USER>
As we discussed, SBCL 2.4.0 generates 17 ops for the following snippet:
whereas SBCL 2.4.10 generates many more ops for the same snippet.
list of ops created
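
In case it helps compare the two dumps side by side, here is a rough sketch in plain Common Lisp that tallies op names in a printed graph. It does not use any caten API: `node-op-name` and `count-ops` are hypothetical helper names, and the parsing simply assumes node lines look like the ones printed in this thread (`<Node[CLASS] OP(NID...) : ...>` or `<OP : ...>`).

```lisp
;; Hypothetical helpers (not part of caten): tally op names in a printed graph dump.
(defun node-op-name (line)
  "Return the op name of a dumped node line such as
\"<Node[BUFFER] ALLOCATE(NID1988) : ...\" or \"<VIEW : ...\", or NIL otherwise."
  (let ((line (string-trim '(#\Space #\Tab) line)))
    (when (and (plusp (length line)) (char= (char line 0) #\<))
      (let* ((start (if (and (>= (length line) 6)
                             (string= "<Node[" line :end2 6))
                        ;; Skip the "<Node[CLASS] " prefix.
                        (let ((close (position #\] line)))
                          (and close (+ close 2)))
                        ;; Bare form "<OP : ...": the name starts right after "<".
                        1))
             ;; The op name ends at the first "(" or space.
             (end (and start
                       (<= start (length line))
                       (position-if (lambda (c) (member c '(#\( #\Space)))
                                    line :start start))))
        (when (and end (> end start))
          (subseq line start end))))))

(defun count-ops (dump)
  "Return an alist of (op-name . count) for the dump string DUMP, most frequent first."
  (let ((counts (make-hash-table :test #'equal)))
    (with-input-from-string (in dump)
      (loop for line = (read-line in nil)
            while line
            do (let ((op (node-op-name line)))
                 (when op (incf (gethash op counts 0))))))
    (sort (loop for op being the hash-keys of counts using (hash-value n)
                collect (cons op n))
          #'> :key #'cdr)))
```

Usage would be something like `(count-ops dump-string)`, where `dump-string` holds the pasted graph text from each SBCL version. Lines that are not node entries (`#S(AVM`, the spilled `:dtypes` entries, `}`) are skipped, so the totals count only graph nodes.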