jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning
Apache License 2.0

end-to-end inference time #78

lilpenguin closed this issue 3 years ago

lilpenguin commented 3 years ago

Hi! I noticed that the examples under the taso/examples folder print the estimated cost of the graph, but that cost is not exactly the end-to-end runtime.

I tried to measure the end-to-end runtime of the ResNet-50 model before and after optimization, based on the code in taso/examples/eval_joint.py and taso/examples/resnet50.py. This is the script I ended up with:

import taso as ts
import onnx

def resnet_block(graph, input, strides, out_channels):
    w1 = graph.new_weight(dims=(out_channels,input.dim(1),1,1))
    t = graph.conv2d(input=input, weight=w1,
                     strides=(1,1), padding="SAME",
                     activation="RELU")
    w2 = graph.new_weight(dims=(out_channels,t.dim(1),3,3))
    t = graph.conv2d(input=t, weight=w2,
                     strides=strides, padding="SAME",
                     activation="RELU")
    w3 = graph.new_weight(dims=(4*out_channels,t.dim(1),1,1))
    t = graph.conv2d(input=t, weight=w3,
                     strides=(1,1), padding="SAME")
    if (strides[0]>1) or (input.dim(1) != out_channels*4):
        w4 = graph.new_weight(dims=(out_channels*4,input.dim(1),1,1))
        input=graph.conv2d(input=input, weight=w4,
                           strides=strides, padding="SAME",
                           activation="RELU")
    return graph.relu(graph.add(input, t))

graph = ts.new_graph()
input = graph.new_input(dims=(1,3,224,224)) #1,64,56,56))
t = input
for i in range(3):
    t = resnet_block(graph, t, (1,1), 64)
strides = (2,2)
for i in range(4):
    t = resnet_block(graph, t, strides, 128)
    strides = (1,1)
strides = (2,2)
for i in range(6):
    t = resnet_block(graph, t, strides, 256)
    strides = (1,1)
strides = (2,2)
for i in range(3):
    t = resnet_block(graph, t, strides, 512)
    strides = (1,1)

orig_onnx_model = ts.export_onnx(graph)
onnx.save(orig_onnx_model, "resnet50_orig.onnx")
print("Original graph runtime:")
print("TASO: end-to-end inference time = {}ms".format(graph.run_time()))

new_graph = ts.optimize(graph, alpha=1.0, budget=1000)
onnx_model = ts.export_onnx(new_graph)
onnx.save(onnx_model, "resnet50_optimized.onnx")
print("Optimized graph runtime:")
print("TASO: end-to-end inference time = {}ms".format(new_graph.run_time()))

However, when I ran the script, the runtime of the original graph printed fine, but measuring the runtime of the optimized graph failed with a cuDNN error:

$ python resnet50_save_model.py
Original graph runtime:
TASO: end-to-end inference time = 260.49395751953125ms
        cost[Conv2D]: i(1 3 224 224) w(64 3 1 1) s(1 1) p(0) cost(0.3917) total_cost(0.3917)
        cost[Conv2D]: i(1 64 224 224) w(64 64 3 3) s(1 1) p(0) cost(5.1495) total_cost(5.5412)
        cost[Conv2D]: i(1 64 224 224) w(256 64 1 1) s(1 1) p(0) cost(2.1549) total_cost(7.6961)
        cost[Conv2D]: i(1 3 224 224) w(256 3 1 1) s(1 1) p(0) cost(1.0564) total_cost(8.7525)
        cost[Element]: cost(1.1642) total_cost(9.9167)
        cost[Activation]: mode(8) cost(0.6028) total_cost(10.5196)
        cost[Conv2D]: i(1 256 224 224) w(64 256 1 1) s(1 1) p(0) cost(2.5142) total_cost(13.0338)
        cost[Conv2D]: i(1 64 224 224) w(64 64 3 3) s(1 1) p(0) cost(5.1495) total_cost(18.1833)
        cost[Conv2D]: i(1 64 224 224) w(256 64 1 1) s(1 1) p(0) cost(2.1549) total_cost(20.3381)
        cost[Element]: cost(1.1642) total_cost(21.5023)
        cost[Activation]: mode(8) cost(0.6028) total_cost(22.1052)
        cost[Conv2D]: i(1 256 224 224) w(64 256 1 1) s(1 1) p(0) cost(2.5142) total_cost(24.6194)
        cost[Conv2D]: i(1 64 224 224) w(64 64 3 3) s(1 1) p(0) cost(5.1495) total_cost(29.7689)
        cost[Conv2D]: i(1 64 224 224) w(256 64 1 1) s(1 1) p(0) cost(2.1549) total_cost(31.9238)
        cost[Element]: cost(1.1642) total_cost(33.0880)
        cost[Activation]: mode(8) cost(0.6028) total_cost(33.6908)
        cost[Conv2D]: i(1 256 224 224) w(128 256 1 1) s(1 1) p(0) cost(2.6389) total_cost(36.3297)
        cost[Conv2D]: i(1 128 224 224) w(128 128 3 3) s(2 2) p(0) cost(2.7987) total_cost(39.1284)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(40.6130)
        cost[Conv2D]: i(1 256 224 224) w(512 256 1 1) s(2 2) p(0) cost(2.6563) total_cost(43.2693)
        cost[Element]: cost(0.4516) total_cost(43.7209)
        cost[Activation]: mode(8) cost(0.3034) total_cost(44.0243)
        cost[Conv2D]: i(1 512 112 112) w(128 512 1 1) s(1 1) p(0) cost(1.3306) total_cost(45.3549)
        cost[Conv2D]: i(1 128 112 112) w(128 128 3 3) s(1 1) p(0) cost(2.8160) total_cost(48.1710)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(49.6555)
        cost[Element]: cost(0.4516) total_cost(50.1071)
        cost[Activation]: mode(8) cost(0.3034) total_cost(50.4106)
        cost[Conv2D]: i(1 512 112 112) w(128 512 1 1) s(1 1) p(0) cost(1.3306) total_cost(51.7412)
        cost[Conv2D]: i(1 128 112 112) w(128 128 3 3) s(1 1) p(0) cost(2.8160) total_cost(54.5572)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(56.0418)
        cost[Element]: cost(0.4516) total_cost(56.4934)
        cost[Activation]: mode(8) cost(0.3034) total_cost(56.7968)
        cost[Conv2D]: i(1 512 112 112) w(128 512 1 1) s(1 1) p(0) cost(1.3306) total_cost(58.1274)
        cost[Conv2D]: i(1 128 112 112) w(128 128 3 3) s(1 1) p(0) cost(2.8160) total_cost(60.9434)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(62.4280)
        cost[Element]: cost(0.4516) total_cost(62.8796)
        cost[Activation]: mode(8) cost(0.3034) total_cost(63.1830)
        cost[Conv2D]: i(1 512 112 112) w(256 512 1 1) s(1 1) p(0) cost(2.4345) total_cost(65.6175)
        cost[Conv2D]: i(1 256 112 112) w(256 256 3 3) s(2 2) p(0) cost(3.2468) total_cost(68.8643)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(70.0555)
        cost[Conv2D]: i(1 512 112 112) w(1024 512 1 1) s(2 2) p(0) cost(2.4321) total_cost(72.4876)
        cost[Element]: cost(0.2826) total_cost(72.7702)
        cost[Activation]: mode(8) cost(0.1550) total_cost(72.9252)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(74.3443)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(77.5301)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(78.7212)
        cost[Element]: cost(0.2826) total_cost(79.0038)
        cost[Activation]: mode(8) cost(0.1550) total_cost(79.1588)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(80.5779)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(83.7637)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(84.9549)
        cost[Element]: cost(0.2826) total_cost(85.2375)
        cost[Activation]: mode(8) cost(0.1550) total_cost(85.3924)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(86.8115)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(89.9973)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(91.1885)
        cost[Element]: cost(0.2826) total_cost(91.4711)
        cost[Activation]: mode(8) cost(0.1550) total_cost(91.6261)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(93.0452)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(96.2309)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(97.4221)
        cost[Element]: cost(0.2826) total_cost(97.7047)
        cost[Activation]: mode(8) cost(0.1550) total_cost(97.8597)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(99.2788)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(102.4646)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(103.6558)
        cost[Element]: cost(0.2826) total_cost(103.9383)
        cost[Activation]: mode(8) cost(0.1550) total_cost(104.0933)
        cost[Conv2D]: i(1 1024 56 56) w(512 1024 1 1) s(1 1) p(0) cost(2.5360) total_cost(106.6294)
        cost[Conv2D]: i(1 512 56 56) w(512 512 3 3) s(2 2) p(0) cost(61.8636) total_cost(168.4930)
        cost[Conv2D]: i(1 512 28 28) w(2048 512 1 1) s(1 1) p(0) cost(1.0834) total_cost(169.5765)
        cost[Conv2D]: i(1 1024 56 56) w(2048 1024 1 1) s(2 2) p(0) cost(2.5359) total_cost(172.1123)
        cost[Element]: cost(0.1172) total_cost(172.2295)
        cost[Activation]: mode(8) cost(0.0813) total_cost(172.3108)
        cost[Conv2D]: i(1 2048 28 28) w(512 2048 1 1) s(1 1) p(0) cost(1.9024) total_cost(174.2132)
        cost[Conv2D]: i(1 512 28 28) w(512 512 3 3) s(1 1) p(0) cost(40.6487) total_cost(214.8619)
        cost[Conv2D]: i(1 512 28 28) w(2048 512 1 1) s(1 1) p(0) cost(1.0834) total_cost(215.9454)
        cost[Element]: cost(0.1172) total_cost(216.0625)
        cost[Activation]: mode(8) cost(0.0813) total_cost(216.1438)
        cost[Conv2D]: i(1 2048 28 28) w(512 2048 1 1) s(1 1) p(0) cost(1.9024) total_cost(218.0463)
        cost[Conv2D]: i(1 512 28 28) w(512 512 3 3) s(1 1) p(0) cost(40.6487) total_cost(258.6950)
        cost[Conv2D]: i(1 512 28 28) w(2048 512 1 1) s(1 1) p(0) cost(1.0834) total_cost(259.7784)
        cost[Element]: cost(0.1172) total_cost(259.8956)
        cost[Activation]: mode(8) cost(0.0813) total_cost(259.9769)
        Cost metrics: exe_time(259.9769) flops(233.9119) memory_access(6358.4209) kernel_launches(84)

        ===== Start Cost-Based Backtracking Search =====
        [0] cost = 259.9769 bestCost = 259.9769 candidates.size() = 0
        [1] cost = 259.9477 bestCost = 259.9477 candidates.size() = 1
        [2] cost = 259.9608 bestCost = 259.9477 candidates.size() = 0
        ===== Finish Cost-Based Backtracking Search =====

        cost[Conv2D]: i(1 64 224 224) w(64 64 3 3) s(1 1) p(0) cost(5.1495) total_cost(5.1495)
        cost[Conv2D]: i(1 64 224 224) w(256 64 1 1) s(1 1) p(0) cost(2.1549) total_cost(7.3044)
        cost[Element]: cost(1.1642) total_cost(8.4686)
        cost[Activation]: mode(8) cost(0.6028) total_cost(9.0714)
        cost[Conv2D]: i(1 256 224 224) w(64 256 1 1) s(1 1) p(0) cost(2.5142) total_cost(11.5856)
        cost[Conv2D]: i(1 64 224 224) w(64 64 3 3) s(1 1) p(0) cost(5.1495) total_cost(16.7351)
        cost[Conv2D]: i(1 64 224 224) w(256 64 1 1) s(1 1) p(0) cost(2.1549) total_cost(18.8900)
        cost[Element]: cost(1.1642) total_cost(20.0542)
        cost[Activation]: mode(8) cost(0.6028) total_cost(20.6570)
        cost[Conv2D]: i(1 256 224 224) w(64 256 1 1) s(1 1) p(0) cost(2.5142) total_cost(23.1712)
        cost[Conv2D]: i(1 64 224 224) w(64 64 3 3) s(1 1) p(0) cost(5.1495) total_cost(28.3207)
        cost[Conv2D]: i(1 64 224 224) w(256 64 1 1) s(1 1) p(0) cost(2.1549) total_cost(30.4756)
        cost[Element]: cost(1.1642) total_cost(31.6398)
        cost[Activation]: mode(8) cost(0.6028) total_cost(32.2426)
        cost[Conv2D]: i(1 256 224 224) w(128 256 1 1) s(1 1) p(0) cost(2.6389) total_cost(34.8816)
        cost[Conv2D]: i(1 128 224 224) w(128 128 3 3) s(2 2) p(0) cost(2.7987) total_cost(37.6803)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(39.1649)
        cost[Conv2D]: i(1 256 224 224) w(512 256 1 1) s(2 2) p(0) cost(2.6563) total_cost(41.8212)
        cost[Element]: cost(0.4516) total_cost(42.2728)
        cost[Activation]: mode(8) cost(0.3034) total_cost(42.5762)
        cost[Conv2D]: i(1 512 112 112) w(128 512 1 1) s(1 1) p(0) cost(1.3306) total_cost(43.9068)
        cost[Conv2D]: i(1 128 112 112) w(128 128 3 3) s(1 1) p(0) cost(2.8160) total_cost(46.7228)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(48.2074)
        cost[Element]: cost(0.4516) total_cost(48.6590)
        cost[Activation]: mode(8) cost(0.3034) total_cost(48.9624)
        cost[Conv2D]: i(1 512 112 112) w(128 512 1 1) s(1 1) p(0) cost(1.3306) total_cost(50.2930)
        cost[Conv2D]: i(1 128 112 112) w(128 128 3 3) s(1 1) p(0) cost(2.8160) total_cost(53.1090)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(54.5936)
        cost[Element]: cost(0.4516) total_cost(55.0452)
        cost[Activation]: mode(8) cost(0.3034) total_cost(55.3487)
        cost[Conv2D]: i(1 512 112 112) w(128 512 1 1) s(1 1) p(0) cost(1.3306) total_cost(56.6792)
        cost[Conv2D]: i(1 128 112 112) w(128 128 3 3) s(1 1) p(0) cost(2.8160) total_cost(59.4953)
        cost[Conv2D]: i(1 128 112 112) w(512 128 1 1) s(1 1) p(0) cost(1.4846) total_cost(60.9799)
        cost[Element]: cost(0.4516) total_cost(61.4315)
        cost[Activation]: mode(8) cost(0.3034) total_cost(61.7349)
        cost[Conv2D]: i(1 512 112 112) w(256 512 1 1) s(1 1) p(0) cost(2.4345) total_cost(64.1694)
        cost[Conv2D]: i(1 256 112 112) w(256 256 3 3) s(2 2) p(0) cost(3.2468) total_cost(67.4162)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(68.6074)
        cost[Conv2D]: i(1 512 112 112) w(1024 512 1 1) s(2 2) p(0) cost(2.4321) total_cost(71.0395)
        cost[Element]: cost(0.2826) total_cost(71.3221)
        cost[Activation]: mode(8) cost(0.1550) total_cost(71.4770)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(72.8961)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(76.0819)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(77.2731)
        cost[Element]: cost(0.2826) total_cost(77.5557)
        cost[Activation]: mode(8) cost(0.1550) total_cost(77.7107)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(79.1298)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(82.3155)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(83.5067)
        cost[Element]: cost(0.2826) total_cost(83.7893)
        cost[Activation]: mode(8) cost(0.1550) total_cost(83.9443)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(85.3634)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(88.5492)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(89.7403)
        cost[Element]: cost(0.2826) total_cost(90.0229)
        cost[Activation]: mode(8) cost(0.1550) total_cost(90.1779)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(91.5970)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(94.7828)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(95.9740)
        cost[Element]: cost(0.2826) total_cost(96.2566)
        cost[Activation]: mode(8) cost(0.1550) total_cost(96.4116)
        cost[Conv2D]: i(1 1024 56 56) w(256 1024 1 1) s(1 1) p(0) cost(1.4191) total_cost(97.8307)
        cost[Conv2D]: i(1 256 56 56) w(256 256 3 3) s(1 1) p(0) cost(3.1858) total_cost(101.0164)
        cost[Conv2D]: i(1 256 56 56) w(1024 256 1 1) s(1 1) p(0) cost(1.1912) total_cost(102.2076)
        cost[Element]: cost(0.2826) total_cost(102.4902)
        cost[Activation]: mode(8) cost(0.1550) total_cost(102.6452)
        cost[Conv2D]: i(1 1024 56 56) w(512 1024 1 1) s(1 1) p(0) cost(2.5360) total_cost(105.1812)
        cost[Conv2D]: i(1 512 56 56) w(512 512 3 3) s(2 2) p(0) cost(61.8636) total_cost(167.0449)
        cost[Conv2D]: i(1 512 28 28) w(2048 512 1 1) s(1 1) p(0) cost(1.0834) total_cost(168.1283)
        cost[Conv2D]: i(1 1024 56 56) w(2048 1024 1 1) s(2 2) p(0) cost(2.5359) total_cost(170.6642)
        cost[Element]: cost(0.1172) total_cost(170.7813)
        cost[Activation]: mode(8) cost(0.0813) total_cost(170.8626)
        cost[Conv2D]: i(1 2048 28 28) w(512 2048 1 1) s(1 1) p(0) cost(1.9024) total_cost(172.7651)
        cost[Conv2D]: i(1 512 28 28) w(512 512 3 3) s(1 1) p(0) cost(40.6487) total_cost(213.4138)
        cost[Conv2D]: i(1 512 28 28) w(2048 512 1 1) s(1 1) p(0) cost(1.0834) total_cost(214.4972)
        cost[Element]: cost(0.1172) total_cost(214.6144)
        cost[Activation]: mode(8) cost(0.0813) total_cost(214.6957)
        cost[Conv2D]: i(1 2048 28 28) w(512 2048 1 1) s(1 1) p(0) cost(1.9024) total_cost(216.5981)
        cost[Conv2D]: i(1 512 28 28) w(512 512 3 3) s(1 1) p(0) cost(40.6487) total_cost(257.2468)
        cost[Conv2D]: i(1 512 28 28) w(2048 512 1 1) s(1 1) p(0) cost(1.0834) total_cost(258.3303)
        cost[Element]: cost(0.1172) total_cost(258.4474)
        cost[Activation]: mode(8) cost(0.0813) total_cost(258.5287)
        cost[Conv2D]: i(1 3 224 224) w(320 3 1 1) s(1 1) p(0) cost(1.4189) total_cost(259.9477)
        cost[Split]: numOutputs(2) cost(0.0000) total_cost(259.9477)
        Cost metrics: exe_time(259.9477) flops(233.9119) memory_access(6356.1240) kernel_launches(83)
Optimized graph runtime:
CUDNN failure: CUDNN_STATUS_EXECUTION_FAILED
/home1/qinyiluo/taso/src/cudnn/conv2d_kernel.cu:97
Aborting...

Do you know what might be the problem? Or is there a better way to get the end-to-end runtime? Thank you very much for your time!

lilpenguin commented 3 years ago

I switched to the provided Docker container (the error happened in a conda environment that I had built myself), and the cuDNN error disappeared!
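
For anyone comparing the two "Cost metrics" lines in the log above, here is a minimal sketch that parses them and reports the estimated difference. It is plain Python and does not call TASO. The helper name parse_cost_metrics is hypothetical, and treating exe_time as milliseconds (the same unit run_time() prints) is an assumption based on how close the two numbers are, not something the log states.

```python
import re

# The two "Cost metrics" lines, copied verbatim from the log in this issue.
LOG = """\
Cost metrics: exe_time(259.9769) flops(233.9119) memory_access(6358.4209) kernel_launches(84)
Cost metrics: exe_time(259.9477) flops(233.9119) memory_access(6356.1240) kernel_launches(83)
"""

# Matches name(value) pairs such as exe_time(259.9769).
METRIC = re.compile(r"(\w+)\(([\d.]+)\)")

def parse_cost_metrics(text):
    """Hypothetical helper: one {metric_name: value} dict per 'Cost metrics:' line."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Cost metrics:"):
            rows.append({name: float(value) for name, value in METRIC.findall(line)})
    return rows

before, after = parse_cost_metrics(LOG)

# Assumption: exe_time is in milliseconds, like the value printed by run_time().
saved_ms = before["exe_time"] - after["exe_time"]
saved_pct = 100.0 * saved_ms / before["exe_time"]

print("estimated saving: {:.4f} ms ({:.4f}%)".format(saved_ms, saved_pct))
print("kernel launches: {:.0f} -> {:.0f}".format(
    before["kernel_launches"], after["kernel_launches"]))
```

These are the cost model's estimates, so they only indicate what the search expected to gain. The measured numbers still have to come from run_time() on both graphs.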