hidet-org / hidet

An open-source efficient deep learning framework/compiler, written in python.
https://hidet.org
Apache License 2.0

[Runtime] Add a new compiled format CompiledApp #408

Closed: yaoyaoding closed this 9 months ago

yaoyaoding commented 9 months ago

Motivation

In LLM serving, the prefill and decode stages use different computation graphs. We need to save these two graphs, load them together, and have them share their weights.

CompiledApp

hidet.runtime.CompiledApp is a runtime object that can contain multiple compiled graphs and handles weight sharing among them.
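
CompiledApp shares a weight between graphs when the weights are numerically identical. The PR does not describe how hidet detects identical weights, so the following is only a pure-Python sketch of one way to do it: each weight is modeled as a raw `bytes` buffer, identical buffers are found by SHA-256 digest plus a full byte comparison, and every graph keeps indices into a single shared pool. `dedup_weights` is a hypothetical helper, not part of hidet.

```python
import hashlib
from typing import Dict, List, Tuple


def dedup_weights(graphs: Dict[str, List[bytes]]) -> Tuple[List[bytes], Dict[str, List[int]]]:
    """Hypothetical helper (not a hidet API): store each distinct weight once.

    Returns the shared pool of unique buffers and, per graph, the pool index of each weight.
    """
    pool: List[bytes] = []                 # unique weight buffers, stored once
    by_digest: Dict[str, List[int]] = {}   # sha256 digest -> pool slots with that digest
    index_map: Dict[str, List[int]] = {}   # graph name -> pool slot of each of its weights
    for name, weights in graphs.items():
        indices = []
        for w in weights:
            digest = hashlib.sha256(w).hexdigest()
            # also compare full contents, so a hash collision never merges different weights
            slot = next((i for i in by_digest.get(digest, []) if pool[i] == w), None)
            if slot is None:
                slot = len(pool)
                pool.append(w)
                by_digest.setdefault(digest, []).append(slot)
            indices.append(slot)
        index_map[name] = indices
    return pool, index_map


# two graphs built from the same weights, e.g. prefill and decode
prefill = [b"\x00\x01", b"\x02\x03"]
decode = [b"\x00\x01", b"\x02\x03"]
pool, index_map = dedup_weights({"prefill": prefill, "decode": decode})
print(len(pool))   # 2
print(index_map)   # {'prefill': [0, 1], 'decode': [0, 1]}
```

Comparing bytes is stricter than comparing values (for example, `0.0` and `-0.0` differ in their bytes), so this sketch errs on the side of not sharing.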

Usage

import hidet
from hidet.testing.models import resnet18
from hidet.runtime import CompiledApp, save_compiled_app, load_compiled_app
from hidet.runtime.compiled_app import create_compiled_app

module_1 = resnet18().cuda()
module_2 = resnet18().cuda()

x1 = hidet.symbol(['batch_size', 3, 224, 224], dtype='float32', device='cuda:0')
x2 = hidet.symbol([1, 3, 224, 224], dtype='float32', device='cuda:0')

y1 = module_1(x1)
y2 = module_2(x2)

# both models are built identically, so their weights are numerically identical and can be shared
cgraph_1 = hidet.trace_from(y1, inputs=[x1]).build()
cgraph_2 = hidet.trace_from(y2, inputs=[x2]).build()

# we create a compiled app with two compiled graphs
app = create_compiled_app(graphs={'graph_1': cgraph_1, 'graph_2': cgraph_2}, name='demo_app')

save_compiled_app(app, 'app.hidet')

app = load_compiled_app('app.hidet')

# graph_1 has a dynamic batch size and graph_2 a fixed batch size of 1, so both accept this input
x = hidet.randn([1, 3, 224, 224], device='cuda')
y1 = app.graphs['graph_1'](x)
y2 = app.graphs['graph_2'](x)
hidet.utils.assert_close(y1, y2)

# check that the two graphs share their weights
# a key feature of CompiledApp: graphs share a weight whenever the weights are numerically identical
assert len(set(app.graphs['graph_1'].weights) ^ set(app.graphs['graph_2'].weights)) == 0
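
The final assertion compares weight objects, not their values. Assuming the weight objects hash by identity (Python's default for classes that do not override `__eq__`/`__hash__`), the symmetric difference of the two sets is empty exactly when both graphs reference the same objects. A minimal pure-Python illustration, with a hypothetical `Weight` class standing in for a hidet tensor:

```python
class Weight:
    """Hypothetical stand-in for a tensor; hashes by object identity (Python's default)."""

    def __init__(self, data):
        self.data = data


shared = [Weight([1.0, 2.0]), Weight([3.0])]
graph_1_weights = list(shared)                   # both graphs point at the same objects
graph_2_weights = list(shared)
copied = [Weight(list(w.data)) for w in shared]  # equal values, but distinct objects

# empty symmetric difference: exactly the same set of objects
print(len(set(graph_1_weights) ^ set(graph_2_weights)))  # 0
# equal values are not enough: every copy is a different object
print(len(set(graph_1_weights) ^ set(copied)))           # 4
```

`set(a) == set(b)` expresses the same check a little more directly.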