mlc-ai / web-stable-diffusion

Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
https://mlc.ai/web-stable-diffusion
Apache License 2.0

[MetaSchedule] [CUDA target] Did you forget to bind? #43

Closed Civitasv closed 1 year ago

Civitasv commented 1 year ago

Currently, the parameters I am using are as follows:

import os

from tvm import meta_schedule as ms


def do_all_tune(mod, target):
    tuning_dir = "gpu3090"
    tuning_record = "database_tuning_record.json"
    tuning_workload = "database_workload.json"
    cooldown_interval = 150
    trial_cnt = 2000

    local_runner = ms.runner.LocalRunner(cooldown_sec=cooldown_interval, timeout_sec=10)
    database = ms.tir_integration.tune_tir(
        mod=mod,
        target=target,
        work_dir=tuning_dir,
        max_trials_global=trial_cnt,
        max_trials_per_task=2,
        runner=local_runner,
        special_space={},
    )
    # Write the pruned tuning records out to a fresh JSONDatabase.
    if os.path.exists(tuning_record):
        os.remove(tuning_record)
    if os.path.exists(tuning_workload):
        os.remove(tuning_workload)
    database.dump_pruned(
        ms.database.JSONDatabase(
            path_workload=tuning_workload,
            path_tuning_record=tuning_record,
        )
    )
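
For reference, a hypothetical invocation of this helper (the target tag below is my assumption, not from the original report) would be:

import tvm

# "nvidia/geforce-rtx-3090" is a built-in TVM target tag matching the GPU
# implied by the work_dir name above; any CUDA target string works here.
target = tvm.target.Target("nvidia/geforce-rtx-3090")
do_all_tune(mod, target)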

Could you kindly share the parameters you are using to generate the log? I'm curious to know.

Civitasv commented 1 year ago

After finishing tuning, I use:

with args.target, db, tvm.transform.PassContext(opt_level=3):
    mod_deploy = relax.transform.MetaScheduleApplyDatabase(enable_warning=True)(mod)

It will show many warnings like:

[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: matmul23
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_conv2d14_add24_add25
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: take
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_conv2d37_add34_add35_divide7
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_conv2d24_add10
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_conv2d7_add10
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_matmul28_add27_add28
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_matmul11_add11_strided_slice4
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_conv2d4_add10_add12
[17:27:57] /home/wyc/husen/sandbox/tvm/src/relax/transform/meta_schedule.cc:162: Warning: Tuning record is not found for primfunc: fused_matmul9_add8_gelu

Then, when I call relax.build, it shows the typical "Did you forget to bind?" error.

I don't know why this happens.

cc @tqchen @MasterJH5574
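
For context, the "Did you forget to bind?" error is raised at build time when a GPU PrimFunc reaches codegen with loops that are not bound to threadIdx/blockIdx, which is the case for every function that MetaScheduleApplyDatabase left untouched. A common fallback, sketched below under the assumption of a TVM build that ships tir.transform.DefaultGPUSchedule, is to give the leftover functions a default binding after applying the database:

with args.target, db, tvm.transform.PassContext(opt_level=3):
    mod_deploy = relax.transform.MetaScheduleApplyDatabase(enable_warning=True)(mod)
    # Bind threads for any PrimFunc that had no tuning record, so that
    # relax.build does not fail with "Did you forget to bind?".
    mod_deploy = tvm.tir.transform.DefaultGPUSchedule()(mod_deploy)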

nineis7 commented 1 year ago

You can use diffusers==0.15.0 and the problems may all be solved. ^^

Civitasv commented 1 year ago

> You can use diffusers==0.15.0 and the problems may all be solved. ^^

Thanks for your help! But I've tried this and it still doesn't work.


MasterJH5574 commented 1 year ago

Hi @Civitasv, thanks for the question! We used meta_schedule.relax_integration.tune_relax to tune the IRModule mod_deploy.

I guess the mismatch you observed is because both the TIR extraction of tune_relax and MetaScheduleApplyDatabase will “normalize” each TIR function, while tune_tir does not. You can try tune_relax and see if it works for your case.
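
One way to see the mismatch directly (a diagnostic sketch, assuming the database files dumped by the first snippet above) is to query the database per PrimFunc. Database.has_workload looks records up by structural equality, so a function that was not normalized the same way as at tuning time will miss even when a record for its normalized form exists:

import tvm
from tvm import meta_schedule as ms

db = ms.database.JSONDatabase(
    path_workload="database_workload.json",
    path_tuning_record="database_tuning_record.json",
)
for gv, func in mod.functions.items():
    if isinstance(func, tvm.tir.PrimFunc):
        # Wrap each PrimFunc the way the lookup sees it; a structural mismatch
        # here is what produces the "Tuning record is not found" warnings.
        func_mod = tvm.IRModule({"main": func.with_attr("global_symbol", "main")})
        print(gv.name_hint, "found" if db.has_workload(func_mod) else "missing")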

MasterJH5574 commented 1 year ago

The use of tune_relax can be something like

import os

import tvm
from tvm import meta_schedule as ms

ms.relax_integration.tune_relax(
    mod=mod_deploy,
    target=tvm.target.Target("apple/m1-gpu-restricted"),  # for WebGPU 256-thread limitation
    params={},
    builder=ms.builder.LocalBuilder(
        max_workers=os.cpu_count(),
    ),
    runner=ms.runner.LocalRunner(timeout_sec=60),
    work_dir="log_db",
    max_trials_global=50000,
    max_trials_per_task=2000,
)
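
After tuning finishes, the records under work_dir can be loaded back and applied with the same pass discussed earlier (a sketch; mod_deploy and the WebGPU-oriented target come from the surrounding comments):

import tvm
from tvm import relax
from tvm import meta_schedule as ms

# Load the database that tune_relax wrote under work_dir ("log_db" above).
db = ms.database.JSONDatabase(work_dir="log_db")
with tvm.target.Target("apple/m1-gpu-restricted"), db, tvm.transform.PassContext(opt_level=3):
    mod_deploy = relax.transform.MetaScheduleApplyDatabase(enable_warning=True)(mod_deploy)
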
Civitasv commented 1 year ago

Thanks for your reply. I've tried this, but sadly it still doesn't work. Following your advice, I changed my configuration as follows:

def do_all_tune(mod, target):
    tuning_dir = "gpu3090_workdir"
    tuning_record = "gpu3090/database_tuning_record.json"
    tuning_workload = "gpu3090/database_workload.json"
    cooldown_interval = 0
    trial_cnt = 100
    trial_per = 2

    local_runner = ms.runner.LocalRunner(cooldown_sec=cooldown_interval, timeout_sec=60)
    database = ms.relax_integration.tune_relax(
        mod=mod,
        target=target,
        work_dir=tuning_dir,
        max_trials_global=trial_cnt,
        max_trials_per_task=trial_per,
        runner=local_runner,
        params={},
    )
    if os.path.exists(tuning_record):
        os.remove(tuning_record)
    if os.path.exists(tuning_workload):
        os.remove(tuning_workload)
    database.dump_pruned(
        ms.database.JSONDatabase(
            path_workload=tuning_workload,
            path_tuning_record=tuning_record,
        )
    )

It still reports the same warnings as in https://github.com/mlc-ai/web-stable-diffusion/issues/43#issuecomment-1625207240.

I wonder if it is relevant to the max_trials_global and max_trials_per_task options.

Civitasv commented 1 year ago

> I wonder if it is relevant to the max_trials_global and max_trials_per_task options.

Yes, it is relevant. With trial_cnt = 10000 and trial_per = 2000, only the take operator still comes out wrong.
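
For readers hitting the same warnings: the per-task budget is what matters here. With max_trials_per_task=2, most tasks never get a successfully measured candidate, so no usable record is committed and those PrimFuncs stay unscheduled. A configuration in the range that worked in this thread (the values are from the comment above; adjust to your hardware and time budget) looks like:

database = ms.relax_integration.tune_relax(
    mod=mod,
    target=target,
    params={},
    work_dir="gpu3090_workdir",
    runner=ms.runner.LocalRunner(timeout_sec=60),
    max_trials_global=10000,   # total measurement budget across all tasks
    max_trials_per_task=2000,  # per-PrimFunc budget; 2 was far too low
)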

MasterJH5574 commented 1 year ago

Thanks @Civitasv! Glad that it works :-)