triton-inference-server / fastertransformer_backend

Using GEMM files in fastertransformer_backend. #49

Closed: SnoozingSimian closed this issue 1 year ago

SnoozingSimian commented 1 year ago

Discussed in https://github.com/triton-inference-server/fastertransformer_backend/discussions/48

Originally posted by **SnoozingSimian** September 22, 2022: While loading both GPT-J and GPT-NeoX models, I get the message `[WARNING] gemm_config.in is not found; using default GEMM algo`. This suggests there is a way to supply GEMM algorithms when loading these models. I have generated the `gemm_config.in` for GPT-NeoX using the FasterTransformer binaries, but I don't know where to place the file so that the backend can find it. Is there any way to use it currently?
byshiue commented 1 year ago

This document https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gptneox_guide.md demonstrates how to build the FasterTransformer source code and run the GEMM test.
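
For reference, the flow in that guide is roughly: build the FasterTransformer binaries, then run the `gpt_gemm` test to profile GEMM algorithms and write `gemm_config.in` into the current directory. A rough sketch follows; the exact CMake flags, `gpt_gemm` argument order, and the model-shape values vary between FasterTransformer versions and models, so treat the numbers below as illustrative and follow the guide above for your setup:

```bash
# Build FasterTransformer (SM=80 targets A100; pick the value for your GPU).
git clone https://github.com/NVIDIA/FasterTransformer.git
cd FasterTransformer && mkdir -p build && cd build
cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON ..
make -j"$(nproc)"

# Run the GEMM test; it writes gemm_config.in into the current directory.
# Illustrative argument order (check the guide for your FT version):
#   batch_size beam_width max_input_len head_num size_per_head
#   inter_size vocab_size data_type(0=FP32,1=FP16,2=BF16) tensor_para_size
./bin/gpt_gemm 8 1 32 64 96 24576 50432 1 1
```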

minsuuuuuuuub commented 1 year ago

Hi, I found the answer in the BERT guide docs, since I had the same issue with BERT models.

"If you want to use the library in other directory, please generate this file according to your setting and copy it to your working directory."

When I moved the `gemm_config.in` file to the working directory where I start the Triton server, the message no longer appeared.

SnoozingSimian commented 1 year ago

Thanks @minsuuuuuuuub, I reached the same answer.

So, in order to use `gemm_config.in`, we need to place it in the directory the Triton server is run from; the backend just does a lookup in the current working directory.
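
As a concrete example (the paths below are hypothetical, adjust them to your setup), that means copying the generated file into whatever directory you launch the server from:

```bash
# Copy the profiled GEMM config next to where the server will be started;
# FasterTransformer looks it up relative to the process's working directory.
cp /workspace/FasterTransformer/build/gemm_config.in /workspace/serve/
cd /workspace/serve
tritonserver --model-repository=/workspace/triton-model-store
```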

Weird how this is not mentioned anywhere in the GPT docs.