mosaicml / examples

Fast and flexible reference benchmarks
Apache License 2.0

Update inference models to work with batching and go server #341

Closed. RR4787 closed this 1 year ago

RR4787 commented 1 year ago

Updates the YAMLs and model handlers to work with the Go server and batching. The diffusion model is left as is, since it's broken and needs to be fixed separately. Changes attn_cnfg['attn_impl'] in the mpt7b handler from 'triton' to 'torch' for the time being, while dependency issues are sorted out in the Go server.
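As a rough illustration of the attention change described above, the override could look something like the following. This is a minimal sketch, not the handler's actual code: the `override_attn_impl` helper and the plain-dict config are hypothetical, and only the `attn_impl` key and the 'triton' and 'torch' values come from the description.

```python
from copy import deepcopy


def override_attn_impl(attn_config: dict, impl: str = "torch") -> dict:
    """Return a copy of attn_config with 'attn_impl' replaced.

    Hypothetical helper for illustration; the real handler may set the
    value directly on the model config instead.
    """
    patched = deepcopy(attn_config)  # leave the caller's dict untouched
    patched["attn_impl"] = impl
    return patched


# Assumed starting point: a config that selects the triton implementation.
original = {"attn_impl": "triton"}
patched = override_attn_impl(original)
```

Returning a copy rather than mutating in place makes it easy to switch back to 'triton' once the dependency issues are resolved.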

Manually tested by deploying the models with the Go server and using a script to send batched requests.
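A script for this kind of manual batching check might look roughly like the sketch below. It is not the script used for this PR: the endpoint URL, the `{"input": ...}` payload shape, and the `post_json` and `send_concurrently` names are all assumptions made for illustration. It uses only the Python standard library and sends requests in parallel so the server has a chance to group them into a batch.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical endpoint; the real URL and payload schema depend on the deployment.
ENDPOINT = "http://localhost:8080/predict"


def post_json(payload: dict, url: str = ENDPOINT, timeout: float = 60.0) -> dict:
    """POST one JSON payload and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def send_concurrently(
    payloads: List[dict],
    send: Callable[[dict], dict] = post_json,
    max_workers: int = 8,
) -> List[dict]:
    """Send payloads in parallel threads and return replies in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the inputs in its results.
        return list(pool.map(send, payloads))
```

Against a running server, something like `send_concurrently([{"input": "hello"}, {"input": "world"}])` would issue both requests at nearly the same time. The `send` parameter is injectable so the fan-out logic can be exercised without a live server.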

dakinggg commented 1 year ago

Could you please include some evidence that these work in the PR description? Ideally we would have tests that at least try to start the server in CI, but I understand if that is not a top priority right now. I would like to see at least some manual test results in the PR description.