replicate / cog-triton

A cog implementation of Nvidia's Triton server
Apache License 2.0

Joe/lang 205 make triton configuration configurable during predict setup #17

Closed joehoover closed 5 months ago

joehoover commented 5 months ago

This PR:

Triton Configuration

Makes Triton setup configurable with a YAML config. Currently, it is WYSIWYG. If you do not provide a config, the default setup is used. If you do provide one, either by placing config.yaml in ./src/ or by setting the environment variable COG_TRITON_CONFIG=./src/myconfig.yaml, it is ingested and used to generate the Triton run-time configs.

Note, there are no protections right now! We should add validation and protections eventually.
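The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `resolve_config_path` is hypothetical, and the precedence (environment variable first, then ./src/config.yaml, then defaults) is an assumption, since the description does not say which wins when both are present.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in the PR description.
DEFAULT_CONFIG_PATH = Path("./src/config.yaml")


def resolve_config_path(
    env: Mapping[str, str] = os.environ,
    default_path: Path = DEFAULT_CONFIG_PATH,
) -> Optional[Path]:
    """Return the config file to ingest, or None to use the default setup.

    Hypothetical helper. Assumes COG_TRITON_CONFIG takes precedence over
    a config.yaml found in ./src/.
    """
    override = env.get("COG_TRITON_CONFIG")
    if override:
        # No validation here, matching the "no protections" note above.
        return Path(override)
    if default_path.is_file():
        return default_path
    # No config provided: the caller falls back to the default setup.
    return None
```

A caller would then parse the returned path as YAML, or skip parsing entirely when the result is `None`.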

Random Seeds

We now expose seed in the predict signature. If it is not specified, we sample a seed and use that. We also log the seed, along with a note that it has no effect on generation when greedy decoding is used.
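As a rough sketch of the seed behaviour, assuming a hypothetical helper named `resolve_seed` and an unsigned 32-bit seed range (neither is stated in this PR):

```python
import random
from typing import Optional

# Assumed upper bound for sampled seeds; the PR does not state the range.
MAX_SEED = 2**32 - 1


def resolve_seed(seed: Optional[int] = None) -> int:
    """Return the caller's seed, or sample one if none was given."""
    if seed is None:
        seed = random.randint(0, MAX_SEED)
    # Log the seed so a run can be reproduced later.
    print(f"Using seed: {seed} (no effect on generation with greedy decoding)")
    return seed
```

Passing the logged value back in as `seed` on a later call should reproduce a sampled generation, provided the other sampling parameters are unchanged.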

Make pad and end ID configurable

These were hardcoded to 2 for Llama; they can now be configured through the environment.
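One way this could look is sketched below. The variable names `PAD_ID` and `END_ID` are hypothetical, since the PR does not name them. Using two separate variables and keeping 2 as the fallback are also assumptions.

```python
import os
from typing import Mapping, Tuple

# Previously hardcoded value for Llama; assumed here to remain the fallback.
DEFAULT_TOKEN_ID = 2


def get_pad_and_end_ids(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Read pad and end token IDs from the environment.

    PAD_ID and END_ID are hypothetical variable names for illustration.
    """
    pad_id = int(env.get("PAD_ID", DEFAULT_TOKEN_ID))
    end_id = int(env.get("END_ID", DEFAULT_TOKEN_ID))
    return pad_id, end_id
```

A model whose tokenizer uses different special-token IDs would set both variables before startup; a non-integer value raises `ValueError` here, since no validation is attempted.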