Closed: kir-gadjello closed this issue 6 months ago
Why does a personal command-line tool need this?
It is useful when running local models, to find the configuration that performs best.
These stats should not be obtained from aichat; they should come from the tools that actually run the model (such as ollama or LocalAI). For example, the streaming API of many models returns large segments rather than individual tokens, so statistics derived from the stream are distorted.
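As an illustration of taking the numbers from the runner itself, here is a minimal Rust sketch (Rust chosen because aichat is written in it) that computes tokens/s from the fields Ollama reports in the final chunk of its `/api/generate` stream. The field names and nanosecond units follow Ollama's documented API; the sample values are invented for illustration.

```rust
use serde::Deserialize;

// Subset of the final chunk of Ollama's /api/generate stream.
// Durations are reported in nanoseconds.
#[derive(Deserialize)]
struct OllamaFinalChunk {
    prompt_eval_count: u64,
    prompt_eval_duration: u64, // ns
    eval_count: u64,
    eval_duration: u64, // ns
}

// tokens per second = count / (duration in seconds)
fn tokens_per_second(count: u64, duration_ns: u64) -> f64 {
    count as f64 / (duration_ns as f64 / 1e9)
}

fn main() {
    // Example final chunk; these numbers are made up.
    let json = r#"{
        "prompt_eval_count": 26,
        "prompt_eval_duration": 130000000,
        "eval_count": 290,
        "eval_duration": 4700000000
    }"#;
    let stats: OllamaFinalChunk = serde_json::from_str(json).unwrap();
    println!(
        "prompt: {:.1} tok/s, generation: {:.1} tok/s",
        tokens_per_second(stats.prompt_eval_count, stats.prompt_eval_duration),
        tokens_per_second(stats.eval_count, stats.eval_duration),
    );
}
```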
Please add basic performance stats (prompt processing tokens/s and generation tokens/s) behind a flag like `-vs`. A mode for debugging LLM API requests (logging them as JSON) would also be useful, behind a flag like `-va`.
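For the client-side half of this, here is a hypothetical sketch of what a `-vs` measurement inside aichat could do: time to first chunk approximates prompt processing, and a crude token count over elapsed time approximates generation speed. This is not aichat's actual code, and, per the comment above, chunk-based counting understates true token counts on backends that stream multi-token segments.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

fn main() {
    // Simulated stream: each element stands in for one streamed chunk,
    // which may carry several tokens (the distortion noted above).
    let chunks = ["The quick", " brown fox jumps", " over the lazy dog."];

    let start = Instant::now();
    let mut first_chunk_at: Option<Duration> = None;
    let mut approx_tokens = 0usize;

    for chunk in chunks {
        sleep(Duration::from_millis(50)); // stand-in for network/decode latency
        first_chunk_at.get_or_insert_with(|| start.elapsed());
        // Crude token estimate (whitespace split); a real tokenizer differs.
        approx_tokens += chunk.split_whitespace().count();
    }

    let total = start.elapsed().as_secs_f64();
    println!("time to first chunk (~prompt processing): {:?}", first_chunk_at.unwrap());
    println!("approx generation rate: {:.1} tok/s over {} chunks",
        approx_tokens as f64 / total, chunks.len());
}
```

One possible design, combining both sketches: prefer the runner's own counters when the backend exposes them (as in the Ollama example above) and fall back to this kind of client-side timing otherwise.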