replicate / cog-triton

A cog implementation of Nvidia's Triton server
Apache License 2.0

Return logits when return_logits is set to true. #16

Closed manishravula closed 4 weeks ago

manishravula commented 5 months ago

This PR adds support for returning logits from the predict method. When return_logits is set to true, predict writes the logits for the prompt to a torch tensor file and returns that file, so a downstream program can load the logits from it.

manishravula commented 5 months ago

Example use:

% curl -s -X POST \
    -H "Content-Type: application/json" \
    -d $'{ "input": { "prompt": "What is machine learning?", "return_logits": true } }' \
    http://localhost:5000/predictions
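For anyone scripting this instead of using curl, here is a rough Python equivalent of the request above, using only the standard library. The URL and the input fields are taken from the curl example. The helper names build_request and send are made up for illustration, and the sketch makes no assumption about the shape of the JSON the server sends back.

```python
import json
import urllib.request

# Taken from the curl example; adjust if the server listens elsewhere.
PREDICTIONS_URL = "http://localhost:5000/predictions"


def build_request(prompt, return_logits=True, url=PREDICTIONS_URL):
    """Build the same POST request that the curl example sends (hypothetical helper)."""
    payload = {"input": {"prompt": prompt, "return_logits": return_logits}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON response (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `send(build_request("What is machine learning?"))` should issue the same call as the curl command.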

I don't fully understand how the return values work yet. The server doesn't crash and the logits file is being written, but I suspect a bug in how the filename is yielded, so it may not be returned correctly.
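One general Python pitfall that could produce this symptom (not a diagnosis of this PR's code, which isn't shown here): inside a generator, `return value` does not hand the value to a caller that iterates over the generator. The value only travels on the StopIteration exception, so a `for` loop never sees it. The sketch below uses made-up function names and a made-up filename to show the difference.

```python
def predict_with_return(logits_path):
    """Hypothetical generator that streams tokens, then *returns* the path."""
    yield "token-1"
    yield "token-2"
    return logits_path  # only attached to StopIteration; iteration never sees it


def predict_with_yield(logits_path):
    """Hypothetical generator that streams tokens, then *yields* the path."""
    yield "token-1"
    yield "token-2"
    yield logits_path  # delivered to the consumer like any other item


def drain(gen):
    """Collect every yielded item plus the generator's return value."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value


# "logits.pt" is a placeholder filename for illustration.
print(list(predict_with_return("logits.pt")))  # the path is missing here
print(list(predict_with_yield("logits.pt")))   # the path is the last item
print(drain(predict_with_return("logits.pt")))  # the path shows up as the return value
```

If the predict method is a generator, yielding the filename as the final item (or not mixing streaming output with a returned value at all) avoids the silently dropped value.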