petals-infra / chat.petals.dev

💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client
https://chat.petals.dev

Use token count for speed calc #34

Closed · Webifi closed this 1 year ago

Webifi commented 1 year ago

Added a token_count field for the delta in each response, since when max_new_tokens > 1, steps is more difficult to use for speed calculation.

Fixes #33.
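
For context, here is a minimal client-side sketch of how a per-response token_count could be used to compute tokens/sec over the WebSocket API. The endpoint URL, message schema, and all field names other than token_count are assumptions for illustration and may not match the deployed API exactly.

```python
# Sketch: measure generation speed using the per-delta token_count field.
# Endpoint, message types, and fields other than "token_count" are assumptions.
import asyncio
import json
import time

import websockets  # pip install websockets

API_URL = "wss://chat.petals.dev/api/v2/generate"  # assumed endpoint


async def measure_speed(prompt: str, model: str) -> None:
    async with websockets.connect(API_URL) as ws:
        # Open an inference session (schema assumed).
        await ws.send(json.dumps({
            "type": "open_inference_session",
            "model": model,
            "max_length": 512,
        }))
        if not json.loads(await ws.recv()).get("ok"):
            raise RuntimeError("failed to open inference session")

        # Request generation with max_new_tokens > 1; here counting "steps"
        # would under-report the number of tokens actually generated.
        await ws.send(json.dumps({
            "type": "generate",
            "inputs": prompt,
            "max_new_tokens": 4,
            "stop_sequence": "\n",
        }))

        total_tokens = 0
        start = time.monotonic()
        while True:
            response = json.loads(await ws.recv())
            if not response.get("ok"):
                raise RuntimeError(response)
            # token_count is the number of tokens in this delta (added by this PR).
            total_tokens += response.get("token_count", 0)
            if response.get("stop"):
                break
        elapsed = time.monotonic() - start
        print(f"{total_tokens} tokens in {elapsed:.2f} s "
              f"({total_tokens / elapsed:.2f} tokens/s)")


asyncio.run(measure_speed("A cat sat on", "meta-llama/Llama-2-70b-chat-hf"))
```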

Webifi commented 1 year ago

(Something went very strange when I tried to pull the merge from your remote branch, hence all the repeated changes.)

borzunov commented 1 year ago

@Webifi Great job! I'm gonna test it just in case and merge it soon.

Webifi commented 1 year ago

@borzunov Anything terribly wrong with this PR?

borzunov commented 1 year ago

@Webifi Sorry for the slow response, I'm on vacation right now :)

Are you okay with removing steps from the response? I feel like this field is a bit confusing, since people may think that it's the overall number of steps from the beginning of the session, etc.

Other than that, I think it's good to go!
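
For illustration, dropping steps would leave each streamed delta looking roughly like the sketch below; only token_count is established by this PR, and the other field names are assumptions.

```python
# Hypothetical shape of a single streamed response after removing "steps".
# Only "token_count" is confirmed by this PR; other fields are assumptions.
response = {
    "ok": True,
    "outputs": " the mat",  # newly generated text (the delta)
    "token_count": 3,       # number of tokens in this delta
    "stop": False,          # whether generation has finished
}
```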

Webifi commented 1 year ago

Yeah, I agree. I questioned whether I should have even put steps in.

I'm having issues pushing changes to my fork again. I'll try to get this sorted out when I can find some time.

Webifi commented 1 year ago

My chat.petals.dev colab isn't working any longer for some reason, even without these changes applied, so I can't fully test the latest change I just pushed until I'm back in the office in a few weeks.

borzunov commented 1 year ago

@Webifi Seems to work, merging it. Thanks for your contribution!