replicate / cog-llama-template

LLaMA Cog template
Apache License 2.0

delay prints with a decorator #66

Closed technillogue closed 1 year ago

technillogue commented 1 year ago

cog throttles output to send webhooks at most once every 50ms, and prints count as output, so we don't want to print before yielding the first token. delaying prints with a decorator is a relatively unobtrusive way to avoid that, and it ensures we can still see any debugging information if there are errors
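For illustration, the approach described above might look roughly like the following. This is a minimal sketch, not the code from this PR: the name `delay_prints` and the example `predict` generator are hypothetical, and it assumes the predictor is a plain Python generator whose `print` calls resolve to `builtins.print`.

```python
import builtins
import functools
from typing import Any, Callable, Iterator

_DONE = object()  # sentinel: the generator finished without yielding


def delay_prints(fn: Callable[..., Iterator[Any]]) -> Callable[..., Iterator[Any]]:
    """Hypothetical decorator: hold back print() calls until after the first yield."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Iterator[Any]:
        original_print = builtins.print
        held = []  # (args, kwargs) of every print call made before the first yield

        def hold(*p_args: Any, **p_kwargs: Any) -> None:
            held.append((p_args, p_kwargs))

        def flush() -> None:
            for p_args, p_kwargs in held:
                original_print(*p_args, **p_kwargs)
            held.clear()

        gen = fn(*args, **kwargs)
        builtins.print = hold
        try:
            first = next(gen, _DONE)
        except BaseException:
            flush()  # on error, emit the debugging output immediately
            raise
        finally:
            builtins.print = original_print
        try:
            if first is not _DONE:
                yield first  # the first token goes out before any held print
        finally:
            flush()
        if first is not _DONE:
            yield from gen  # later prints are no longer delayed

    return wrapper


@delay_prints
def predict(prompt: str) -> Iterator[str]:
    print("loading weights")  # held until the first token has been yielded
    for token in prompt.split():
        yield token


for token in predict("hello world"):
    pass
```

Because the swap is global, prints made by other code running during that window (an inference engine, for example) are held back as well, which is also why the sketch is not safe when several threads print at once.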

nickstenning commented 1 year ago

How about a compromise: a context manager that yields a callable that accumulates debugging information (including timestamps, perhaps?) and emits them all on exit.

I like the basic idea, but overwriting builtins.print is rather too clever for my liking.
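The compromise suggested in this comment could be sketched as below. It is an assumption-laden illustration rather than anything that was merged: `deferred_log`, its `emit` parameter, and the timestamp format are all hypothetical names and choices.

```python
import contextlib
import time
from typing import Callable, Iterator


@contextlib.contextmanager
def deferred_log(
    emit: Callable[[str], None] = print,
) -> Iterator[Callable[[str], None]]:
    """Hypothetical context manager: collect messages, emit them all on exit."""
    entries = []  # (seconds since entry, message)
    start = time.monotonic()

    def log(message: str) -> None:
        # Record the message with a relative timestamp; nothing is printed yet.
        entries.append((time.monotonic() - start, message))

    try:
        yield log
    finally:
        # Runs on normal exit and when the body raises, so debugging
        # information is still visible after an error.
        for elapsed, message in entries:
            emit(f"[+{elapsed:.3f}s] {message}")


with deferred_log() as log:
    log("loading weights")
    log("starting generation")
```

Unlike replacing `builtins.print`, this only captures messages passed explicitly to `log`, so prints made inside third-party code would still be emitted immediately.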

technillogue commented 1 year ago

maybe this is okay and still works with prints inside the inference engine etc?

technillogue commented 1 year ago

I think I'm going to go ahead with this and we can revisit this later