The parameters include max new tokens, so there's a known upper bound on how long the message can be.
Reading while it's generating is hard because of the constant scrolling.
Other UIs often solve this by scrolling down so that the start of the new message sits at the top of the screen; no further scrolling is needed until the message reaches the bottom of the window. The downside is that you can't read the history without scrolling back up. I imagine you've seen this approach and rejected it.
How about a compromise: as soon as you hit enter, the message box is sized as if the reply will use all of max new tokens, and if the reply ends up shorter, the box shrinks to fit?
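To make the idea concrete, here's a rough sketch of the sizing logic in Python. Everything in it is made up for illustration: the function names, the 4 characters per token estimate, the 80 character line width, and the 20 px line height are all assumptions, not anything taken from the actual UI code. A real implementation would probably measure rendered text instead of counting characters.

```python
import math


def reserved_height_px(max_new_tokens, chars_per_token=4.0,
                       chars_per_line=80, line_height_px=20):
    """Estimate the tallest the reply could be, before any text arrives.

    chars_per_token is a rough guess (assumption); real tokenizers vary.
    """
    max_chars = max_new_tokens * chars_per_token
    max_lines = max(1, math.ceil(max_chars / chars_per_line))
    return max_lines * line_height_px


def text_height_px(text, chars_per_line=80, line_height_px=20):
    """Height the given text needs, counting wrapped lines per paragraph."""
    lines = 0
    for paragraph in text.split("\n"):
        # An empty paragraph still occupies one line.
        lines += max(1, math.ceil(len(paragraph) / chars_per_line))
    return lines * line_height_px


def box_height_px(max_new_tokens, text_so_far, done):
    """Height of the message box at any point during generation."""
    needed = text_height_px(text_so_far)
    if done:
        # Generation finished: shrink to what the reply actually needs.
        return needed
    # Still generating: hold the reserved height so nothing scrolls,
    # and only grow if the estimate turned out too small.
    return max(reserved_height_px(max_new_tokens), needed)
```

With these numbers, max new tokens of 200 reserves 200 px as soon as you hit enter, and a one-line reply shrinks the box to 20 px once generation stops. The box never changes size mid-generation unless the estimate was too low, so the page would only jump once, at the end.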