Open mdr223 opened 1 year ago
One edit here: Ludo raised an issue (#115) that the DumbLLM is no longer compatible. We need to resolve that issue with a PR before we get to step 5.
One more addition we will likely want to make is merging in my work on the db backend once it's ready. Without that, we will also need to lock the `conversations.json` file, which is plausible but could harm performance.
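For reference, a minimal sketch of what locking the file could look like. This is not what the app does today; it assumes a Unix host (`fcntl` is Unix-only), assumes `conversations.json` holds a single JSON object, and `locked_json` is a hypothetical helper name.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def locked_json(path):
    """Hold an exclusive advisory lock on `path` while reading and rewriting it.

    Hypothetical helper: assumes the file contains one JSON object.
    """
    # Open read/write, creating the file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            raw = f.read()
            data = json.loads(raw) if raw.strip() else {}
            yield data  # caller mutates the dict in place
            # Only reached if the caller's block did not raise.
            f.seek(0)
            f.truncate()
            json.dump(data, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Usage would be `with locked_json("conversations.json") as convs: convs[key] = value`. Every request that touches the file waits on the same lock, which is where the performance cost comes from.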
I believe we also discussed introducing timeouts for the requests in the app. Not sure if we want to open an issue about that.
With the new PR for request timeouts (#141), I believe we can close this issue once the PR is merged.
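To make the timeout idea concrete, here is one possible shape for it. This is only a sketch and not necessarily what #141 does; `call_with_timeout` and the pool size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary placeholder.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and stop waiting after timeout_s seconds."""
    future = _pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the task has not started yet; a running
        # thread cannot be interrupted and will finish in the background.
        future.cancel()
        raise TimeoutError(f"call did not finish within {timeout_s}s")
```

The caveat in the comment matters for hardening: the caller gets control back, but a slow call still occupies a worker until it returns.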
Summarizing the steps to begin hardening our system for next week. Please add anything I may have missed from our discussion.
- `main`
- `N` API calls to our Flask app, measure and plot histogram of latencies for `N = [10, 100, 1000, 10000]`
- `t3desk19.mit.edu:7683` ~5 times using the PSET 7 question to get some sense of the distribution for the latency
- `N` calls to Flask app by deploying `DumbLLM` in `dev` but on t3desk19 with `time.sleep(np.random.normal(mean, std))`, where `mean` and `std` are guesstimates of latency parameters based on calls to `t3desk19.mit.edu`
- `t3desk` is important b/c parallelism will be different compared with `submit06`
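A rough sketch of the latency measurement and the simulated-latency stub from the steps above. Assumptions: `simulated_llm_call`, `measure_latencies`, and `summarize` are hypothetical names, the `mean`/`std` defaults are placeholders rather than measured values, and `random.gauss` from the standard library stands in for `np.random.normal`. In a real run, `call` would wrap an HTTP request to the Flask app.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_llm_call(mean=0.05, std=0.02):
    """Stand-in for a DumbLLM-style reply: sleep for a normally distributed time."""
    # A normal draw can be negative and time.sleep() rejects negative values,
    # so clamp the delay at zero.
    time.sleep(max(0.0, random.gauss(mean, std)))
    return "dummy answer"


def measure_latencies(call, n, max_workers=8):
    """Run `call` n times concurrently; return per-call latencies in seconds."""
    def timed(_):
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed, range(n)))


def summarize(latencies):
    """Return simple summary statistics for a list of latencies."""
    ordered = sorted(latencies)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "p50": ordered[len(ordered) // 2],
        "p95": ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    for n in [10, 100]:
        print(summarize(measure_latencies(simulated_llm_call, n)))
```

The list returned by `measure_latencies` can be passed straight to matplotlib's `plt.hist` for the histogram. Note that `max_workers` caps the client-side parallelism, so it should be set with the target machine in mind.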
And separately but also importantly:
- `raise e` from inside our main `try-except` block in `ChatWrapper`
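If the intent of this last item is to stop re-raising so that one failed request cannot take the app down, the pattern might look roughly like this. The real `ChatWrapper` code is not shown in this issue, so `safe_call` and `FALLBACK` are hypothetical.

```python
import logging

logger = logging.getLogger("chat")

# Hypothetical user-facing message returned when a handler fails.
FALLBACK = "Sorry, something went wrong. Please try again."


def safe_call(handler, *args, **kwargs):
    """Call handler; on any exception, log the traceback and return a fallback reply."""
    try:
        return handler(*args, **kwargs)
    except Exception:
        # logger.exception records the full traceback, so the error is still
        # visible in the logs even though it is not re-raised.
        logger.exception("chat handler failed")
        return FALLBACK
```

The trade-off is that errors become visible only through the logs, so the logging call is doing real work here.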