Joe/yorickvp/ci/joe/lang 220 identify cog triton tps bottleneck

This PR:

Factors token accounting and decoding out of postprocessing/1/model.py into predict.py so that we can use the ensemble endpoint instead of BLS. This is the fix for the cog-triton performance regression.
Adds a envar flag to predict.py, LOG_PERFORMANCE_METRICS. If True, then performance metrics will be logged from predict.py.
Updates scripts/perf_test.py so that server-side metrics are reported if available (e.g. if LOG_PERFORMANCE_METRICS==True

replicate / cog-triton