Factors token accounting and decoding out of postprocessing/1/model.py into predict.py so that we can use the ensemble endpoint instead of BLS. This is the fix for the cog-triton performance regression.
Adds a envar flag to predict.py, LOG_PERFORMANCE_METRICS. If True, then performance metrics will be logged from predict.py.
Updates scripts/perf_test.py so that server-side metrics are reported if available (e.g. if LOG_PERFORMANCE_METRICS==True
This PR:
LOG_PERFORMANCE_METRICS
. If True, then performance metrics will be logged from predict.py.LOG_PERFORMANCE_METRICS==True