Closed · owenvallis closed this 1 year ago
Remove all tf.convert_to_tensor() calls before predict. The previous change prevented the memory leak when calling multiple models in a loop, but it also restricted predict to a single tensor batch. That is too restrictive and prevents us from calling multi-headed models.
This change was required to prevent a slowdown and a possible memory leak when passing lists of inputs instead of np.array or tensors. However, it breaks passing multiple inputs.
We should add a type check first and handle the multi-input case properly.
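A minimal sketch of what that type check could look like. Everything here is hypothetical (the function name `maybe_convert` is not from the codebase), and `np.asarray` stands in for `tf.convert_to_tensor` so the example stays dependency-free; the idea is to convert single array-like inputs eagerly while preserving the list/tuple structure that multi-headed models rely on:

```python
import numpy as np


def maybe_convert(x, convert=np.asarray):
    """Hypothetical pre-predict input normalization.

    `convert` stands in for tf.convert_to_tensor in this sketch.
    """
    # Multi-input case (e.g. multi-headed models): keep the outer
    # list/tuple intact so each element can be routed to its own input
    # head, converting only the individual elements.
    if isinstance(x, (list, tuple)):
        return type(x)(convert(e) for e in x)
    # Single-input case: convert eagerly, which is what avoided the
    # slowdown / memory growth seen with raw Python lists.
    return convert(x)
```

One caveat this sketch makes visible: a plain Python list of numbers is indistinguishable from a list of separate inputs by type alone, so a real implementation would likely also inspect the elements (e.g. treat a list of arrays/tensors as multi-input, and a list of scalars as a single batch).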