What does this PR do?
This PR modifies the TGI continuous batching implementation to take advantage of the transformers-neuronx implementation. Instead of dropping the KV cache when adding new requests and rebuilding it from cached texts, we simply omit the pending requests when calling model.forward, specifying only the indices of the new requests to prefill.

A Llama TGI unit test is added specifically to verify that the results are still correct after this change (for Llama and Mistral, transformers-neuronx continuous batching is always on).

For SageMaker deployment, disk usage logs are added when fetching/exporting a model. During export, the model generation config is fetched to provide default values.
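The scheduling idea behind the change can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are invented for clarity, not the actual TGI or transformers-neuronx API): a batch keeps one KV-cache slot per request, and adding new requests only prefills the newly assigned slots, leaving the existing slots and their cache untouched.

```python
# Hypothetical sketch of continuous batching with per-slot KV caches.
# Names (ContinuousBatch, prefill, ...) are illustrative, not the real API.

class ContinuousBatch:
    def __init__(self, max_slots):
        self.kv_cache = [None] * max_slots  # per-slot KV cache entries
        self.active = set()                 # slot indices currently decoding

    def add_requests(self, prompts):
        """Assign free slots to new prompts; return only the new indices."""
        new_indices = []
        for prompt in prompts:
            slot = next(i for i in range(len(self.kv_cache))
                        if i not in self.active)
            self.active.add(slot)
            self.kv_cache[slot] = {"prompt": prompt}
            new_indices.append(slot)
        return new_indices

    def prefill(self, new_indices):
        """Run the model only over the new slots.

        Stand-in for a model.forward call restricted to the new request
        indices; existing KV entries are never dropped or rebuilt.
        """
        return {i: self.kv_cache[i]["prompt"] for i in new_indices}

batch = ContinuousBatch(max_slots=4)
first = batch.add_requests(["hello", "world"])
batch.prefill(first)
# A later request only prefills its own slot; slots 0 and 1 keep their cache.
later = batch.add_requests(["again"])
print(later)  # [2]
```

The previous behavior corresponded to dropping every entry of `kv_cache` and re-running prefill over all active slots whenever a request arrived; restricting prefill to `new_indices` avoids that redundant recomputation.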