huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Set up tgi environment values with the ones used to build the model #529

Closed oOraph closed 5 months ago

oOraph commented 6 months ago

We need this to work around the model's static parameters: the Docker entrypoint can then adapt the TGI environment to match the specified model. This will make the image easier to use: the default parameters (i.e. not specifying anything) should be enough for most models.
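The idea above can be sketched roughly as follows. This is a hypothetical illustration, not the entrypoint code from this PR: the exact config layout and the `MAX_BATCH_SIZE` / `MAX_TOTAL_TOKENS` variable names are assumptions.

```python
import json
import os

def infer_tgi_env(model_dir: str) -> dict:
    """Sketch: derive TGI environment defaults from the static neuron
    parameters baked into an exported model's config.json.

    Assumptions (not confirmed by this PR): the export stores its static
    params under a "neuron" key, and the entrypoint consumes the
    MAX_BATCH_SIZE / MAX_TOTAL_TOKENS environment variables.
    """
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    neuron = config.get("neuron", {})
    env = {}
    if "batch_size" in neuron:
        env["MAX_BATCH_SIZE"] = str(neuron["batch_size"])
    if "sequence_length" in neuron:
        env["MAX_TOTAL_TOKENS"] = str(neuron["sequence_length"])
    return env
```

With something like this, an entrypoint can export the derived variables before launching the server, so users only pass values when they want to override the model's compiled defaults.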

oOraph commented 6 months ago

Note: the associated generated image is available for testing here :) [docker.io]/raphael31415/neuronx-tgi:0.0.21.dev0

oOraph commented 6 months ago

This looks good to me, but I am a bit worried some configurations might not work. Could you add integration tests under https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference/integration-tests ? I also need to add a github workflow to build the image and run the integration tests (make tgi_docker_test).

Done, both -> tgi_implicit_env.py and the workflow.

oOraph commented 6 months ago

Actually I removed the workflow: the integration test test_gpt2.py cannot work for the local_neuron variant. Reason:

A directory is filled with model data here: https://github.com/huggingface/optimum-neuron/blob/6856557565c20c16311191409adf7968d41253ea/text-generation-inference/integration-tests/test_gpt2.py#L27

Then this directory is expected to be shared with the Docker container, here: https://github.com/huggingface/optimum-neuron/blob/6856557565c20c16311191409adf7968d41253ea/text-generation-inference/integration-tests/conftest.py#L115

The problem is that this cannot work when the tests themselves run inside a container with a Docker-in-Docker (dind) setup: the volume filled by the first container does not exist on the host, so it never reaches the second container, and TGI launches with an empty directory.

-> So either we remove the local_neuron variant, or we find a way to share the volume between the container running pytest and the one it spawns.

oOraph commented 5 months ago

I had to deactivate/remove all tests related to aws-neuron/gpt2-neuronx-bs4-seqlen1024 because of the neuronx-cc upgrade to v2.13.xxx.

HuggingFaceDocBuilderDev commented 5 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

oOraph commented 5 months ago

Side note: I bumped the version to 0.0.22.dev0. This will temporarily break the integration tests, as there are no compatible cached models for the CI yet (gpt2 compiled with neuronx-cc 2.13.66.0+6dfecc895 on 1 or 2 cores).