Note: the generated image is available for testing here :) docker.io/raphael31415/neuronx-tgi:0.0.21.dev0
This looks good to me, but I am a bit worried some configurations might not work. Could you add integration tests under https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference/integration-tests? I also need to add a GitHub workflow to build the image and run the integration tests (make tgi_docker_test).
Done, both -> the integration test tgi_implicit_env.py and the workflow.
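For reference, a minimal sketch of what the implicit-env test can look like, assuming a service fixture along the lines of the existing conftest.py (the fixture and client names below are illustrative, not necessarily the exact ones in the PR):

```python
import pytest

@pytest.mark.asyncio
async def test_implicit_env(tgi_service):
    # The image is launched with only MODEL_ID set: batch size and
    # sequence length are left for the entrypoint to infer from the model.
    await tgi_service.health(300)
    response = await tgi_service.client.generate(
        "What is Deep Learning?",
        max_new_tokens=17,
    )
    assert response.details.generated_tokens == 17
```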
Actually I removed the workflow: the integration test test_gpt2.py cannot work for the local_neuron variant, for the following reason.
A directory is filled with model data here: https://github.com/huggingface/optimum-neuron/blob/6856557565c20c16311191409adf7968d41253ea/text-generation-inference/integration-tests/test_gpt2.py#L27
This directory is then expected to be shared with the docker container, here: https://github.com/huggingface/optimum-neuron/blob/6856557565c20c16311191409adf7968d41253ea/text-generation-inference/integration-tests/conftest.py#L115
The problem is that this cannot work when the tests themselves run inside a container in a docker dind environment: the volume filled in the first container is not available on the host, so it never reaches the second container (hence TGI launches with an empty dir).
-> so either we remove the local_neuron variant, or we find a way to share the volume between the container running pytest and the one spawned by pytest.
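One possible direction for the second option (just a sketch, not what this PR does): have conftest.py create a named Docker volume on the shared daemon instead of bind-mounting a host path. A named volume lives on the daemon itself, so every container spawned by that daemon can mount it, sibling containers included. All names below are illustrative:

```python
import docker

client = docker.from_env()

# A named volume is managed by the Docker daemon, so both the container
# running pytest and the TGI container it spawns can mount it; a
# bind-mounted host path, by contrast, only exists inside the first
# container under dind.
data_volume = client.volumes.create(name="neuron-model-data")  # illustrative name

tgi = client.containers.run(
    "raphael31415/neuronx-tgi:0.0.21.dev0",
    environment={"MODEL_ID": "/data"},
    volumes={data_volume.name: {"bind": "/data", "mode": "rw"}},
    detach=True,
)
```

Caveat: for the pytest container to populate that volume in the first place, the CI would also need to mount the same volume when it creates the pytest container, which is why this is not a drop-in fix.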
I had to deactivate/remove all tests related to aws-neuron/gpt2-neuronx-bs4-seqlen1024 because of the neuronx-cc upgrade to v2.13.xxx.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Side note: I bumped the version to 0.0.22.dev0. This will temporarily break integration tests, as there are no compatible cached models for the CI yet (gpt2 compiled with neuronx-cc 2.13.66.0+6dfecc895 on 1 or 2 cores).
Need this to work around the model's static params, so that the docker entrypoint can adapt the TGI environment to the specified model. This will make the image easier to use: default params (i.e. not specifying anything) should be enough for most models.
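As a rough illustration of the idea, here is a sketch assuming the exported model records its static compilation parameters in a `neuron` section of its `config.json` (the exact keys and environment variable names are assumptions, not the PR's actual entrypoint code):

```python
import json
import os

def tgi_env_from_model(model_dir: str) -> dict:
    """Derive TGI launcher environment variables from the model's static params."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    # Static parameters (batch size, sequence length, ...) recorded at export time.
    neuron = config.get("neuron", {})
    env = {}
    if "batch_size" in neuron:
        env["MAX_BATCH_SIZE"] = str(neuron["batch_size"])
    if "sequence_length" in neuron:
        env["MAX_TOTAL_TOKENS"] = str(neuron["sequence_length"])
    return env

# The entrypoint could export these before starting the launcher, so that
# a bare `docker run -e MODEL_ID=...` still yields a working configuration.
os.environ.update(tgi_env_from_model("/data"))  # illustrative local model path
```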