triton-inference-server / fil_backend

FIL backend for the Triton Inference Server
Apache License 2.0

Tests failing on NVIDIA Tesla T4, AWS G4 instance #80

Closed hcho3 closed 3 years ago

hcho3 commented 3 years ago

Tests fail when using AWS G4 instance.

Steps to reproduce:

  1. Set up CUDA 11.3 (latest) and NVIDIA Docker on a fresh EC2 instance of type g4dn.8xlarge.
  2. Build the triton_fil Docker image: docker build -t triton_fil -f ops/Dockerfile .
  3. Run the CI script: LOCAL=1 ./qa/run_tests.sh

When I switched the instance to the p3.2xlarge type (V100 GPU), the tests ran successfully.

Error messages:

wphicks commented 3 years ago

Triage data thus far:

wphicks commented 3 years ago

Another note for triaging: throughput in the tests is quite high on the T4 relative to V100 and RTX8000 (roughly 3×). This may be exposing a race condition, and if so, #77 may be related.
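For readers unfamiliar with why higher throughput would surface a bug, here is a generic illustration (a hypothetical sketch, not the FIL backend's actual code): an unsynchronized read-modify-write on shared state loses updates only under certain thread interleavings, so changing hardware speed changes how often the bug fires.

```python
import threading

# Illustration-only constants, not taken from the FIL backend.
N_THREADS = 8
N_ITERS = 100_000

def run(lock=None):
    """Increment a shared counter from many threads, with or without a lock."""
    state = {"count": 0}

    def worker():
        for _ in range(N_ITERS):
            if lock is not None:
                with lock:
                    state["count"] += 1
            else:
                # Unsynchronized read-modify-write: two threads can read the
                # same value and both write value+1, losing one increment.
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

unsafe_total = run()                   # may be <= N_THREADS * N_ITERS
safe_total = run(threading.Lock())     # always exactly N_THREADS * N_ITERS
```

How often `unsafe_total` falls short of the expected count depends entirely on scheduling and execution speed, which is why a faster GPU path (more requests in flight per second) can turn a latent race into reproducible test failures.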