Open wesselvdv opened 1 year ago
@kthui Do you know what could be the issue here?
Is there some more debug I can turn on to expedite this? This is the first version where I am able to see more than it's just hanging.
Hi @wesselvdv, could you give the following a try to help us narrow down the cause of the issue?
1. Could you check whether s3://https://<snip>:443/p-triton/wav2vec/config.pbtxt/ on minio is actually a file and not a directory?
2. Could you use --model-control-mode=explicit when starting the server, then load a small model (small in file size) using the load API from the client, and see if the hang is still reproducible? It is possible the model(s) just take a long time to be transmitted over the network.

Hi @kthui I've tried your second suggestion and that does seem to work! That didn't work in a previous version (it would hang indefinitely), but it does now.
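For anyone repeating the explicit-load check suggested above, it can be scripted so that a hang shows up as a timeout instead of blocking forever. This is only a minimal sketch: it assumes the server's HTTP endpoint is reachable at a base URL you supply, it uses Triton's HTTP model repository endpoints (`POST v2/repository/models/<name>/load` and `GET v2/models/<name>/ready`), and the helper names and the 60 second timeout are made up for illustration.

```python
import urllib.request


def load_url(base_url: str, model_name: str) -> str:
    """URL of the model repository 'load' endpoint for one model."""
    return f"{base_url.rstrip('/')}/v2/repository/models/{model_name}/load"


def ready_url(base_url: str, model_name: str) -> str:
    """URL of the per-model readiness endpoint."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/ready"


def load_and_check(base_url: str, model_name: str, timeout_s: float = 60.0) -> bool:
    """Ask the server to load a model, then report whether it is ready.

    Returns False on any HTTP error, connection error, or timeout, so a
    load that hangs is reported rather than waited on indefinitely.
    """
    request = urllib.request.Request(
        load_url(base_url, model_name),
        data=b"{}",  # empty JSON body: no load parameters
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s):
            pass
        with urllib.request.urlopen(ready_url(base_url, model_name), timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError as exc:  # covers URLError, HTTPError and socket timeouts
        print(f"load of {model_name!r} failed or timed out: {exc}")
        return False
```

With the server started with --model-control-mode=explicit, something like `load_and_check("http://localhost:8000", "small_model")` would exercise it; the port is Triton's default HTTP port and `small_model` is a placeholder name.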
Glad it works! I'm closing this issue but please let us know if you would like follow-up and we will reopen the ticket.
Seems to be broken again unfortunately, not sure what caused it to hang again. We didn't change anything in our setup.
Are there any other steps we can take (e.g. enable more debug)?
@wesselvdv Running Triton with logging enabled should provide more context:
tritonserver ... ... --log-verbose=1
You can also try to build with debug symbols if you'd like to run with gdb.
I had already put the debug on 2, assuming that info also shows warning (1). I'll have a look into gdb, not sure if I can do a remote session with that.
I think the underlying S3 client (or minio server) could be misreporting a file as a directory to Triton, which would cause the infinite loop. I have already filed a ticket for us to investigate further.
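One way to look at this hypothesis independently of Triton is to list the keys under the model prefix with any S3 client and see how the path in question shows up. The sketch below assumes the `s3://https://host:port/bucket/path` repository form used earlier in this thread. `split_repo_path` and `classify` are hypothetical helpers, and the key listing is assumed to come from a separate bucket-listing call that is not shown here.

```python
import re

# Matches the repository form s3://<http(s) endpoint>/<bucket>/<key...>
_S3_REPO_RE = re.compile(
    r"^s3://(?P<endpoint>https?://[^/]+)/(?P<bucket>[^/]+)/?(?P<key>.*)$"
)


def split_repo_path(path: str) -> tuple[str, str, str]:
    """Split a repository path into (endpoint, bucket, key).

    Trailing slashes on the key are dropped so that 'a/b/' and 'a/b'
    compare equal.
    """
    match = _S3_REPO_RE.match(path)
    if match is None:
        raise ValueError(f"not an s3://http(s)://... path: {path!r}")
    return match["endpoint"], match["bucket"], match["key"].rstrip("/")


def classify(key: str, listed_keys: list[str]) -> str:
    """Classify a key against a flat listing of object keys in the bucket.

    'file'      -> an object with exactly this key exists
    'directory' -> only objects under '<key>/' exist
    'both'      -> both are true, an ambiguous case worth a closer look
    'missing'   -> neither
    """
    keys = set(listed_keys)
    is_file = key in keys
    is_dir = any(k.startswith(key + "/") for k in keys)
    if is_file and is_dir:
        return "both"
    if is_file:
        return "file"
    if is_dir:
        return "directory"
    return "missing"
```

If `classify` reports anything other than "file" for the config.pbtxt key, that would at least be consistent with the file being seen as a directory; it would not by itself show where the misreporting happens.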
@kthui Let me know if there's something I can do to help!
Description
Having a minio model repository is causing an endless flurry of the following messages in the log:

Triton Information
We're using the latest container version (23.02).

To Reproduce
Try to use a minio bucket with control_mode=none, and it'll be stuck on startup on the same file endlessly.

Expected behavior
Should start up normally, and load all the models successfully. The model configuration is correct, as I've worked around this issue in previous versions by downloading all the configuration locally on startup, and pointing Triton to the local directory.