replicate / replicate-python

Python client for Replicate
https://replicate.com
Apache License 2.0

setup timeout without meaningful logs #298

Closed by afpro 5 months ago

afpro commented 5 months ago

My model keeps failing setup with 'model container failed to boot and complete setup within 600 seconds'. I searched on Google but found no solution. How can I find more information about what happened?

afpro commented 5 months ago

model version: dea48a520fc0954407bfb1dd9dd3d8d4eabdb675b2cd947d6aaf302485a714ce

mattt commented 5 months ago

Hi @afpro. It looks like your model was configured to run on a T4. If the model is indeed 13B (as the name implies), the 16GB VRAM available on that hardware may not be sufficient. That'd be my guess as to why it's failing during setup. Go to the model settings and try switching the hardware to an A40 or A100.
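
As a rough back-of-the-envelope check of the VRAM guess above, here is a minimal sketch. It assumes fp16 weights (2 bytes per parameter) and counts only the weights, ignoring activations, KV cache, and framework overhead. The function name `weight_memory_gb` is hypothetical and not part of the Replicate client.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: fp16 storage by default (2 bytes per parameter).
    """
    return num_params * bytes_per_param / 1e9


T4_VRAM_GB = 16  # figure quoted in the comment above

needed_gb = weight_memory_gb(13e9)  # 13B parameters, fp16 assumed
verdict = "fits" if needed_gb <= T4_VRAM_GB else "does not fit"
print(f"13B fp16 weights: ~{needed_gb:.0f} GB vs {T4_VRAM_GB} GB on a T4: {verdict}")
```

Under these assumptions the weights alone come to roughly 26 GB, which exceeds the 16 GB on a T4. With 8-bit weights (`bytes_per_param=1`) the same estimate drops to about 13 GB, so the actual precision of the checkpoint matters.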

afpro commented 5 months ago

When I ran llama2-chat-70b on an A40, I got an 'out of memory' error. In that situation I get a Python exception stack trace, not a 'timeout'.

afpro commented 5 months ago

I just give up.